Text Mining Islamic Law


Digital humanities has a venerable pedigree, stretching back to the middle of the twentieth century, but despite noteworthy pioneering contributions it has not become a mainstream practice in Islamic Studies. This essay applies humanities computing to the study of Islamic law. We analyze a representative corpus of works of Islamic substantive law (furūʿ al-fiqh) from the beginnings of Islamic jurisprudence to the early modern period (2nd/8th-13th/19th c.) using several computational tools and methods: text-reuse network analysis based on plain-text annotations and html tags, clustered frequency-based analysis, word clouds, and topic modeling. Applying machine-guided distant reading to Islamic legal texts over the longue durée, we study (1) the role of the Qurʾān, (2) patterns of normative qualifications (aḥkām), and (3) the distribution of topics in our corpus. In certain instances the analysis confirms claims made in the scholarly literature on Islamic law; in other instances it corrects such claims.

Keywords
Islamic law; digital humanities; schools of law in Islam; Qurʾān

Digital Humanities and Islamic Law
Digital humanities, the practice of conducting or facilitating humanities research by technological and computational means, has a venerable pedigree, stretching back to the middle of the twentieth century.1 Since the turn of the millennium, this practice has undergone a tangible acceleration.2 As regards Islamic Studies and its sister disciplines, including Arabic, Persian, Ottoman and Middle Eastern Studies, increasing numbers of Arabic (and, to a lesser extent, Persian and Ottoman) texts are not just scanned and made available as PDFs, but also converted into full text, sometimes in annotated digital formats, which makes them searchable and amenable to computational analysis. In addition, the expanding digital infrastructure in which these texts are hosted3 and the concomitant increase in websites and software allow researchers to sift through this growing corpus of texts with ever-increasing ease.4 For more than a decade, scholars have harvested these digital archives and made use of their various functionalities. It is no exaggeration to say, however, that digital humanities, if understood as not just combing the digital archive but also as tool-building and machine-guided analysis,5 has only begun to have an impact in Islamic Studies. This is not to deny that there have been noteworthy pioneering efforts,6 and developments in the last decade suggest that scholars may in fact be witnessing a new dawn of digital research in Islamic Studies and related disciplines. A series of recent roundtables and conferences, as well as the appearance, in 2016, of the first edited volume dedicated to Islamicate digital humanities (henceforth: idh), indicate that the field is consolidating.7 It should be noted, however, that most efforts in idh have been directed first and foremost at creating digital corpora and indexing them with metadata, and only secondarily at machine-driven analysis. 
To cite an example, Harvard Law School's SHARIASource, a project of obvious relevance to the computational study of Islamic law, so far has been functioning as a platform and repository first and foremost; the same may be said about other, equally impressive initiatives, such as the two projects based in Germany, Corpus Coranicum and Bibliotheca Arabica.8 While interest in developing novel computational methods and analytical tools for idh is palpable in a number of current collaborative research projects,9 there is at present no sustained output of publications, let alone a journal or book series dedicated to idh. Studies that foreground data-based computational analysis in Islamic Studies are still a rare phenomenon in traditional publication venues.
Although idh may be moving in a promising direction, the field is still in its infancy. This situation carries a certain risk as a growing number of researchers, especially those of a younger generation, follow the siren call of idh and invest time and energy in acquiring coding skills and contributing to the building of a digital infrastructure. However, up to the present, idh cannot be said to have produced results that would seem to justify such an investment. There is, to this day, no substantial body of achievement in idh. One may therefore be forgiven for wondering whether the perception of idh as the vanguard of a scholarly revolution is accurate. Perhaps a recalibration of our expectations is in order?
Truth be told, the digital analysis of textual corpora, in Arabic and in other languages, tends to underwhelm by simply confirming what we already know. Faced with results that seem intuitive or even trivial, digital humanists hesitate to publish the results of their computational forays in the archive.10 For this reason it is instructive to study the history of computational scholarship in other disciplines of the humanities. In the 1990s, a palpable sense of disappointment struck scholars engaged in the computational study of Western literature.11 As Anthony Kenny, an accomplished practitioner, noted in 1992, "after thirty-odd years of this kind of [computational] research" there were "embarrassingly few books and articles" that were "both (a) respected as an original scholarly contribution within their own discipline and (b) could clearly not have been done without a computer."12 idh lives under the shadow of these disappointments, and despite all the excitement we must keep looking for "evidence of value"13 in idh, including in the present study, which proposes a computer-guided investigation of Islamic jurisprudential literature. 
[…] New Orleans, see "Digital Humanities in Middle East Studies," IJMES 50 (2018), 103-39. The "Digital Islamic Humanities Project" at Brown University (https://islamicdh.org/) convened workshops and conferences in 2013, 2014, 2015 and 2016. In 2018, the Utrecht-based project "Bridging the gap: Digital humanities and the Arabic-Islamic corpus" (https://sensis.sites.uu.nl/digital-humanities/), in collaboration with the Brown project, organized a conference (Amsterdam, 13-15 […]) at the Royal Netherlands Academy of Arts and Sciences, at which an early and shorter version of this article was presented. Mention should also be made of the "Islamicate Digital Humanities Network" (https://idhn.org/), founded following the Amsterdam conference.
We still lack a basic digital infrastructure for the computational study of Islamic law.14 The available digital archives provide an "illusion of totality",15 but in fact their unexplicated selection criteria produce eclectic and discontinuous corpora, tilted towards certain periods and currents of Islamic jurisprudence. The search function included in open-access electronic libraries such as al-Maktaba al-shāmila or its Shiʿi counterpart, Noorlib, permits only simple keyword searches or, at best, combinations of several such searches. As a rule, the tools embedded in these repositories to recognize Arabic triliteral roots are not explained, and their underlying code is not open source; the reliability of such root-prediction tools is therefore far from certain.16 None of the large digital repositories is equipped to facilitate clustered searches or diachronic searches over the longue durée, let alone network analysis, comparison of texts, topic modeling, or adequate visualizations thereof. There are no self-serve text-analysis tools, such as Voyant, that work well for Arabic or other languages of the Islamic world. Before it is possible to study Islamic legal literature computationally, two things must be put in place: (1) a controllable, logically organized corpus that uses clear selection criteria; and (2) a computational toolbox that makes it possible to study text reuse, networks of people, places and texts, frequencies of concepts and semantic fields, distribution of topics, and more. The present study is the product of an attempt to create such an infrastructure, that is, to design a representative corpus of Islamic substantive law treatises (furūʿ al-fiqh) from the beginnings of Islamic jurisprudence in the 2nd/8th and 3rd/9th centuries to the 13th/19th century, and to analyze this corpus using several advanced computational tools and methods.

1.1 Building the Corpus
The corpus created for this study (see Fig. 1) comprises fifty-five books of the furūʿ al-fiqh type. All these works are comprehensive in the sense that they cover the three areas of acts of worship (ʿibādāt), transactions between human beings (muʿāmalāt), and penal law (jināyāt). For each century of the Islamic calendar, we select one book from each of the four surviving Sunni schools, as well as one Jaʿfarī book, resulting in five madhhab subcorpora. All Sunni texts in the corpus are taken from al-Maktaba al-shāmila and the electronic library islamweb; the Jaʿfarī ones come from several Shiʿi online repositories.17 For convenience, we combine the 2nd/8th and 3rd/9th centuries, for which […] However, a more extensive Ḥanafī furūʿ al-fiqh text from the 13th/19th century is not digitally available at present. The only important exception is Ibn ʿĀbidīn's (d. 1252/1836) celebrated Ḥāshiya, but in our Ḥanafī subcorpus it represents (again, not unproblematically) the 12th/18th century.
There are other problems. By limiting Ḥanafī furūʿ al-fiqh literature to one book per century, the Ḥanafī tradition is stripped of much of its richness and complexity. By cutting a single path through a veritable jungle of texts, the subcorpus of Ḥanafī texts included in our corpus suggests a one-directional, linear development of Ḥanafī jurisprudence. The Ḥanafī textual heritage, however, is not a seamless whole, not even when the scope is restricted to furūʿ al-fiqh texts. There are different genres (from textbooks [matns] and summaries [mukhtaṣars] to commentaries [sharḥs] and super-commentaries or glosses [ḥāshiyas]) as well as multiple regional traditions within Ḥanafī furūʿ al-fiqh.20 Thus, it is risky to rely unconditionally on the subcorpus in order to derive synthetic insights about Ḥanafī legal thought. This risk points to a fundamental tension, perhaps unresolvable, between the approach of humanities computing and that of the traditional humanities. 
18 Strictly speaking, the earliest Jaʿfarī work included in the corpus, al-Barqī's K. al-Maḥāsin, is not a work of fiqh but rather a compilation of hadiths. Also, it is transmitted incompletely and lacks important sections. See EI2, s.v. "Al-Barḳī" (Charles Pellat). This special character of the K. al-Maḥāsin also explains why text-mining it produces results that differ from those produced by text-mining the other texts in our corpus. See below, Fig. 2.4 and passim.
19 By token (the unit of measurement that a computer can recognize) we mean an unbroken string of Arabic characters between two spaces in a text. For example, the string of characters f-l-l-m-ʾ-m-n-y-n consists, in Arabic, of four words: the conjunction fa-, the preposition li-, the definite article al-, and the plural noun muʾminīn. In our corpus, however, these four words form a single token.
In the computational approach, the goal is to identify basic patterns and structures in large data sets ("devices, themes, tropes-or genres and systems"21), at the expense of specifics. By contrast, in the traditional humanities approach, scholars seek to identify possible biases of (usually) single texts and of their own reading practices, especially the danger of glossing over difference and diversity within a tradition. Hence the often-voiced injunction to use distant reading techniques, whether machine-guided or not, in combination with close reading, in an effort to inform and control these techniques: a hermeneutic circle for the digital age. In such circumstances, is it wise to advocate for a computational, data-driven analysis of Islamic legal literature? What questions can digital humanities meaningfully ask about Islamic law? We propose here that we should move forward by trial and error. The balance between gains and risks can only be determined at the end of this exploration, perhaps only after several more studies such as the present one have been conducted.22 Our corpus includes approximately 49,300,000 tokens23 and, assuming a multiplier of 1.5,24 around 74,000,000 words. It would be hasty, perhaps, to speak of this corpus in terms of "big data", but in comparison with, say, the corpus of digitized classical Greek and Roman texts assembled in the well-known Perseus Digital Library (68,925,971 words, according to the library's website),25 it is not insignificant.
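The token and word counts above rest on two simple operations: splitting text on whitespace (the definition of a token given in note 19) and scaling the result by an assumed tokens-to-words multiplier (here 1.5). The following Python sketch illustrates both under those assumptions; the function names are hypothetical, and this is not the tooling used to build the corpus.

```python
def count_tokens(text: str) -> int:
    """Count tokens as unbroken strings of characters between whitespace.

    Arabic clitics (fa-, li-, the article al-) are written attached to the
    following word, so a single token can contain several words.
    """
    return len(text.split())


def estimate_words(tokens: int, multiplier: float = 1.5) -> int:
    """Estimate words from a token count; 1.5 is the multiplier assumed in the text."""
    return round(tokens * multiplier)


if __name__ == "__main__":
    # f-l-l-m-ʾ-m-n-y-n: four words (fa- + li- + al- + muʾminīn), one token.
    print(count_tokens("فللمؤمنين"))    # 1
    print(estimate_words(49_300_000))  # 73950000, i.e. roughly 74 million words
```

Whitespace splitting is deliberately naive: it is what makes a token "recognizable to a computer" without any morphological analysis, and it is also why a multiplier is needed to approximate the number of words.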
The five subcorpora include: (1) Ḥanafī texts: 13,006,501 tokens; (2) Shāfiʿī texts: 10,495,516 tokens; (3) Mālikī texts: 9,224,967 tokens; (4) Ḥanbalī texts: […]. These numbers suggest that, in terms of words written, the Ḥanafīs are the most prolix of the five schools and the Ḥanbalīs the least prolix. This inference rests on the assumption that the corpus is representative with respect to the length of texts, one of the selection criteria of our corpus. The total number of tokens per century in the corpus is shown in Table 1. Again assuming the representativeness of the corpus, we infer from these numbers that the length of furūʿ al-fiqh texts increased substantially in the 5th/11th century. The texts of the following centuries, as a rule, do not exceed the 5th/11th-century texts in length. Only in the 12th/18th century do we find texts longer than those written in the 5th/11th century, including the three aforementioned texts, that is, al-Baḥrānī's al-Ḥadāʾiq al-nāḍira and the two Ḥāshiyas of Ibn ʿĀbidīn and al-Jamal.

1.2 Analyzing the Corpus
After eliminating paratext (front and end matter, and footnotes) from the texts in the corpus, and annotating major divisions (parts [kitābs], chapters [bābs] and sections [faṣls]), we adapted BlackLab, a text search engine developed at the Dutch Language Institute (inl),26 to our corpus. The resulting BlackLab Arabic Digital Humanities (henceforth: BlackLab adh, http://arabic-dh.hum.uu.nl/corpus-frontend/) allows for fast, complex searches of the corpus by word(s), stem(s) or root(s), while also making it possible to filter and group searches by century, geographic region, and law school. These features make it easy, for example, to search for the frequencies of specific terms, ratios, or ontologies of terms over the centuries, or to compare how law schools differ or overlap in the way they use certain concepts.
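Because the subcorpora differ in size, raw hit counts from searches grouped by century, region, or law school are only comparable after normalization. The Python sketch below shows one way to filter and group hits by such metadata and to express a count per 10,000 tokens. It does not reproduce BlackLab's query language or API; the `Hit` record, its field names, and the toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One search hit plus the metadata of the document it came from (hypothetical fields)."""
    term: str
    school: str
    century: int  # Hijri century
    region: str


def filter_hits(hits: Iterable[Hit], **criteria) -> list[Hit]:
    """Keep hits whose metadata match every criterion, e.g. school="Hanafi"."""
    return [h for h in hits if all(getattr(h, k) == v for k, v in criteria.items())]


def group_counts(hits: Iterable[Hit], *fields: str) -> Counter:
    """Count hits per combination of metadata fields, e.g. ("school", "century")."""
    return Counter(tuple(getattr(h, f) for f in fields) for h in hits)


def per_10k(count: int, subcorpus_tokens: int) -> float:
    """Relative frequency per 10,000 tokens, so that subcorpora of different size compare."""
    return 10_000 * count / subcorpus_tokens


if __name__ == "__main__":
    hits = [  # toy data for illustration, not corpus results
        Hit("ijtihad", "Hanafi", 5, "region A"),
        Hit("ijtihad", "Hanafi", 5, "region A"),
        Hit("ijtihad", "Shafii", 7, "region B"),
    ]
    print(group_counts(hits, "school", "century"))
    print(len(filter_hits(hits, school="Hanafi")))
```

Normalizing per token matters here because, for example, the Ḥanafī subcorpus is roughly 40% larger than the Mālikī one, so equal raw counts would not indicate equal usage.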
Counting the frequency with which a specific term, or a string or cluster of terms, appears in a text, of course, does not tell us how this term is used, that is, whether it is used positively or negatively, or in a technical, non-technical, argumentative, or regurgitative manner. The fact, for example, that one jurist frequently uses the term ijtihād ("personal juristic reasoning") does not prove that he approves the concept; it is possible that he is arguing polemically against the use of ijtihād. In the examples discussed below, we reflect on such ambiguities and propose ways to overcome them. Beyond the frequency-based analysis powered by BlackLab adh, we also employ a number of other techniques, such as text-reuse network analysis based on plain-text annotations and html tags, word embedding, parsimonious word clouds, and topic modeling.
Because such techniques are largely untested on the Arabic legal corpus, this study is exploratory (a point that bears emphasis), and we should be modest about its results. We do not aspire to develop a new data-driven, digital grand theory of Islamic jurisprudence. As Elias Muhanna reminds us, "instead of seeking out the latest digital methodologies and tools in the hope that they will unlock the secrets of our archives, scholars would be better served by asking the same questions they would in an analogue project."27 With due modesty, therefore, we seek to demonstrate the usefulness of humanities computing for the study of Islamic law by digitally analyzing the corpus to generate quantitative answers to the following three basic questions. What role does the Qurʾān play in Islamic jurisprudence, and how do the law schools differ in the way they rely on and refer to it (see below, 2.)? What kinds of normative categorization, or aḥkām, are salient in furūʿ al-fiqh, and how restrictive or permissive is furūʿ al-fiqh, considered both as a whole and per law school (see below, 3.)? What are the fundamental concerns and topics of furūʿ al-fiqh, according to the entire corpus and each of the law schools, over the longue durée (see below, 4.)?
Not in all instances, however, do we ask the "same questions" asked by previous scholars, as Muhanna proposes. The ability of idh to digest large amounts of texts that no single scholar could ever hope to master, in our view, does enable us to think of and to ask questions that previously were unanswerable, and it would be a missed opportunity not to do so.28 Ideally, idh will help us to move beyond traditional frameworks and to read our corpus, in Anver Emon's apt phrasing, "openly and freely but with attention to the text and its limits… [without] the hegemonic hoof of philology… stamping one's back".29 In this spirit, we seek to provide a corrective to, or to confirm, some of the claims made in the scholarly literature on Islamic law.30

The Qurʾān in Muslim Jurisprudence
Scholars usually consider the Qurʾān and the hadith to be the two textual sources of Islamic law, a legal system that Bernard Weiss, one of its best-known interpreters, has described as having a "textualist bent".31 Scholarly discussion has largely revolved around the question of the origin and authenticity of these sources, and around the hermeneutical principles that scholars of uṣūl al-fiqh developed to interpret them.32 By comparison, little attention has been devoted to analyzing, in quantitative terms, how Muslim jurists rely on the Qurʾān and the hadith. While Western scholars such as Alexander Knysh occasionally highlight that Muslim jurists "[p]eriodically… made attempts to restrict the discretionary power of the judges by inviting them to 'return' to the letter of the Qurʾān,"33 Salafī authors such as Nāṣir al-Dīn al-Albānī (d. 1999) criticize madhhab traditionalists for relying on the Qurʾān too little, or only indirectly, allowing the opinions of later madhhab authorities to accumulate, layer after layer, on top of Islamic scripture until they eventually cover it up.34 Can we test such global intuitions with our data-driven, quantitative approach? For example, were some schools of law more likely than others to refer to the Qurʾān?35 Do certain schools of law privilege certain parts or verses of the Qurʾān? Are there significant differences between individual jurists in terms of how they rely on the Qurʾān? 
28 Some critics contend that the only way forward for digital humanities is to abandon "traditional objects of study" and to focus on examining "large amounts of simple linguistic features". See Olsen, "Signs, Symbols, and Discourses," 309.
A statistical analysis of our corpus shows that Shāfiʿī authors refer to the Qurʾān most frequently, with 10,619 citations.36 That is, a Qurʾānic verse is cited on average once every 982 tokens, or once every 4.9 pages of a printed edition of a Shāfiʿī work.37 Shāfiʿīs are followed by Jaʿfarīs (9,133 citations, once every 1,247 tokens or 6.2 pages), Ḥanbalīs (3,888 citations, once every 1,293 tokens or 6.4 pages), Ḥanafīs (8,908 citations, once every 1,454 tokens or 7.2 pages), and Mālikīs (4,726 citations, once every 1,946 tokens or 9.7 pages).38 It is commonly asserted that Ḥanafī jurists are lax in their reliance on the Qurʾān, but in our corpus it is the Mālikī authors who rely least on Qurʾānic evidence. The Mālikī tendency not to refer to the Qurʾān may be related to the centrality of the figure of Mālik b. Anas (d. 179/795) […] In addition to asking how often jurists refer to the Qurʾān, we should also ask which Qurʾānic verses Muslim jurists cite and in which areas of Islamic jurisprudence they do so. In Tables 2 and 3, we list the most frequently cited Qurʾān verses in our corpus. 
31 See Weiss, Spirit, […] For the scholarly concern with the origin and authenticity of the two sources, see […] (2017), 211-53.
35 In the following, we limit ourselves to an analysis of the Qurʾān's footprint in fiqh, to the exclusion of hadith. Until the words of the Prophet are stringently annotated in electronic corpora, it will be difficult to disambiguate them from those of the Companions or from other historical sayings and anecdotes. Hence, direct text-reuse detection via conventional search methods (e.g., fuzzy search) would result in numerous false positives. An element of our present method, that is, the detection of text reuse based on html tags extracted from popular digital readers, may assist in creating a deep-learning training set for more stable entity extraction (in this case, of hadith).
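The frequency figures above are ratios: subcorpus tokens divided by the number of Qurʾān citations, then converted into printed pages. A minimal Python sketch of that arithmetic follows, using the token intervals reported above. The page conversion assumes roughly 201 tokens per printed page; this value is not stated in the text but is back-calculated from it, and it reproduces all five page figures given above. The function names are hypothetical.

```python
# Average number of tokens between two Qurʾān citations, per school (as reported above).
TOKENS_PER_CITATION = {
    "Shafii": 982,
    "Jafari": 1247,
    "Hanbali": 1293,
    "Hanafi": 1454,
    "Maliki": 1946,
}

# Assumption: about 201 tokens per printed page (inferred, not stated in the source).
TOKENS_PER_PAGE = 201


def citation_interval(tokens: int, citations: int) -> float:
    """Average number of tokens between citations in a subcorpus."""
    return tokens / citations


def to_pages(token_interval: float, tokens_per_page: float = TOKENS_PER_PAGE) -> float:
    """Convert a token interval into printed pages, rounded to one decimal."""
    return round(token_interval / tokens_per_page, 1)


def rank_schools(intervals: dict[str, float]) -> list[str]:
    """Schools ordered from most to least frequent citation (shortest interval first)."""
    return sorted(intervals, key=intervals.get)


if __name__ == "__main__":
    for school in rank_schools(TOKENS_PER_CITATION):
        interval = TOKENS_PER_CITATION[school]
        print(f"{school}: every {interval} tokens, about {to_pages(interval)} pages")
```

Ranking by interval rather than by raw citation count matters: the Ḥanafīs cite the Qurʾān more often in absolute terms than the Ḥanbalīs, but their larger subcorpus places them lower in relative frequency.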
Sūra and verse numbers are followed, in parentheses, by the total number of quotations of the verse and, in square brackets, by the total number of texts that cite it. First, we give the most frequently cited verses in the entire corpus, all five law schools combined (Table 2).43 Table 2 sheds light on the topics for which the jurists in our corpus most frequently invoked the Qurʾān, that is, the topics for which they felt Qurʾānic evidence was relevant. Besides the verse on contracts (Q 2:282), Qurʾānic verses relating to ritual practice (pilgrimage and prayer) and family law (marriage, divorce, inheritance) figure prominently. Given that worship, as noted by Bernard Weiss, "is not a realm in which one expects to find the accent on human freedom",44 it makes sense that the jurists rely on the Qurʾān in this area. It is particularly in the area of the ʿibādāt, ritual actions, that Muslim jurists treat God's logic as inscrutable, as Kevin Reinhart has shown.45 More broadly, Muslim jurists declare tawqīf, that is, reliance on revelation in formulating the law, essential in areas that are "non-rational", or, in the language of fiqh, "divinely imposed" (muqaddar). As Christian Lange has demonstrated, the schools of law define the scope of the "divinely imposed norms" (muqaddarāt) in Islamic law differently; generally speaking, however, the muqaddarāt are understood to comprise the ʿibādāt as well as all "numerical norms" (the so-called maqādīr, e.g., fasting ten days to make up for not making an offering during the ḥajj, see Q 2:196; the permission to divorce twice by repudiation, see Q 2:229; the ratios according to which inheritance is to be divided, see Q 4:11).46 The fact that the verses in Table 2 touch so closely on the area of the muqaddarāt and maqādīr suggests that Muslim jurists understood that not all areas of the law are amenable to human reasoning to the same degree.
That this holds true not only for the corpus as a whole but also for individual schools is demonstrated by Table 3, which shows the ten most frequently cited verses in each of the five schools.
Table 3 points to a large overlap of verses across all five schools, including the Jaʿfarīs. The few tangible differences are indicated by asterisks, which tag verses that are among the ten most frequently cited verses of one law school but not of the other schools. Ḥanafīs, for example, show a pronounced interest in Q 2:233, a verse with rules relating to suckling and weaning, while Shāfiʿīs pay disproportionate attention to Q 2:229, a verse regulating divorce.
[…] be linked unequivocally to a single Qurʾānic verse is counted as one instance of a Qurʾān quotation. In cases in which a citation can be linked to multiple verses, each verse is counted separately. For example, the expression "I am better than he" (anā khayrun minhu) can be linked to two verses in the Qurʾān: 7:12 and 38:76. Accordingly, if a scholar cites this expression, two verses are counted. Citations shorter than three tokens are ignored because they skew the final numbers.
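The counting rules described above (an ambiguous quotation counts once for each verse it can be linked to; quotations shorter than three tokens are ignored) can be expressed in a few lines. In the following Python sketch, each detected quotation is given as its quoted text plus the list of verses it can be linked to. How quotations are detected in the first place is outside the sketch, and the input format is a hypothetical one.

```python
from collections import Counter
from typing import Iterable

MIN_TOKENS = 3  # quotations shorter than three tokens are ignored


def count_verse_citations(matches: Iterable[tuple[str, list[str]]]) -> Counter:
    """Tally citations per verse.

    `matches` holds (quoted_text, candidate_verses) pairs; candidate_verses lists
    every verse (as "sura:verse") to which the quoted text can be linked.
    """
    counts: Counter = Counter()
    for quoted, verses in matches:
        if len(quoted.split()) < MIN_TOKENS:
            continue  # too short to attribute reliably
        for verse in set(verses):  # one count per distinct candidate verse
            counts[verse] += 1
    return counts


if __name__ == "__main__":
    matches = [
        ("anā khayrun minhu", ["7:12", "38:76"]),  # ambiguous: counted for both verses
        ("two tokens", ["2:282"]),                  # hypothetical short match: skipped
    ]
    print(count_verse_citations(matches))
```

The design choice here follows the text: ambiguity inflates counts slightly rather than forcing an arbitrary attribution to one verse.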
Next, we visualize the results of our analysis in an interactive network with the help of an open-access online tool called "Footprinter" (https://quranin-fiqh.hum.uu.nl/). In this network, grey dots represent Qurʾānic verses; the more often a verse is cited by different authors, the closer its dot lies to the center of the network. Colored nodes represent texts in our furūʿ al-fiqh corpus, with Ḥanafī texts appearing in orange, Shāfiʿī texts in light blue, Ḥanbalī texts in purple, Mālikī texts in green, and Jaʿfarī texts in olive (Fig. 2.1). Zooming in allows for the identification of individual texts (Fig. 2.2). Qurʾānic verses can be displayed in isolation from other verses (Fig. 2.3). Gold dots represent verses that are cited by only a single author; they cluster, in the form of islands, around the margin of the network (Fig. 2.4). Several other selection criteria and filters can be applied. A built-in reader allows users to follow verses back to the texts in which they are cited. There are 2,954 dots (verses) in the network, which means that almost half (47%) of all Qurʾānic verses (6,236 in the standard Egyptian edition) are quoted in our corpus.47 This is significant because, in theory, only 350 to […] This phenomenon prompts the observation that furūʿ al-fiqh texts and Aḥkām al-Qurʾān texts are "genealogical": authors of Aḥkām al-Qurʾān works rely on earlier works in the genre rather than on furūʿ al-fiqh texts to determine the body of Qurʾān verses that merit attention. Conversely, furūʿ al-fiqh texts omit Qurʾānic verses that are stock-in-trade for authors of Aḥkām al-Qurʾān texts.
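The network described above is bipartite: texts on one side, verses on the other, with an edge wherever a text cites a verse. The Python sketch below, assuming citations are available as (text, verse) pairs, derives two of the quantities discussed: the verses cited by only one text (the peripheral "islands") and the share of the Qurʾān's 6,236 verses that are cited at all. Footprinter's own layout is not reproduced; the number of distinct citing texts per verse is offered only as a rough proxy for how central a verse appears. The text identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

QURAN_VERSES = 6236  # verses in the standard Egyptian edition


def citing_texts(edges: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each verse to the set of distinct texts that cite it."""
    index: dict[str, set[str]] = defaultdict(set)
    for text_id, verse in edges:
        index[verse].add(text_id)
    return dict(index)


def island_verses(index: dict[str, set[str]]) -> set[str]:
    """Verses cited by exactly one text."""
    return {verse for verse, texts in index.items() if len(texts) == 1}


def coverage_percent(cited_verses: int, total: int = QURAN_VERSES) -> float:
    """Share of all Qurʾānic verses that are cited at least once."""
    return 100 * cited_verses / total


if __name__ == "__main__":
    edges = [  # toy edges with hypothetical text identifiers
        ("text_A", "2:282"), ("text_B", "2:282"), ("text_B", "2:282"),
        ("text_C", "26:100"),
    ]
    index = citing_texts(edges)
    print(sorted(island_verses(index)))   # ['26:100']
    print(round(coverage_percent(2954)))  # 47
```

Using sets of citing texts rather than raw edge counts means that a single text quoting a verse many times still produces an "island", which matches the single-author criterion in the text.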
Returning to our network, texts that share numerous Qurʾānic citations with other texts in the corpus, and thus can be said to rely on a cross-madhhab core of Qurʾānic verses, cluster around the center: these include al-Shāfiʿī's (d. […] al-Aṣl).
A different case is the K. al-Maḥāsin of al-Barqī (d. 274/887) (Fig. 2.4), also fairly short and also situated at the fringe of the network, but which refers to the Qurʾān 426 times, a relatively large number. The peripheral position of this text results from the fact that it regularly refers to Qurʾānic verses not cited by other texts in the network. This is indicated by the large island of yellow dots connected to al-Barqī's text. When one 'visits' al-Barqī's island, by following up on the verses behind the yellow dots and reading the passages in which they are embedded, one observes that al-Barqī's anomalous status in the network results from his repeated references to the āl al-bayt. A case in point is al-Barqī's exclusive citation of Q 26:100, in which the unbelievers say on the Day of Judgment that "there are no intercessors [shāfiʿīn] for us." Al-Barqī here relates the view of al-Ḥusayn, the grandson of the Prophet, that the "intercessors" are "the Imams (al-aʾimma) and the righteous amongst the believers".55 Also noteworthy on the periphery of the network are long texts with a surprisingly low number of Qurʾān citations, for example, al-Muḥaqqiq al-Ḥillī's (d. 676/1277) Sharāʾiʿ al-islām, "one of the most influential Twelver Shiʿite legal compendia",56 which quotes the Qurʾān a mere sixteen times. As al-Ḥillī posits in a work on legal hermeneutics (uṣūl al-fiqh), jurists discover laws by relying on "theoretical considerations that are in most cases not derived from the exoteric meanings of the [revealed] texts" (iʿtibārāt naẓariyya laysat mustafāda min ẓawāhir al-nuṣūṣ fī 'l-akthar).57 Quantitative analysis shows that al-Ḥillī did not hesitate to follow this hermeneutical maxim in his own legal reasoning.

Ḥalāl and ḥarām
We begin our machine-supported, quantitative exploration of Islamic legal deontology with the two most basic qualifications of actions in Islamic law, lawful (ḥalāl) and unlawful (ḥarām). It is no doubt true, as Yūsuf al-Qaraḍāwī asserts, that "[i]n Islam, the sphere of prohibited things is very small, while that of permissible things is extremely vast",58 even if the same may be said about most, if not all, moral-legal systems in human societies. And yet, certain legal systems are thought to be more "liberal" than others. "Muslim jurists," writes Bernard Weiss, "acknowledge that there is a large sphere in which human beings must be able to conduct their own affairs so as to achieve maximal advantage for themselves… [b]ut in all human social life, freedom must have its limits, and Islamic law stands in contrast to the liberalism of the West in the drawing of these limits."59 Is Weiss treating Islam and its legal tradition as the "external other" of Western liberalism, to echo Joseph Massad?60 As Massad argues, while Western Orientalists have tended to characterize Islamic law as uniquely illiberal, restricting freedom through irrational prohibitions and thereby undermining the ability of Muslims to modernize their societies, Western liberalism has long sought to "transform" Islamic law and bring it in line with Western liberal sensibilities.61 Mohammad Fadel suggests that scholars of Islamic law should "transcend the limitations of the Islam/liberalism dichotomy", and that they should study the "moral language" of each of the two formations more closely.62 A machine-guided analysis of furūʿ al-fiqh, we suggest, has something to offer in this search for the "moral language" of the fiqh tradition. Here, we compute the ratio between the two concepts of lawful and unlawful in our furūʿ al-fiqh corpus. Such ratios can be determined for each of the subcorpora of the law schools as well as for the entire corpus over time. 
In Figure 3, we represent the relative frequency of usage of the two terms in each of the five schools as vertical bars in which values are calibrated to a common scale, and in which the percentage of references to ḥalāl is stacked on the percentage of references to ḥarām.63 This representation is not based on a simple frequency search of the two terms. To achieve more balanced results, we extend our search to include not only a wildcard search for *ḥarām*, but also *y-ḥ-r-m and *t-ḥ-r-y-m*, and we subtract instances of *lā y-ḥ-r-m, *lam y-ḥ-r-m, and *iḥrām*.64 Similarly, we search for *ḥalāl* in combination with *t-ḥ-l-y-l* and *y-ḥ-l, subtracting instances of *lā y-ḥ-l and *lam y-ḥ-l. We call the resulting search clusters *ḥarām*+ and *ḥalāl*+.65 While the bar diagram does not show how "liberal" or "illiberal" Islamic law is in comparison to other legal systems, it suggests that Jaʿfarīs and Ḥanbalīs are slightly more likely to use the prohibition *ḥarām*+, and that in this sense they are more "illiberal" than the other three madhhabs. Of the combined number of *ḥarām*+ and *ḥalāl*+ in the Jaʿfarī subcorpus, 76.8% refer to *ḥarām*+ and 23.2% to *ḥalāl*+ (Ḥanbalīs: 74% vs 26%). Then follow, in descending order, Shāfiʿīs (72.9% vs 27.1%), Mālikīs (68.3% vs 31.7%), and Ḥanafīs (66.5% vs 33.5%). Paul Powers has observed that scholars of Islamic law have a "careless tendency… to imply that Ḥanafīs are 'liberal' and Ḥanbalīs are 'conservative'",66 and that there is, to date, no "systematic historical study of such differences", which means that "no sweeping characterizations of the doctrinal tone of individual madhhabs are warranted."67 Our digital, distant-reading approach is a first step in the direction of such a systematic historical study.
64 There are 178 instances in the entire corpus of *laysa bi-ḥarām* and *ghayr ḥarām*. This is less than 1% of all 26,444 instances of *ḥarām*, and the same holds for *laysa bi-ḥalāl* and *ghayr ḥalāl*, making the search for other forms of negation (e.g., lā ḥarām) redundant.
65 That is: *ḥarām*+ = *ḥarām* + *t-ḥ-r-y-m* + *y-ḥ-r-m - *lā y-ḥ-r-m - *lam y-ḥ-r-m - *iḥrām*; *ḥalāl*+ = *ḥalāl* + *t-ḥ-l-y-l* + *y-ḥ-l - *lā y-ḥ-l - *lam y-ḥ-l.
66 Paul R. Powers, "The Schools of Law," in The Ashgate Companion to Islamic Law, 49. See, for example, Raymond Charles, Le droit musulman (Paris: Presses Universitaires de France, 1956), 27, who casually notes that the Ḥanafī school is regarded as "the most liberal" (le plus libéral), while the Ḥanbalī school is "the strictest" (le plus strict).
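The cluster definitions of note 65 can be illustrated with a short sketch. It is not the authors' code, which ran as wildcard queries in BlackLab adh; it assumes whitespace-tokenized Arabic-script text, renders the transliterated patterns in Arabic script (the spellings, e.g. إحرام with hamza, are our assumptions), uses Python's fnmatch wildcards, and counts each matching token once.

```python
import fnmatch

# Assumed Arabic-script renderings of the note-65 patterns ("*" = any run of letters).
HARAM_ADD = ["*حرام*", "*تحريم*", "*يحرم"]
HARAM_EXCLUDE = ["*إحرام*"]      # iḥrām contains the string ḥarām
HARAM_NEGATABLE = ["*يحرم"]      # subtracted when preceded by lā / lam

HALAL_ADD = ["*حلال*", "*تحليل*", "*يحل"]
HALAL_EXCLUDE = []
HALAL_NEGATABLE = ["*يحل"]

NEGATORS = ["*لا", "*لم"]        # *lā and *lam, with optional attached prefixes


def matches(token, patterns):
    """True if the token matches any of the wildcard patterns."""
    return any(fnmatch.fnmatchcase(token, p) for p in patterns)


def cluster_count(tokens, add, exclude, negatable):
    """Count tokens matching `add`, minus excluded forms and minus
    negatable forms directly preceded by a negator."""
    count = 0
    for i, tok in enumerate(tokens):
        if not matches(tok, add) or matches(tok, exclude):
            continue
        if i > 0 and matches(tok, negatable) and matches(tokens[i - 1], NEGATORS):
            continue
        count += 1
    return count


def haram_share(tokens):
    """Return (ḥarām+ hits, ḥalāl+ hits, percentage of ḥarām+ among both)."""
    haram = cluster_count(tokens, HARAM_ADD, HARAM_EXCLUDE, HARAM_NEGATABLE)
    halal = cluster_count(tokens, HALAL_ADD, HALAL_EXCLUDE, HALAL_NEGATABLE)
    total = haram + halal
    return haram, halal, (100.0 * haram / total if total else 0.0)
```

The exclusion of *iḥrām* matters because إحرام contains the string حرام; filtering at the token level has the same effect as the subtraction in note 65, provided no token matches two of the added patterns.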
The differences between the schools in our model are not, we acknowledge, significant from a strictly statistical point of view. But they do show tendencies that appear to support the scholarly intuition about Ḥanafī 'liberalism' and Ḥanbalī 'conservatism'. To respond to the need, highlighted by Powers, to pay attention to the "doctrinal tone" of the schools as it developed over time, in Figure 4 the *ḥarām*+/*ḥalāl*+ ratio is again shown in vertical bars calibrated to a common scale, but arranged according to century.68 Figure 4 shows an increase in the use of *ḥarām*+ in proportion to *ḥalāl*+ from the 3rd/9th century until the 5th/11th century, at which time many of the major 'classical' comprehensive fiqh texts appear, such as the Mabsūṭs of al-Sarakhsī and al-Ṭūsī, the Kāfī of Ibn ʿAbd al-Barr, and the Ḥāwī al-kabīr of al-Māwardī. A second, smaller increase occurs between the 5th/11th century and the 8th/14th century, the period of al-Bābartī, Khalīl b. Isḥāq, al-Nawawī, Muḥammad Ibn Mufliḥ, and al-Muḥaqqiq al-Ḥillī. After the 8th/14th century, the ratio remains stable at approximately 3:1. It is again possible to challenge the statistical significance, or "evidence of value", of our findings. Be that as it may, Figure 4 illustrates that computer-supported distant reading puts us in a position to think about longue-durée dynamics in furūʿ al-fiqh that cannot be seen, or even thought about, through the usual close reading of the sources: a "haramization" of the law from the formative to the classical period, and a stability in the discourse on lawful and unlawful in the postclassical period up to the modern period.
Figure 4: Ratios of *ḥarām*+/*ḥalāl*+ in furūʿ al-fiqh over the centuries
In the 20th and 21st centuries-a period that does not fall within the scope of this study-the arc of ḥarām and ḥalāl seems to bend towards "halalization", as has been argued with regard to certain consumption-driven and affluent Muslim societies of the late 20th and early 21st centuries.69
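Figure 4 arranges the *ḥarām*+/*ḥalāl*+ counts by century. A minimal sketch of that aggregation, assuming per-text cluster counts are already available (for instance from a search like the one defined in note 65) as hypothetical (century, ḥarām+, ḥalāl+) records; the numbers in the test are invented.

```python
from collections import defaultdict


def ratio_by_century(records):
    """records: iterable of (century_ah, haram_hits, halal_hits), one per text.

    Pools the raw counts of all texts in a century and returns
    {century: (haram_pct, halal_pct)} in chronological order."""
    totals = defaultdict(lambda: [0, 0])
    for century, haram, halal in records:
        totals[century][0] += haram
        totals[century][1] += halal
    result = {}
    for century in sorted(totals):
        haram, halal = totals[century]
        total = haram + halal
        if total:  # skip centuries with no hits at all
            result[century] = (round(100 * haram / total, 1),
                               round(100 * halal / total, 1))
    return result
```

Pooling raw counts gives longer texts more weight than averaging per-text percentages would; which of the two choices underlies Figure 4 is not stated here, so this is an assumption.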

The Five Qualifications (al-aḥkām al-khamsa)
In addition to the distinction between ḥarām and ḥalāl, Islamic legal deontology operates with five basic normative qualifications (aḥkām taklīfiyya): obligatory (wājib), recommended (mandūb), neutral (mubāḥ), disapproved (makrūh), and forbidden (ḥarām). "The intermediate categories," writes Mohammad Hashim Kamali, "consist essentially of options that offer scope for personal freedom." He concludes that the "scope of liberty [in Islamic law] is thus much wider than that of wājib and ḥarām."70 A machine-supported analysis of our corpus, we argue, can establish how wide this scope is, quantitatively speaking.
A complication results from the fact that over the course of the centuries, Muslim jurists have used a number of synonyms for each of these five qualifications, as well as other subcategories.71 Thus, Ḥanafīs distinguish between two categories of the obligatory: absolutely obligatory (farḍ) and binding (wājib).72 Mālikīs distinguish between recommended (mandūb) and (good) practice (sunna), while for the other schools, the two terms are largely synonymous.73 In order to account for this terminological variety, we trained a word embedding tool on our corpus to identify the closest semantic neighbors of the five terms.74 We therefore base the analysis below on the following pairs of terms: *wājib*+*farḍ* (henceforth: *wājib*+); *mandūb*+*mustaḥabb* (henceforth: *mandūb*+); *mubāḥ*+*ḥalāl* (henceforth: *mubāḥ*+); *makrūh*+*qabīḥ* (henceforth: *makrūh*+); *ḥarām*+*maḥẓūr* (henceforth: *ḥarām*+). The pie chart in Fig. 5 shows the ratio between these five pairs in the entire corpus, not counting the most common negations (that is, instances of any of the ten terms preceded by *ghayr* or *laysa bi-*).75 More than half (54.3%) of all qualifications in the corpus are *wājib*+, about one-fifth (22%) *ḥarām*+, almost one-eighth (12.3%) *mubāḥ*+, and a little more than one-tenth *makrūh*+ and *mandūb*+ combined (5.3% and 6.1%, respectively). The individual law schools follow this pattern, with minor variations. The two ends of the spectrum are occupied by the Ḥanbalī and the Mālikī schools (Figs. 6.1 and 6.2): Ḥanbalīs have the smallest sum total of middle categories, roughly one-fifth of all qualifications (*mandūb*+: 3.8%; *mubāḥ*+: 14.1%; *makrūh*+: 2.9%; total: 20.8%), while Mālikīs have the largest middle category, more than a quarter of all qualifications (*mandūb*+: 7.2%; *mubāḥ*+: 11.2%; *makrūh*+: 6.8%; total: 26.4%).
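The shares in Fig. 5 rest on counting the ten terms while skipping negated instances. A minimal sketch, assuming whitespace tokens in Arabic script, substring matching for each term, and treating any term directly preceded by غير (ghayr) or ليس (laysa, with bi- attached to the following word) as negated. The term spellings and the iḥrām exclusion are our assumptions; this is not the authors' pipeline.

```python
# Assumed Arabic spellings of the five pairs of terms.
CATEGORIES = {
    "wājib+": ["واجب", "فرض"],
    "mandūb+": ["مندوب", "مستحب"],
    "mubāḥ+": ["مباح", "حلال"],
    "makrūh+": ["مكروه", "قبيح"],
    "ḥarām+": ["حرام", "محظور"],
}
NEGATORS = {"غير", "ليس"}  # ghayr X, laysa bi-X


def qualification_shares(tokens):
    """Percentage of each category among all non-negated qualification hits."""
    counts = {cat: 0 for cat in CATEGORIES}
    for i, tok in enumerate(tokens):
        if i > 0 and tokens[i - 1] in NEGATORS:
            continue  # negated instance, not counted
        if "إحرام" in tok:
            continue  # iḥrām contains the string ḥarām
        for cat, terms in CATEGORIES.items():
            if any(term in tok for term in terms):
                counts[cat] += 1
                break  # each token counts for one category at most
    total = sum(counts.values())
    return {cat: (round(100 * n / total, 1) if total else 0.0)
            for cat, n in counts.items()}
```

Substring matching also catches prefixed and suffixed forms (e.g. بحرام, الواجب), at the cost of occasional false positives that a close-reading check would have to weed out.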
We detect here an echo of the notion that Ḥanbalīs, as Noel Coulson put it, embrace a certain "moralist attitude", that is, they divide human actions starkly into obligatory and forbidden, while the other schools, especially Mālikīs and Ḥanafīs, have a "legally formalist attitude" that favors use of a broader range of categories.76 As for Ḥanbalīs, it should be noted that their perceived strictness is balanced by their greater willingness, comparatively speaking, to argue in terms of "dispensations" or "alleviations" (rukhaṣ, sg. rukhṣa), a category of legal norms that, as Goldziher remarked, is "appended" to the five qualifications.77 In the Ḥanbalī subcorpus, the term *r-kh-ṣ* appears once every 6,345 tokens, whereas it is less commonly used by the Shāfiʿīs (once every 7,694 tokens), Mālikīs (once every 7,952 tokens), Ḥanafīs (once every 9,801 tokens) and Jaʿfarīs (once every 11,160 tokens).78 To return to the question we posed at the beginning of this section, our analysis suggests that, pace Kamali, it is not certain that the scope of the intermediate categories is "much wider" in Islamic law than the scope occupied by the two categories of ḥarām and wājib. Kamali, it should be said, is not alone in his view of the relationship between the five qualifications. Scholars frequently state that the moral sphere (demarcated by the terms makrūh and mandūb) and the legal sphere (the domain of wājib and ḥarām) are seamlessly connected in Islamic law, and that both spheres are equally important for Muslim jurists. 
Bernard Weiss, for example, opines that "[i]t is important always to bear in mind that the Shariʿa is as much concerned with recommending and disapproving as it is with prescribing and forbidding."79 Ahmad Alkhamees states that "Sharīʿa pays similar attention to recommended and disapproved acts as to prescribed and prohibited acts".80 Wael Hallaq, finally, finds that the theological and eschatological nature of the intermediate categories "does not relegate [Hallaq's emphasis] them to a category below, and thus outside, the law," and that "[m]eshing the moral with the legal, these norms were subject to a great deal of articulation and discussion."81 Computational quantification of the five legal qualifications provides a corrective to this view, suggesting that the jurists were first and foremost interested in determining legal prescriptions and prohibitions, and only secondarily, and at some distance, in voicing moral approval or disapproval.
76 See Coulson, Conflicts and Tensions, 86. In the case of the Mālikīs, the *mandūb*+ domain would likely increase even more if the category of sunna were also taken into account, something our machine-driven approach does not enable us to do. It would be interesting to conduct a quantitative study of the differences in the use of the five qualifications in the three fields of the law (ʿibādāt, muʿāmalāt, jināyāt […]
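Rates such as "once every 6,345 tokens" are the inverse of relative frequency. The sketch below converts raw counts into both forms and orders subcorpora from most to least frequent use of *r-kh-ṣ*; the hit and token counts are hypothetical, chosen only so that two of the rates quoted above come out exactly.

```python
def frequency_report(subcorpora):
    """subcorpora: {name: (hits, tokens)}.

    Returns (name, once_every_n_tokens, hits_per_million) rows,
    sorted from the most to the least frequent use."""
    rows = []
    for name, (hits, tokens) in subcorpora.items():
        if hits == 0 or tokens == 0:
            continue  # no meaningful rate
        rows.append((name,
                     round(tokens / hits),
                     round(1_000_000 * hits / tokens, 1)))
    return sorted(rows, key=lambda row: row[1])
```

Reporting hits per million alongside "once every N tokens" makes the subcorpora easier to compare on one scale, since the gaps between them are then differences rather than inverse quantities.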

Word Clouds
In the final section of this study we use a computational, distant-reading approach to identify the salient topics in our furūʿ al-fiqh corpus. In digital humanities, word clouds are a popular means to visualize, in one image, the most frequently used words (or, as in our case, tokens) in a given corpus. Because ordinary, frequency-based word clouds (fwcs) bring to the fore often-used stopwords and particles, the use of filters is imperative. In Figure 7, we filter out all particles, cardinal and ordinal numbers, and verbs; we only include adjectives, nouns and proper names, in both prefixed and suffixed forms.
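The filtering step described for Figure 7 can be sketched as follows, assuming the tokens have already been part-of-speech tagged with a hypothetical coarse tag set (NOUN, ADJ, PROPN, VERB, NUM, ADP). The tagger, the tag names and the linear font scaling are assumptions, not the authors' setup.

```python
from collections import Counter

# Hypothetical tag set: keep nouns, adjectives and proper names only.
KEEP_TAGS = {"NOUN", "ADJ", "PROPN"}


def cloud_weights(tagged_tokens, top_n=50):
    """Token frequencies for a word cloud after dropping particles, numbers and verbs."""
    counts = Counter(tok for tok, tag in tagged_tokens if tag in KEEP_TAGS)
    return counts.most_common(top_n)


def font_sizes(weights, min_size=10, max_size=60):
    """Scale font sizes linearly with frequency; the most frequent token gets max_size."""
    if not weights:
        return {}
    top = weights[0][1]
    return {tok: round(min_size + (max_size - min_size) * n / top)
            for tok, n in weights}
```

Because prefixed and suffixed forms are kept as separate tokens, الصلاة and صلاة would appear as two entries in such a cloud.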
The two central terms in the corpus are "prayer" (al-ṣalāt) and "property" (al-māl).82 The layer around this core includes "messenger" (rasūl), "the Prophet" (al-nabī), "sale" (al-bayʿ), "contract" (al-ʿaqd), "Muslim individual/slave" (al-ʿabd), "ruler/leader" (al-imām), and the names "al-Shāfiʿī" and "Mālik".83 The fwc confirms a broadly shared understanding of Islamic jurisprudence; it can hardly be said to tell us anything we do not already know, even if it does so in one rather striking vignette.84 More surprising and fertile insights come to the fore in so-called parsimonious word clouds (pwcs).85 Unlike fwcs, pwcs place the terminology of a given subcorpus against the background of the entire corpus. The more peculiar a term is to a subcorpus in comparison with the other subcorpora, the larger it appears in the pwc.86 In other words, pwcs highlight what makes a subcorpus different from the other subcorpora in the corpus. They bring into relief the specific tone or character of a subcorpus. pwcs do not highlight stopwords or particles, unless a subcorpus manifests a special predilection for such a stopword or particle. For example, the pwc of the Ḥanbalī subcorpus (Fig. 8.1) shows that Ḥanbalī jurists refer to Aḥmad b. Ḥanbal more frequently than jurists of the other schools, and it highlights the central position of al-Khiraqī (d. 334/945-6), author of the first short summary (mukhtaṣar) of Ḥanbalī law.87 Two other peculiarities of the Ḥanbalī subcorpus are the use of the term al-riʿāya and the use of wa-ʿanhu ("and/also from him [is related]").
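How a pwc weights terms is not spelled out in this section (see note 85). One common basis for parsimonious word clouds is the parsimonious language model of Hiemstra, Robertson and Zaragoza, estimated with expectation-maximization: terms that the background corpus already explains well lose probability mass, so shared stopwords drop out while subcorpus-specific terms remain. The sketch below assumes that model, a mixing weight lam = 0.1 and a pruning threshold of 1e-4; the toy tokens in the test are invented.

```python
from collections import Counter


def parsimonious_model(sub_tokens, corpus_tokens, lam=0.1, iterations=50, threshold=1e-4):
    """EM estimate of P(t|D) for subcorpus D against background corpus C.

    Returns (term, probability) pairs sorted by probability, which can
    serve as word-cloud weights."""
    tf = Counter(sub_tokens)
    cf = Counter(corpus_tokens)
    c_total = sum(cf.values())
    p_c = {t: cf[t] / c_total for t in tf}           # background model P(t|C)
    d_total = sum(tf.values())
    p_d = {t: tf[t] / d_total for t in tf}           # start from maximum likelihood
    for _ in range(iterations):
        # E-step: expected count of t generated by the subcorpus-specific model.
        e = {t: tf[t] * lam * p_d[t] / ((1 - lam) * p_c[t] + lam * p_d[t])
             for t in p_d}
        # M-step: renormalize, then drop terms whose probability falls below threshold.
        total = sum(e.values())
        p_d = {t: v / total for t, v in e.items() if v / total >= threshold}
    return sorted(p_d.items(), key=lambda kv: kv[1], reverse=True)
```

In the toy test, "fī" is frequent in the Ḥanbalī sample but equally frequent everywhere, so the model prunes it, while "aḥmad" and "wa-ʿanhu" remain on top.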
Regarding the first peculiarity, a close reading of the relevant passages in the Ḥanbalī subcorpus reveals that the term al-riʿāya refers to the title of a work, al-[…] Ibn Ḥamdān is remembered for teaching that by his time, fully independent, or "absolute" (muṭlaq) mujtahids had disappeared from the lands of Islam.88 This connects him to the second peculiarity of the Ḥanbalī pwc, the frequent use of wa-ʿanhu, which indicates the propensity of Ḥanbalī jurists, compared to those of the other schools, to transmit the received opinions of earlier authorities, especially those of Aḥmad b. Ḥanbal.
The frequent use of the term wa-ʿanhu is characteristic of two Ḥanbalī authors in particular, the aforementioned two Ibn Mufliḥs: Muḥammad and his great-grandson, Ibrāhīm. Together, their texts account for more than 90% of all instances of wa-ʿanhu in the Ḥanbalī subcorpus, even though they make up only approximately 35% of the subcorpus in terms of size. More is at stake here than a curious stylistic preference shared by two authors hailing from the same Damascene dynasty of Ḥanbalī scholars. Muḥammad Ibn Mufliḥ was a student of Ibn Taymiyya (d. 728/1328) and also studied with the traditionists al-Dhahabī (d. 748/1348 or 753/1352-3) and al-Mizzī (d. 742/1341).89 According to George Makdisi, the K. al-Furūʿ by Muḥammad Ibn Mufliḥ, "one of the most prolific writers of the Ḥanbalī school of his period," is "one of the most important Ḥanbalī works for the establishment of the true legal doctrine of Aḥmad b. Ḥanbal".90 His great-grandson Ibrāhīm related that his grandfather was "a virtuous expert, […]
Finally, the three major terms in the Jaʿfarī pwc (Fig. 8.4) refer to characteristically Jaʿfarī ways of framing an argument: wa-ālihi is part of the taṣliya formula used in Shiʿi texts (ṣallā ʿalayhi wa-ālihi, "God bless him and his family"); al-akhbār, used in reference to the Jaʿfarī hadith corpus, occurs in phrases such as fī baʿḍ al-akhbār ("according to certain traditions"); al-aṣḥāb refers to "companions" or "adherents" of the Jaʿfarī law school, invoked anonymously in phrases such as ka-mā qāla baʿḍ al-aṣḥāb ("as a certain companion/a certain number of companions said"). As revealed by a close-reading check in BlackLab adh, in earlier Jaʿfarī texts there are almost no instances of authors referring to "a certain tradition" or to "a certain companion" to buttress an argument. Such phrases become commonplace in Jaʿfarī law only in the 10th/16th and the following centuries, starting with al-Shahīd al-Thānī's (d. 965/1557) Masālik al-afhām.
The 10th/16th century witnessed the emergence of the Safavid state and its patronage of Jaʿfarī law, and the fact that Jaʿfarī jurists, from this period onwards, invoke their collective tradition of legal scholarship to frame an argument demonstrates their confidence in their school's institutional strength.
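Both checks in this passage, the concentration of wa-ʿanhu in two authors and the late rise of phrases such as fī baʿḍ al-akhbār, compare a phrase's share of all hits with a group's share of all tokens. A minimal sketch, assuming pre-tokenized texts grouped by a hypothetical key (an author or a century); the toy data in the test are invented.

```python
from collections import defaultdict


def phrase_shares(texts, phrase):
    """texts: iterable of (group, tokens); phrase: list of tokens.

    Returns {group: (pct of all phrase hits, pct of all tokens)}."""
    hits = defaultdict(int)
    size = defaultdict(int)
    n = len(phrase)
    for group, tokens in texts:
        size[group] += len(tokens)
        # Count every position where the token sequence equals the phrase.
        hits[group] += sum(1 for i in range(len(tokens) - n + 1)
                           if tokens[i:i + n] == phrase)
    total_hits = sum(hits.values())
    total_size = sum(size.values())
    return {
        g: (round(100 * hits[g] / total_hits, 1) if total_hits else 0.0,
            round(100 * size[g] / total_size, 1) if total_size else 0.0)
        for g in size
    }
```

A group whose share of hits far exceeds its share of tokens, as with the two Ibn Mufliḥs (over 90% against roughly 35%), is over-represented in the use of the phrase; grouping by century instead of author gives the diachronic check for the Jaʿfarī phrases.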

4.2 Topic Modeling
Another computational bird's-eye approach to the corpus is provided by topic modeling, a common technique in digital text mining.95 The term refers to the application of a statistical model to a corpus, divided into segments, in order to identify for each segment the salient set of words which, together, form a "topic".96 In Arabic corpora, topic modeling does not work well with words, because Arabic words frequently appear in many morphological variants. Topic modeling based on roots is promising, but root recognition for Arabic, as mentioned above, remains a challenge.97 Here, we choose to model topics on the basis of stems rather than words or roots. For this purpose, we use a stemmer that removes prefixes and suffixes from words (or rather, what it identifies as such, not always successfully), but leaves infixes untouched.98 The twenty most salient topics in the corpus are shown in Table 4:
Table 4: The twenty most salient topics in the corpus (excerpt; the remaining rows were lost in extraction)
[topic number and label lost]: […]-k-r, [alif]-w-l, [alif]-dh, f-l, sh-r-ḥ, ẓ-[alif]-h-r, w-l, k-l-[alif]-m, b-kh-l-[alif]-f, sh-y-kh
2 (legal reasoning: general; akhbār): s-l-[alif]-m, h-dh, [alif]-b, [alif]-kh-b-[alif]-r, r-w-[alif], sh-y-kh, dh-k-r, ẓ-[alif]-h-r, k-l-[alif]
[topic number and label lost]: a-w-l, ʿ-d-m, m-ṭ-l-q, [alif]-j-m-[alif]-ʿ, [alif]-ṣ-l, th-[alif]-n, w-l, ṣ-ḥ-y-ḥ, [alif]-m
4 (legal reasoning: Prophetic hadith and early authorities): ṣ-l, [alif]-b, r-s-w-l, ʿ-m-r, m-[alif]-l-k, ḥ-d-y-th, n-b, sh-[alif]-f-ʿ, r-j-l, [alif]-b
95 In this essay, we use Latent Dirichlet Allocation (lda) to arrive at topics. The lda model arrives at a predefined number of topics. For each topic, it assigns a weight to each possible word, denoting its importance for the topic. Per topic, the […]
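The stemmer itself is not named in this section (note 98). The following light stemmer, in the spirit of Larkey et al.'s light10, illustrates the principle: it strips at most one prefix and makes one pass over a suffix list, but never touches infixes. The affix lists and the two-letter minimum are assumptions.

```python
# Assumed affix lists; longer prefixes come first so "وال" wins over "و".
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word, min_len=2):
    """Strip at most one prefix, then one pass over the suffix list,
    always leaving at least `min_len` letters. Infixes are never touched."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
    return word
```

The internal alif of صلاة survives as صلا, and the last test case shows the kind of error the text alludes to ("not always successfully"): the radical wāw of وقف is mistaken for the conjunction.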
Topics 4 (legal reasoning: Prophetic hadith and early authorities; dark brown) and 3 (legal reasoning: general; khilāf; brown) are salient in the early sections of each work. In these sections, the vocabulary referring to Prophetic tradition and to transmitted knowledge in general forms the dominant topic. As is well known, furūʿ al-fiqh texts begin with sections on ritual law. It is in these sections, as Figures 9.1 to 9.5 demonstrate, that the authority of the Prophet in these matters was especially important to the jurists.
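Footnote 95 identifies the model as lda. As an illustration only, the sketch below splits a text into fixed-size segments and fits LDA with a from-scratch collapsed Gibbs sampler (the Griffiths-Steyvers update); the segment size, number of topics k, priors alpha and beta, and iteration count are assumptions, and the authors' toolkit is not identified in this excerpt. The per-segment weights theta are the kind of quantity plotted along the horizontal bars of Figures 9.1 to 9.5.

```python
import random


def segment(tokens, size):
    """Split a text's token list into consecutive segments of `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists (e.g. stemmed segments). Returns (theta, phi):
    theta[d][t] = weight of topic t in segment d; phi[t][w] = weight of w in topic t."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    topics = list(range(k))
    ndz = [[0] * k for _ in docs]                      # topic counts per segment
    nzw = [dict.fromkeys(vocab, 0) for _ in topics]    # word counts per topic
    nz = [0] * k                                       # token counts per topic
    assign = []
    for d, doc in enumerate(docs):                     # random initial topics
        zs = []
        for w in doc:
            z = rng.randrange(k)
            zs.append(z)
            ndz[d][z] += 1
            nzw[z][w] += 1
            nz[z] += 1
        assign.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                ndz[d][z] -= 1                         # remove current assignment
                nzw[z][w] -= 1
                nz[z] -= 1
                # Full conditional p(z = t | rest), up to a constant.
                weights = [(ndz[d][t] + alpha) * (nzw[t][w] + beta) / (nz[t] + v * beta)
                           for t in topics]
                z = rng.choices(topics, weights=weights)[0]
                assign[d][i] = z
                ndz[d][z] += 1
                nzw[z][w] += 1
                nz[z] += 1
    theta = [[(ndz[d][t] + alpha) / (len(doc) + k * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    phi = [{w: (nzw[t][w] + beta) / (nz[t] + v * beta) for w in vocab} for t in topics]
    return theta, phi
```

A usage sketch: `theta, phi = lda_gibbs(segment(stems, 500), k=20)` would yield one row of twenty topic weights per 500-stem segment, where `stems` is a hypothetical list of stemmed tokens for one text.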
The horizontal bars feature green sections somewhere between the first and the fourth quarter, a phenomenon that points to the importance of private law (topics 15-17) in our texts, as compared to, for example, ritual and public law.
Also noteworthy is that private law issues appear to be discussed in the same sequence in all five texts, from slaves (topic 17; dark green), to estates (topic 16; grass green), to sales (topic 15; light green). Public law (topics 19-20; light purple and dark purple) unsurprisingly occupies a position near the end of our texts. The intersection of public law topics with ritual law (topic 10: oaths; fasting; expiation; powder blue) is noteworthy, suggesting that jurists regularly thought about punishment in terms of expiation.100 In al-Sarakhsī's al-Mabsūṭ, public law topics are evenly distributed across the entire text, which confirms our observation made in the previous section that al-Sarakhsī is an author with an above-average interest in public law and its institutions; according to his biographers, he dictated most parts of his K. al-Mabsūṭ to students while in prison (he spent a total of fourteen years in captivity).101 Finally, let us note […]
In addition to plotting the topics of individual texts, we here present combined plots of topics for all five schools. In Figure 10, the normalized vertical bars indicate the relative weight of topics in the five subcorpora.
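The normalized bars of Figure 10 can be understood as per-school aggregates of segment-level topic weights. A minimal sketch, assuming each segment carries a school label and a topic-weight vector (for instance a row of theta from an LDA model); summing and renormalizing gives every segment equal weight, which is our assumption rather than a documented choice.

```python
def school_topic_profile(segments):
    """segments: iterable of (school, topic_weights), topic_weights a list of floats.

    Returns {school: topic weights summing to 1}, i.e. one normalized bar per school."""
    sums = {}
    for school, weights in segments:
        if school not in sums:
            sums[school] = [0.0] * len(weights)
        for t, w in enumerate(weights):
            sums[school][t] += w
    profile = {}
    for school, totals in sums.items():
        z = sum(totals)
        profile[school] = [x / z for x in totals] if z else totals
    return profile
```

Comparing the resulting vectors topic by topic is what reveals, for example, a school that devotes unusual attention to procedural law.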
Certain peculiarities of individual law schools are visible here. Shāfiʿīs, for example, devote an unusual amount of attention to topic 18 (procedural law; red). If we pursue this matter to the level of individual texts in our digital corpus,102 we find that this peculiarity appears first in the three texts of Ibn  is a predilection for procedure among the Shāfiʿī authors in our corpus, it is especially visible in the texts of Egyptian Shāfiʿī jurists of the early Ottoman era. In addition to the Tuḥfa and the Nihāya, two highly influential commentaries on the Minhāj al-ṭālibīn of al-Nawawī (d. 676/1277),104 Ibn Ḥajar al-Haytamī and al-Ramlī are famous for attaching their names to vast collections of fatwās. This demonstrates their commitment to the practical application of legal doctrine and thus may explain their interest in procedural questions. On the one hand, the two authors were continuing the interest of al-Nawawī's Minhāj al-ṭālibīn in questions of procedure. Al-Nawawī devotes three separate chapters to, respectively, the office of the judge (k. al-qaḍāʾ), witnessing (k. al-shahādāt) and legal claims and proofs (k. al-daʿwā wa'l-bayyināt), some thirty pages (out of 540 pages) in the printed editions of his work.105 On the other hand, there is a noticeable surge of topic 18 in Ibn Ḥajar al-Haytamī and al-Ramlī's texts, and later, in that of al-Jamal. 
We should note that Ibn Ḥajar al-Haytamī and al-Ramlī, as well as their teacher al-Anṣārī, wrote during a period of political and legal insecurity, brought about by the Mamluk-Ottoman war (890-923/1485-1517) and its aftermath.106 They witnessed, in the words of Leslie Peirce, the "integration of the court[s] into an empire-wide legal system, and a program of legal reform that was being scripted in Istanbul."107 As a result of this process, Ḥanafī legal doctrine and Ḥanafī judges were granted precedence over the doctrines and judges of the other schools.108 One of the more contentious issues dividing Ḥanafīs and Shāfiʿīs was procedural law. Shāfiʿī jurists, unless they 'converted' to the Ḥanafī madhhab,109 experienced a loss of control over the judicial process.
conclusions that defy easy summary. And yet, if we wish to claim, as we do, that "the tools are here", we cannot avoid the question that perennially plagues digital humanists: "what about results?"111 Let us recapitulate. In part 2 of our study, we found that, contrary to common wisdom, Ḥanafīs do not rely on the Qurʾān less than the other law schools. If anything, it is Mālikīs who do so. Shāfiʿīs, by contrast, refer to Qurʾānic evidence most frequently of all the five law schools. We further observed a general preference of Muslim jurists to quote verses from the Qurʾān that relate to ritual worship (ʿibāda) as well as to numerically defined norms, especially in the area of inheritance, marriage, and divorce. We visualized Qurʾān reliance of the law schools in an interactive network; this enabled us to appreciate the multifunctional "footprint" of the Qurʾān in Islamic law, that is, that the jurists in our corpus are concerned with far more than only the verses that are immediately relevant in a strictly legal sense. The network also helped us to identify texts that build upon a central repertoire of Qurʾānic verses shared across the entire furūʿ al-fiqh tradition, for example, al-Shāfiʿī's K. al-Umm, al-Kāsānī's Badāʾiʿ al-ṣanāʾiʿ and Ibn Idrīs' al-Sarāʾir. We also identified texts that are conspicuously 'unorthodox' in their use of Qurʾānic evidence-whether because they largely ignore the Qurʾān (e.g., al-Ḥillī's Sharāʾiʿ al-islām) or because they rely on a group of verses ignored by other authors (e.g., al-Barqī's al-Maḥāsin).
In part 3 of our study we examined the distribution of normative qualifications (aḥkām) in our furūʿ al-fiqh corpus, as well as in the madhhab subcorpora. In search of "moral language" (Fadel) and "doctrinal tone" (Powers) we examined the ḥalāl/ḥarām ratio across the five law schools. Although margins are small, we found that in quantitative terms, Jaʿfarīs and Ḥanbalīs tend towards the language of ḥarām, while Ḥanafīs and Mālikīs tend towards the language of ḥalāl. Our machine-guided analysis of the corpus further suggested a gradual process of "haramization": the ḥalāl/ḥarām ratio slowly shifts in favor of ḥarām, across all five law schools, up to the 8th/14th century, after which it remains stable. As regards the five normative qualifications (al-aḥkām al-khamsa), we found that the Mālikīs give the widest scope to the middle categories (mandūb/mustaḥabb, mubāḥ/ḥalāl and makrūh/qabīḥ). Both in the Mālikī subcorpus and in the corpus as a whole, however, the middle, "moral" categories are outnumbered by the outer categories (wājib/farḍ and ḥarām/maḥẓūr), a finding that casts doubt on the repeated assertions in the scholarly literature that Islamic law is legal and moral in equal measure.
Our examination of the topical distribution in our corpus, in part 4 of our study, demonstrated the centrality of prayer and property in furūʿ al-fiqh.
Parsimonious word clouds revealed the prominent role played by certain, not always well-known authors and texts in their respective madhhabs, for example, Ibn Ḥamdān's al-Riʿāya in the Ḥanbalī school, or Qāḍīkhān's and al-Bazzāzī's Fatāwā in the Ḥanafī school. In the most experimental part of our study, topic modeling allowed us to see that, in diachronic perspective, questions of ritual law dominate the early texts in our corpus. In synchronic perspective, we found that ritual law occupies more space among Ḥanbalīs and Jaʿfarīs than in the other schools. Ḥanafīs emphasize public law and commerce, Mālikīs display a great interest in inheritance law, and Shāfiʿīs (or at least a certain group of Shāfiʿīs writing between the 10th/16th and the 12th/18th century) are much concerned with procedural law.
Beyond these findings, our primary aim in this article has been to introduce and to test the promise of a novel methodology, that is, the computational text mining of furūʿ al-fiqh. Reprising part 1 of this study, we conclude with four methodological reflections, and advance some suggestions for further research along computational lines. First, the results of studies such as the one presented here must be replicable, which means that there must be a sustainable and open-to-all environment in which relevant data are stored. Readers are encouraged to check our findings by referring to the metadata and text files of our corpus released on Zenodo, as well as to the code developed to support our analysis, made available through GitHub; and then to run their own analyses on BlackLab adh and the Qurʾān Footprinter, both hosted by the Digital Humanities Lab at Utrecht University.
Second, the digital corpus of furūʿ al-fiqh deserves to be further curated and expanded. In the future, one important solution to the problem of bias in the corpus will be to grow the corpus in several directions: not one text per century, but several (focusing on those texts that were used most frequently, rather than those that happen to be digitally available), and not a mixture of genres, but full coverage of all genres. Likewise, texts from the Ibāḍī and other law schools should be included in future iterations of this study. This, however, will have to wait until a greater number of texts, especially for the later, postclassical centuries, become available, a process that will require teamwork. No single scholar can carry out the laborious task of compiling such a corpus and preparing it for computational analysis.
Third, the text mining tools used in this study are far from exhaustive. A number of existing digital text-mining techniques are absent from our analysis. These include tools to detect text reuse that would enable us, for example, to study the hadith footprint in the corpus. The frequency-based analysis of concepts in Arabic texts or textual corpora, as this study has shown, must be based on complex, clustered searches, rather than on searches for simple words, stems or roots. Other techniques, such as topic modeling, are largely untested in idh. The present article is a first step to illustrate their usefulness.
text mining islamic law Islamic Law and Society 28 (2021) 234-281
Fourth, and finally, text mining the digital corpus in the full sense of the Digital Humanities requires manpower and time. Researchers in idh must be aware of this fact, ready to work in teams, and willing to do spadework. Collaboration in local teams should be complemented by international collaboration between research institutions and projects. Only then will idh-including the computational study of Islamic law, but also of other text genres-emerge from its current niche into full light, and move from experimental exploration to sustained analysis and output.