Disputed authorship and text transfer are notorious problems in the textual transmission of Sanskrit, especially for large anonymous texts such as the Mahābhārata. Stratification methods for such texts have so far mainly relied on manuscriptology, higher textual criticism, and scattered historical evidence. This paper introduces a quantitative method for text stratification that uses frequent linguistic features for inducing authorial layers in Sanskrit texts. The proposed method is tested with texts whose authorial composition is known, and then applied to the Bhīṣmaparvan of the Mahābhārata.
Refer to Hacker (1961) and Oberlies (1997, 75–77) for a short survey.
One may refer to Koppel and Winter (2014) or Hellwig (forthcoming), who group text passages on the basis of word semantic information, or to Koppel et al. (2011) and Aldebei et al. (2015), who combine unsupervised and supervised methods.
Hellwig (2013) provides a focused study of duplicate detection in metrical Sanskrit texts.
tf-idf (Spärck Jones 1972), a popular measure for weighting bag-of-words vectors in corpus linguistics, is not used in this paper because the inverse document frequency (idf) term becomes zero when a feature (term) occurs in all the texts of a document collection. Since this paper focuses on high-frequency features that occur in all sections of the Mbh, the idf factor would remove most of these ubiquitous features by multiplying their term frequencies by 0.
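The effect can be illustrated with a minimal sketch, assuming the unsmoothed form idf = log(N / df), which is the variant for which the idf term is exactly zero when a feature occurs in every text. The function name `tf_idf`, the feature labels, and all counts are hypothetical and serve only as illustration; they are not data from the paper.

```python
import math

def tf_idf(docs):
    """Weight raw counts by unsmoothed idf = log(N / df).

    docs: list of dicts mapping a feature to its raw count in one text section.
    """
    n_docs = len(docs)
    # Document frequency: number of sections in which each feature occurs.
    df = {}
    for doc in docs:
        for term, count in doc.items():
            if count > 0:
                df[term] = df.get(term, 0) + 1
    # Multiply each term frequency by the idf of its feature.
    return [
        {t: c * math.log(n_docs / df[t]) for t, c in doc.items() if c > 0}
        for doc in docs
    ]

# Hypothetical counts: "ca" and "tu" occur in every section, "rare_form" in one.
sections = [
    {"ca": 12, "tu": 3, "rare_form": 1},
    {"ca": 9, "tu": 5},
    {"ca": 15, "tu": 2},
]
weights = tf_idf(sections)
for w in weights:
    print(w)
```

In this toy collection the two ubiquitous features receive weight 0 in every section regardless of how often they occur, and only the rare feature keeps a positive weight.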
Ponweiser (2012) inspects the harmonic means of the model likelihoods. LDA is repeated with increasing values of k for a set of documents, and the likelihoods of the resulting models are plotted against the values of k. The value of k at which the plot displays a turning point (the “elbow”; see Tibshirani et al. 2000) is chosen as the maximal number of topics k_max for the given set of documents.
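The selection of k_max can be sketched as follows. The note does not say how the elbow is located, so the sketch assumes a common heuristic: the elbow is the point farthest from the straight line joining the first and last points of the curve. This heuristic, the function names `log_harmonic_mean` and `elbow_k`, and the scores are assumptions for illustration, not the procedure or results of the paper. No LDA model is fitted here; in practice, the scores would come from models trained for each k.

```python
import math

def log_harmonic_mean(log_liks):
    """Log of the harmonic mean of likelihoods, given their logarithms.

    Uses log HM = log(n) - logsumexp(-log_liks), computed stably in log space.
    """
    neg = [-l for l in log_liks]
    m = max(neg)
    lse = m + math.log(sum(math.exp(x - m) for x in neg))
    return math.log(len(log_liks)) - lse

def elbow_k(ks, scores):
    """Return the k whose point lies farthest from the chord joining
    the first and last (k, score) points (an assumed elbow heuristic)."""
    x0, y0, x1, y1 = ks[0], scores[0], ks[-1], scores[-1]
    norm = math.hypot(x1 - x0, y1 - y0)
    best_k, best_dist = ks[0], -1.0
    for k, s in zip(ks, scores):
        dist = abs((y1 - y0) * (k - x0) - (x1 - x0) * (s - y0)) / norm
        if dist > best_dist:
            best_k, best_dist = k, dist
    return best_k

# Hypothetical scores: one log harmonic mean per candidate number of topics.
ks = [2, 4, 6, 8, 10, 12]
scores = [-5000.0, -4200.0, -3900.0, -3850.0, -3830.0, -3820.0]
k_max = elbow_k(ks, scores)
print(k_max)
```

The distance-to-chord measure depends on the scaling of both axes, so rescaling the scores can shift the selected point. The result is best checked against a visual inspection of the plot.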