Caesurae, Bridges, and the Colometry of Four Tocharian B Meters

The Tocharians composed verse in hierarchical structures, with the verse dominating major cola, and the major colon in turn dominating one or more minor cola. After providing much-needed descriptive data on Tocharian meter, we assess the evidence for the distinction between major vs. minor caesurae in some of the most popular Tocharian B meters, finding support for the commonly assumed colometries in some but not all cases. Of particular interest is the recurring 4+3-syllable colon, since the violability of its internal (putatively minor) caesura varies significantly across meters. We argue that this varying strictness is indeed a function of the meter as opposed to position in the verse, verse length, idiosyncrasies of certain texts, and so forth. We then use a systematic prose comparison method to test the meters for bridges, finding evidence for monosyllable avoidance in (certain) colon-final positions, despite an overall preference for monosyllables in verse vis-à-vis prose. Finally, we discuss the implications that our study has for the restoration of fragmentary Tocharian texts.

2 Evidence for a distinction between major and minor cola
The assumption of a specifically hierarchical distinction between major and minor cola in Tocharian is intuitive for 4x14 due to the rhythmic symmetry of the verse, i.e. the repetition of 4+3-syllable sequences. In verses without this symmetry, e.g. 4x12 (5+4+3), 4x15 (4+3+3+5), and 4x18 (4+3+4+3+4), the recurrence of 4+3 across the meters, especially verse-peripherally, is suggestive of its metrical coherence. Positing hierarchical structure within the verse permits insightful analyses of various metrical phenomena (cf. Prince 1989; Hayes and MacEachern 1998; Kiparsky 2006; Hayes 2010: 2515-2516). For example, the analysis of the iambic metron of Ancient Greek tragic trimeter given in Figure 2, which involves both hierarchical structure and a strong vs. weak distinction, allows for an intuitive explanation of the asymmetry between the anceps position (x) of the weak foot and the brevis (⏑) of the strong one. Any asymmetry in the way that the Tocharian B poets composed cola can thus in principle reflect a difference in their metrical status. Whether that difference involves hierarchical structure, binary distinctions such as strong vs. weak, or something else, is a different matter, which we return to in Section 4. Crucially, however, it must be shown that such an asymmetry should be attributed to the meter per se rather than to some other aspect of the language, e.g. the syntax.
2.1 Distribution of clitics
Winter (1959) identifies such an asymmetry in the distribution of the sentential clitics ra, ka, ṣpä, and no in 7-syllable cola in several Tocharian B meters. Positing the sort of structure given for 7-syllable cola in Figure 1, he points out that the clitics occur at the end of the minor colon (σ) ca. twice as often as they do at the end of the major colon (σ). 4 In a more thorough examination of the phenomenon, Malzahn (2012b) argues convincingly that the distribution should be attributed in the main to an aspect of Tocharian B syntax, specifically to the localization of second position clitics ("Wackernagel's Law"). Of the 205 sentential clitics in her verse corpus, 93% follow the first "orthotonic" word in their syntactic clause, just as they do in prose. Malzahn's study exemplifies the need to rule out potential confounds from syntax and other areas of the grammar when studying meter. Since poetic and prose texts in Tocharian B are roughly contemporary and compatible in genre, prose provides an excellent baseline for comparison. Any systematic differences between verse and prose can then be ascribed to the meter or to other characteristics proper to versification. 5

Caesura violability
Von Gabain and Winter (1958: 33-34), the first to propose a distinction between major and minor cola, characterize the minor caesurae as more readily violable than major ones:
Wir dürften damit berechtigt sein, neben festen Hauptzäsuren auch Nebenzäsuren anzunehmen, d. h. fakultativ aufhebbare Grenzen zwischen Unterabschnitten innerhalb der Kolen. 6 (von Gabain and Winter 1958: 34)
["We may thus be justified in assuming, alongside fixed main caesurae, secondary caesurae as well, i.e. optionally suspendable boundaries between subsections within the cola."]
This characterization is repeated in Winter (1959) and later work, but has not yet been systematically studied. If we accept Malzahn's analysis of clitic distribution in verse, caesura violability is to our knowledge the only remaining diagnostic that has been proposed for the distinction between major and minor cola in Tocharian.

Violability in four meters
In this section, we investigate the violability of the caesurae in 4x12, 4x15, 4x14, and 4x18. We compare the violability of putative major caesurae with putative minor caesurae, and find that the caesurae in 4+3-syllable sequences differ significantly from other caesurae in the same meter. This can be taken as evidence for their minor status and supports the standard colometries found in recent descriptions of Tocharian meter. Against the standard colometries, however, caesura violability provides no evidence for the minor status of the caesura after the 10th syllable in 4x15.
The following generalization holds for the meters studied here: from the standpoint of violability, the caesurae in 4+3-syllable sequences are minor caesurae; the others are major caesurae.

Caesura violations in 4x12 (5+4+3)
The colometry of 4x12 (5+4+3) is standardly given as [5||7], with two major cola. 7 Given the standard analysis of 4+3 sequences ("7ers"), this implies [5||4|3]. If the poets' preference for respecting the major caesura is stronger than their preference for respecting the minor one, we would expect them to violate the caesura after the 9th syllable significantly more frequently than the one after the 5th. Figure 3 plots the incidence of word boundary in 4x12, based on a corpus of 317 verses. 8 Since the manner in which the poets realize caesurae suggests that they treated sequences of a lexical word followed by a monosyllabic clitic as a single word, we did so as well in both the verse and prose corpora. 9 Note that the numbers along the x-axis of the plot represent verse-internal word boundaries. The peaks at 5 and 9 reflect the caesurae after the 5th and 9th syllables. In support of the major vs. minor distinction, there is a boundary after 5 in 99.3% of the verses, and after 9 in only 87.0%. The error bars give a sense of which differences are significant; the absence of any overlap between the error bars at 5 and 9 in the plot suggests that the difference is significant.
We can confirm the statistical significance of the difference with Fisher's Exact Test of Independence. We see from Table 1 that 38 of 547 caesurae are violated, and that 36 of those violations occur after 9. Under the null hypothesis that the poets treat the two caesurae equally, Fisher's Exact Test gives the probability that the violations would be at least this unevenly distributed. The probability (p) is less than .00001, meaning that a difference this great would have arisen by chance less than .001% of the time; we take p values less than .05 to be significant. Caesura violability thus supports the [5||4|3] colometry.
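The test just described can be reproduced in a few lines. The following is a minimal sketch rather than the authors' own procedure. It assumes that the 547 caesurae split into 270 verses attested at position 5 (2 violations) and 277 attested at position 9 (36 violations). This split is an inference from the reported totals and percentages, not a figure taken from Table 1. The function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x first-column cases landing in row 1.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# ASSUMED counts (inferred, not quoted from Table 1):
# after 5: 2 violated, 268 respected; after 9: 36 violated, 241 respected.
p_value = fisher_exact_two_sided(2, 268, 36, 241)
print(p_value < 0.00001)
```

Under these assumed counts the p-value falls well below the .00001 threshold reported in the text; neighbouring splits compatible with the reported percentages lead to the same conclusion.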
8 [...] which the proportions in the figure are based vary from one position to the next (from 267 to 286, in this case); none is based on the full 317 verses. Here and elsewhere we adopt the restorations and emendations supplied in A Comprehensive Edition of Tocharian Manuscripts (CEToM).
9 We took the following clitics into account: ka, kca, ksa, ñke, tne, nai, no, nta, pi, ra, ram(t)
The opening stanza of the following passage of the Tocharian B version of the Karmavibhaṅga (PK AS 7B a4-a5) is fairly representative of the 4x12 corpus as a whole. The caesura after 5 is respected throughout, and the caesura after 9 is violated once, in verse 1b. Note that host-enclitic groups are joined with "+" here and throughout.
Having heard this, you will obtain the skills to do deeds and you will not err in many ways."
May I not bear malice towards an evil person, even if he is malicious, let alone towards those who are good.
May hate and enmity not reside in me at all, even towards a murderous enemy, and may I abandon sin.
May I not meet with an ignorant one, and may I come together with good people."

Putative minor caesurae in the four meters
The poets treat the internal caesurae in 7ers significantly differently in at least three of the four meters. Figure 7 illustrates the variance in violability; the 7ers are arranged from most to least frequently violated and underlined. This variance cannot be a function of verse length or of verse-initial vs. verse-final location of the colon, as is clearly demonstrated by the near inviolability of both minor caesurae in 4x14. Nor can it be attributed to the difficulty of metrifying relatively long words in meters with relatively short cola. That would make the opposite prediction. For example, although 5-syllable words can be metrified more comfortably in 4x12 (5+4+3) than in 4x14 (4+3+4+3), the latter exhibits a less violable 7er-internal caesura.
We also checked whether certain texts in our corpus were skewing the results. It is conceivable, for instance, that 7ers in 4x12 are generally quite rigid except in a particular text or group of texts, which could in turn be due to genre, the practice of particular poets, etc. That turns out not to be the case: the violations in 4x12 and 4x15 are quite consistent across texts (cf. Figure 8 and Figure 9). For post-4 violations in 4x15, the mean per text is 27.9%, the median 29.9%; for post-9 violations in 4x12, the mean is 9.2%, the median 7.9%. An unpaired t-test on these two vectors is significant (t(15) = -4.0, p = .001). Thus, taking texts rather than lines as units of analysis, 4x15 is consistently the most violable, followed by 4x12, followed by 4x18 and 4x14, where the minor caesurae in 7ers are nearly inviolable and thus extremely consistent across texts. The data invite us to consider the hypothesis that the violability of 7ers is a function of the flexibility of the meter as a whole. The most violable 7er-internal caesura is in the meter that also has the most violable major caesura, and the least violable ones are both in 4x14, which also has the strictest major caesura, perhaps encouraged by its symmetry. Furthermore, a sample of three stanzas from each of the meters suggests that enjambment is most frequent in 4x15, and least frequent in 4x18 and 4x14. The correlation is not perfect in either case, however, since 4x12 and 4x18 do not conform to the expected order.
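For readers who wish to rerun the per-text comparison, the statistic of an unpaired t-test can be computed as sketched below. This assumes the classical pooled-variance (Student) form, which the integer degrees of freedom in t(15) suggest but which the text does not state. The two input lists are placeholder values chosen only to exercise the function; they are not the per-text violation rates of the corpus, which are not reproduced here.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's unpaired t statistic and degrees of freedom (pooled variance)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

# PLACEHOLDER per-text violation rates (hypothetical, for illustration only).
rates_x = [0.1, 0.2, 0.3]
rates_y = [0.4, 0.5, 0.6]
t_stat, dof = pooled_t(rates_x, rates_y)
print(round(t_stat, 3), dof)
```

Converting the statistic into a p-value requires the t distribution, which the Python standard library does not provide; a statistics package would be needed for that step.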

Interim summary
To summarize, in Section 3 we showed that the caesurae in 4+3 sequences ("7ers") are significantly more violable than the other caesurae in the same meter, supporting their minor status. In Section 4, we demonstrated that the violability of the minor caesurae in 7ers is also significantly different across meters. In other words, the 7ers are different from each other. This variance in violability of 7er-internal caesurae cannot be attributed solely to a categorical major vs. minor distinction of the sort sketched in Section 2. It appears to be a function of the meter, and may not be a property that is confined to 7ers, but a property of the meter as a whole.
6 Word boundary incidence in verse and prose
So far, we have examined the poets' treatment of caesurae using meter-to-meter comparisons. Prose-based comparison is useful for assessing possible bridges in the meters, which, if present, are subtler than caesurae in Tocharian B verse.

No Break after 9 in 4x12
PK AS 17 A-D, H-K, 16.2-3.
The tests require us to identify intonational constituents (ICs) in the prose corpus. We assume that the following clause- and phrase-level syntactic constituents were mapped to Intonational Phrases (cf. Nespor and Vogel 2007; Selkirk 2011); scribal punctuation after these constituents was apparently optional, but provides some support for their reality.
The following passage (THT 88 a4-5) illustrates our identification of ICs in the prose corpus.
dear see:SG.IPV-PTC impermanence thing:GEN.PL disappearance end
"Having seen this, the tree-dwelling god says to his wife with sadness: 'Darling, look at the impermanence of things and their ultimate disappearance!'"
We treated host-enclitic units in the prose corpus in the same way that we treated them in the verse corpora.

Average word length
An important difference between the prose corpus and the metrical corpus is the average length of words: 2.5 syllables in prose vs. 2.2 in verse. This is likely due in great part to the restrictions that colon size places on verse composition. In 4x14, for example, a word of 5 or more syllables cannot be localized anywhere in the verse without violating a caesura, and 4-syllable words can only be localized spanning positions 1-4 or 8-11. A simple way to quantify average colon size for the different metrical corpora is to divide the number of syllables per verse by the number of cola. This gives a kind of average ideal colon size, since it does not take differences in caesura violability into account. The relationship between colon size and verse length is given in [...]. The word length data is plotted in Figure 10. 16 It is clear that longer words are avoided in verse.
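The colon-size arithmetic and the localization restrictions described above can be checked mechanically. The sketch below assumes that every colon in the syllable schemas cited in this paper counts toward the number of cola, and that a word is licit only if it fits inside a single colon. The function names are illustrative.

```python
# Syllable schemas as cited in the text.
METERS = {
    "4x12": [5, 4, 3],
    "4x15": [4, 3, 3, 5],
    "4x14": [4, 3, 4, 3],
    "4x18": [4, 3, 4, 3, 4],
}

def average_colon_size(cola):
    """Syllables per verse divided by the number of cola."""
    return sum(cola) / len(cola)

def placements(cola, word_len):
    """1-indexed (first, last) syllable spans where a word fits within one colon."""
    spans = []
    offset = 0
    for size in cola:
        for start in range(offset + 1, offset + size - word_len + 2):
            spans.append((start, start + word_len - 1))
        offset += size
    return spans

for name, cola in METERS.items():
    print(name, average_colon_size(cola))

print(placements(METERS["4x14"], 4))  # spans available to a 4-syllable word
print(placements(METERS["4x14"], 5))  # spans available to a 5-syllable word
```

On these assumptions, 4x14 has the smallest average ideal colon size of the four meters (3.5 syllables), and a 4-syllable word fits only at positions 1-4 or 8-11, as stated in the text.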
The higher percentage of shorter words in verse is at least in part an artifact of the underrepresentation of longer words, but the higher skew of monosyllables towards verse than di- or trisyllables seems to reflect a favoring of monosyllables in particular, perhaps because they were useful to fill out cola or to fill particular (e.g. weak) metrical positions. We leave this topic for further investigation. The poets employ various tactics to avoid longer words in verse, several of which are exemplified in the stanzas cited above. These include choosing between lexical and morphological alternatives, e.g. the choice between ṣe and eṣe ʻ(together) withʼ in 4x18 and the use of nominative plural pelaikni in 4x14 for regular pelaiknenta ʻlawsʼ (cf. Peyrot 2008: 115-116), morphophonological deletion of underlying "weak" vowels, e.g. wertsyaine for wertsiyaine ʻin the assemblyʼ in 4x15 (cf. Winter 1990), and vowel sandhi, e.g. r-asānmeṃ for ra asānmeṃ ʻPTC throne:ABLʼ and śaiṣṣentsānaiwacci for śaiṣṣentse anaiwacci ʻworld:GEN unpleasant:NOM.PLʼ in 4x14 (cf. Stumpf 1971b). This is not to say that the poets simply shorten words wherever they can, of course. In syllable-counting meters, processes that affect syllable count, including augmentation processes such as the retention of an underlying word-final high vowel as "mobile" -o or -ä, e.g. ṣeko for ṣek in 4x15 and wṣi-ñä for wṣi-ñ in 4x18 (cf. Malzahn 2012a), are more generally useful. In the aggregate, however, words are shorter in verse than they are in prose.
16 No words of 6 or more syllables occur in the verse corpus; the three tokens that occur in the prose corpus, one each of 7, 9, and 10 syllables, are not plotted.

Bridges
Metrical bridges are positions within the verse where poets avoid word boundary. Generally speaking, in order to identify bridges, we want to compare the incidence of colon-internal word boundaries that we observe in verse with what we would expect if the poets were only concerned with respecting caesurae. We can model this expectation by using the syntactic/intonational constituents (ICs) from our prose corpus to construct pseudo-verse corpora with caesurae to match the actual verse corpora. In addition to matching caesura position and frequencies, we require the beginnings and endings of constructed verses to align with beginnings and endings of prose ICs, which mimics the poets' avoidance of enjambment. Drawing from prose ICs at random and respecting these constraints, we assembled a 100,000-verse corpus for each of the four meters.
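The construction of pseudo-verse described above can be sketched as a rejection-sampling loop. This is a simplified illustration, not the procedure actually used: it requires every caesura to be respected rather than matching the observed caesura frequencies, and the prose ICs are hypothetical toy data, each represented as a list of word lengths in syllables. All names are illustrative.

```python
import random

# HYPOTHETICAL prose ICs: each is a list of word lengths in syllables.
TOY_ICS = [[2, 3], [3, 2], [4], [3], [1, 2], [2, 2], [1, 3], [5], [2, 1, 2]]

def boundaries(words):
    """Verse-internal positions after which a word ends."""
    pos, out = 0, set()
    for n in words[:-1]:
        pos += n
        out.add(pos)
    return out

def build_pseudo_verse(ics, verse_len, caesurae, rng, max_tries=10_000):
    """Concatenate whole ICs at random; keep the result only if it fills the
    verse exactly and has a word boundary at every caesura."""
    for _ in range(max_tries):
        words = []
        while sum(words) < verse_len:
            words.extend(rng.choice(ics))
        if sum(words) == verse_len and caesurae <= boundaries(words):
            return words
    return None

def boundary_incidence(verses, verse_len):
    """Proportion of verses with a word boundary after each internal position."""
    counts = {p: 0 for p in range(1, verse_len)}
    for verse in verses:
        for p in boundaries(verse):
            counts[p] += 1
    return {p: c / len(verses) for p, c in counts.items()}

rng = random.Random(0)  # fixed seed for reproducibility
pseudo = [build_pseudo_verse(TOY_ICS, 12, {5, 9}, rng) for _ in range(200)]
incidence = boundary_incidence(pseudo, 12)
print(incidence[4], incidence[8])  # expected incidence at candidate bridge positions
```

Because only whole ICs are concatenated, each pseudo-verse begins and ends at an IC boundary, which mirrors the avoidance of enjambment. The incidence at non-caesura positions then serves as the expected value against which the observed verse data are compared.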
A general pattern that emerges in the following four figures is that there are fewer word boundaries than expected in colon-penultimate position. In the plots, these are the points where the solid line dips below the broken one. This holds for all colon-penultimate positions in all four meters, but not for all verse-penultimate positions. To test for the significance of the difference between the observed and expected boundary incidence in these positions, we employ a χ² Goodness of Fit Test. Table 5 gives the observed vs. expected boundary incidence after position 6 in 4x15, one of two potential bridge positions where the discrepancy is significant (p = .0008), even at a Bonferroni-corrected criterion. This indicates a difference between the poets' treatment of the second and third colon and could be taken to support the traditional colometry [4|3||3|5].

Table 5: Observed vs. expected boundary incidence after position 6 in 4x15
            Boundary          No boundary
Observed    21 (9.1%)         209
Expected    17,495 (17.5%)    82,505

However, neither the other significant bridge after 8 in 4x12 (p = .003) nor the borderline significant bridge after 3 in 4x14 (p = .015) is associated with a major caesura. 17 The motivation for the bridges remains unclear.
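The goodness-of-fit test for position 6 in 4x15 can be reproduced from the observed and expected counts reported above. The sketch below assumes a test with one degree of freedom and no continuity correction. The Bonferroni family size of 12 is likewise an assumption, based on the number of positions for which p-values are reported in this paper.

```python
from math import erfc, sqrt

def chi2_gof_binary(obs_yes, obs_no, exp_prop_yes):
    """Chi-square goodness-of-fit test for a two-category outcome (1 df)."""
    n = obs_yes + obs_no
    exp_yes = n * exp_prop_yes
    exp_no = n - exp_yes
    chi2 = (obs_yes - exp_yes) ** 2 / exp_yes + (obs_no - exp_no) ** 2 / exp_no
    # Survival function of the chi-square distribution with 1 df.
    p = erfc(sqrt(chi2 / 2))
    return chi2, p

# Observed: 21 boundaries, 209 non-boundaries.
# Expected proportion from the pseudo-verse corpus: 17,495 of 100,000.
chi2, p = chi2_gof_binary(21, 209, 17_495 / 100_000)

# ASSUMPTION: 12 positions tested in total across the four meters.
alpha_corrected = 0.05 / 12
print(round(chi2, 2), round(p, 4), p < alpha_corrected)
```

Under these assumptions the result agrees with the reported p = .0008 and remains significant at the corrected criterion, as does the p = .003 reported for position 8 in 4x12.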

Implications for textual restoration
The relatively fragmentary state of the Tocharian corpus regularly requires editors to propose restorations. This can be done with a fair degree of accuracy, especially in cases where parallel texts in other languages exist, which often supply the approximate content of the lacunae. In addition to obvious restrictions such as the physical size of the lacuna in the manuscript, the meter constrains the number of possible restorations. The more exact understanding of the meters that we have arrived at requires that editors review restorations that have already been proposed and [...] There are of course metrically aberrant verses in Tocharian poetry, which means that a metrically abnormal restoration is not impossible. The point of this section is that restorations should roughly follow the metrical practice of the poets as quantified and analyzed in this study.
17 The p-values for the other positions are as follows. 4x12: 4 (p = .10). 4x15: 3 (p = .07); 9 (p = .86). 4x14: 6 (p = .20); 10 (p = .21). 4x18: 3 (p = .45); 6 (p = .97); 10 (p = .51); 13 (p = .25).

Summary
Caesura violability provides evidence for the following colometries, with the caveat that in the relatively small 4x18 corpus, not every major caesura ("||") is significantly different from every minor caesura ("|"). An overview of this is provided in [...]. From the standpoint of violability, all caesurae in 4+3 sequences are minor, and all other caesurae are major. Systematic comparison with prose texts reveals avoidance of colon-final (but not verse-final) monosyllables. It is unclear whether these bridge-like phenomena are metrical in nature or otherwise motivated.
The violability of the internal caesurae in the 4+3 sequences ("7ers") in 4x12 and 4x15 also varies significantly across meters. This cannot be explained by a categorical distinction between major and minor metrical constituents alone. It appears to be a function of the individual 7ers or of the individual meters.