An Indological transcription of Middle Chinese

Because most Sino-Tibetan languages with a literary tradition use Indic-derived scripts, and those that do not are each sui generis, there are advantages to transcribing these languages along Indic lines as well. In particular, this article proposes an Indological transcription for Middle Chinese.


Cahiers de Linguistique Asie Orientale 52 (2023) 40-50

Introduction
The great majority of Sino-Tibetan languages with a literary tradition employ scripts that ultimately derive from a Brahmi model. Examples include Pyu (c. 5th-13th cent. CE), Tibetan (from 650 CE), Burmese (from 1113 CE), Newar (from 1114 CE), Lepcha (17th cent. CE), and Limbu (18th cent. CE). In addition, living Sino-Tibetan languages of Nepal are typically written in Devanagari.1 The ubiquity of the International Alphabet of Sanskrit Transliteration (IAST) within Indology and related disciplines makes obvious the choice of an Indological transcription for these various scripts. Those Sino-Tibetan languages that use non-Indic-derived scripts include Chinese (from 1250 BCE), Tangut (1038-1502 CE), Yi (from 1485 CE), Naxi (19th cent. CE?), and possibly Meitei (16th cent. CE?). The scripts of this latter group are not obviously related to each other; to adapt a transcription from one to another would not be easy. As a discipline we thus face the choice of either (a) using Indological principles to construct fundamentally mutually compatible transcription practices across all literary Sino-Tibetan languages or (b) embracing outright eclecticism.
Examples from Hill 2019 make clear the infelicity of mixing transliteration systems. In this book one finds both many-to-one and one-to-many mappings between symbols and their phonetic interpretations.
Tib. འཇོལ་ ḫǰol 'hang down', Chi. 垂 dzywe < *[d]oj (19-17a) 'hang down'
ibid., 36
Here the letter ⟨ǰ⟩ in Tibetan and the series of letters ⟨dzy⟩ in Middle Chinese both indicate a voiced palatal affricate. Although the reader will not be paralyzed by confusion in the face of this inconsistency, issues such as this again and again impose small hurdles to comprehension.
Tib. འགྲོད་ ḫgrod < *gʷrat 'go, walk', Chi. 越 hjwot < *ɢʷat (22-05e) 'pass over'
ibid., 20
Here the sound [ɣ] is written ⟨ḫ⟩ in Tibetan and ⟨h⟩ in Middle Chinese. On the other hand, the letter ⟨o⟩ in Tibetan transcription means [o], but in Middle Chinese it means [ʌ]. Such many-to-one and one-to-many mappings give the reader a lot to keep track of.
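The two kinds of inconsistency just described can be detected mechanically. The following is a minimal sketch, assuming that the readings are recorded as (system, symbol, sound) triples; the names READINGS and find_conflicts are hypothetical and serve only for illustration, and the data are restricted to the two examples quoted above.

```python
from collections import defaultdict

# (system, symbol, sound) triples taken from the two examples quoted above.
READINGS = [
    ("Tibetan", "ǰ", "voiced palatal affricate"),
    ("Middle Chinese", "dzy", "voiced palatal affricate"),
    ("Tibetan", "ḫ", "ɣ"),
    ("Middle Chinese", "h", "ɣ"),
    ("Tibetan", "o", "o"),
    ("Middle Chinese", "o", "ʌ"),
]


def find_conflicts(readings):
    """Return (many_to_one, one_to_many) for a list of readings.

    many_to_one: sounds written with more than one symbol.
    one_to_many: symbols standing for more than one sound.
    """
    symbols_by_sound = defaultdict(set)
    sounds_by_symbol = defaultdict(set)
    for _system, symbol, sound in readings:
        symbols_by_sound[sound].add(symbol)
        sounds_by_symbol[symbol].add(sound)
    many_to_one = {k: v for k, v in symbols_by_sound.items() if len(v) > 1}
    one_to_many = {k: v for k, v in sounds_by_symbol.items() if len(v) > 1}
    return many_to_one, one_to_many


many_to_one, one_to_many = find_conflicts(READINGS)
```

On these data the check flags the voiced palatal affricate and [ɣ] as sounds with two spellings each, and ⟨o⟩ as a symbol with two values.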
There are good reasons to use Indological conventions even in the transcription of those scripts without Indic origins. First, using Indological transcription across all literary Sino-Tibetan languages would lower the transaction costs of both teaching and learning, since the principles learned in the transcription of one script would apply mutatis mutandis to all others. This benefit is especially salient within a single piece of research and in specifically Sino-Tibetan (as opposed to, say, Sinological) research. Of course, whatever our current or future practice may be, students as they pursue their studies will inevitably confront Karlgren's, Li's, and Baxter's systems for Middle Chinese, Nishida's, Gong Hwangcherng's, and Arakawa's for Tangut, Jaeschke's and Wylie's for Tibetan, etc. A transliteration scheme to rule them all, as once envisioned by William Jones, is a fruitless and unachievable goal. Nonetheless, the application of analogous principles in the transcription of literary Sino-Tibetan languages would make it easier for students to set off down this arduous road.
Second, the Indic linguistic tradition had a substantial influence on Chinese and Tangut indigenous phonological works. Thus, the analysis of Chinese and Tangut syllables in Indological terms is quite straightforward and remains respectful to the Chinese tradition.
Third, many of the sources relevant for the phonetic interpretation of these materials are themselves in Indic-inspired scripts. In the Han period we have loans from Indo-Aryan languages in Buddhist texts (Coblin 1983); in the Tang period and subsequently there are transcriptions of Chinese and Tangut syllables into the Tibetan script (Takata 1988, Dai 2008); in the Yuan there are transcriptions of Chinese syllables into 'Phags-pa script, found in the Menggu Ziyun 蒙古字韻. Thus, even in strictly Sinological works the occasion will arise when Middle Chinese needs to be presented on the same page with evidence that by its very nature is amenable to Indological transcription.
Fourth, in the same way that important works in Indo-European linguistics appear in German, French, Italian, and Russian, we may hope that in the future important Sino-Tibetan research will appear written in Tibetan, Burmese, and Newar. Since the orthographic systems of these languages already possess conventions for writing Sanskrit, Indological transcriptions of Chinese, Tangut, etc. in Roman letters are easily adopted into Tibetan, Burmese, or Newar. For example, 廛 ḍien could be written ཌྱེན་ with the Tibetan script.
The fast pace of research in Tangut phonology (Gong 2020, Gong 2022) recommends against hastily parting from the system of Gong Hwangcherng, but the categories of Middle Chinese and their overall phonetic interpretation are not in flux. In particular, Baxter (1992) proposed a transcription system that exactly encodes the categories of the rhyme books and rhyme tables in a straightforward way. The purpose of this essay is to bring Baxter's transcription system into line with Indological principles, and to rectify those few places where his choices are misleading.

Disadvantages of IPA-based transcription practices
One might accept that all literary Sino-Tibetan languages should be transcribed in kindred ways and yet not favor an Indological approach. If Sino-Tibetan historical linguistics is more affiliated with other domains of linguistics than with other areas of oriental studies, transcription based on the International Phonetic Alphabet (IPA) may in particular recommend itself. Guillaume Jacques (2012) writes that since "the pronunciation of Old Tibetan is relatively better known in comparison to that of many other old languages … it seems more sensible to represent the Tibetan letters by their IPA equivalents" (ibid., 89).2 However, no matter how well understood a language's phonology is, an IPA transcription is not sufficiently agnostic about precise phonetics. For example, Jacques transcribes ཤ as ⟨ɕ⟩ whereas Shen (2020, 235) writes the same letter as ⟨ʃ⟩. To my knowledge no existing research treats the question of whether ཤ was [ɕ] or [ʃ] in Old Tibetan. If one transcribes ཤ as ⟨ś⟩ or ⟨š⟩ or the like, it is clear enough that this transcription is a mechanical replacement of a written symbol in one system with a symbol from another. If one instead writes ⟨ɕ⟩, the act of transcription is obscured, instead becoming the claim that any word written with ཤ in fact was pronounced with the segment [ɕ].3 Despite his claim that the phonology of Old Tibetan is sufficiently well understood to represent Tibetan letters with IPA symbols, Jacques himself admits that the 23rd letter is controversial (2012, 91-92). Following Coblin (2002), he thinks this letter has different phonetic interpretations in each of the phonotactic positions in which it appears; Hill rejects this understanding (2005, 2009, 2019, 5 n. 4).
Prematurely prejudicing the solution to ongoing controversies points to a more deep-seated failing of the IPA when applied to transcription. "One of the most obvious rules of Romanization is that Romanized sequences of letters should contain no more and no less information than the original text" (Balk and Janhunen 1999, 21). The major merit of Baxter's 1992 transcription is exactly that it is not a reconstruction of how Middle Chinese was pronounced.
The notation I introduce here is not intended as a reconstruction; rather it is a convenient transcription which adequately represents all the phonological distinctions of Middle Chinese while leaving controversial questions open.
Baxter 1992, 27
3 An anonymous referee objects that the conventions of writing ⟨a⟩ for writing, /a/ for phonemes, and [a] for phones are perfectly satisfactory for making clear whether one intends phonetic precision. One could in principle transcribe ཤ as ⟨ɕ⟩, and still claim that it is phonologically /ç/ and phonetically [ʃ]. The referee is quite right in principle and by implication is well satisfied with Baxter's ⟨o⟩ for ʌ and ⟨h⟩ for ɣ and with Hill's ⟨ǰ⟩ and ⟨dzy⟩ for the same /dʒ/. Nonetheless, in practice neither students nor seasoned researchers will perceive an IPA symbol as a purely conventional representation of a philological artefact, even if explicitly told to do so. Human beings are creatures of habit. In principle one could transcribe ཤ as ⟨k⟩ or ⟨¥⟩ or whatever you like. The classroom is the crucible to assay these principles.
Thus, even if it were known with absolute certainty that the 23rd letter of the Tibetan alphabet in Old Tibetan represented two distinct phonemes, it would still be illegitimate to Romanize the same Tibetan letter in two ways. The IPA cannot transcribe the available philological information without offering a phonological reconstruction.
Mongolian offers a lesson in the pitfalls of conflating Romanization and phonetic reconstruction. The Mongolian script massively underspecifies the phonemes of Middle Mongolian, for example using the same letter for [a], [e], and [n]; traditionally in Romanization these three phonemes are distinguished.
Needless to say, this approach allows no distinction to be made between the graphic information contained in the written message and the corresponding phonemic sequences, which the writing only imperfectly reflects.
Balk and Janhunen 1999, 18
A single Romanization for the Greek script will serve well from the Archaic period right through the Ottoman, because the Greek writing system did not fundamentally change across these eras, but an IPA-based transcription of Greek would be comically misleading already by the Alexandrian period. A transcription should not aim to reflect the concrete pronunciation of a language at any period but should reflect the sonus grammae (Yabu 2014) of the writing system qua system. In sum, IPA-based Romanizations are methodologically inadequate; they obscure the primary philological data by tainting it with the phonemic analysis that must be the output and not the input of our lucubrations.

Concrete proposals for an Indological transcription of Middle Chinese
To present concrete proposals for an Indological transcription of Middle Chinese, it is convenient to survey the components of the Middle Chinese syllable: initials (§3.1), medials (§3.2), vowels (§3.3), codas (§3.4), and tones (§3.5).

3.1 Initials
For the velars (yá 牙), labials (chún 唇), dentals (shétóu 舌頭), and dental sibilants (chǐyīn 齒音) Baxter's system and an Indological system would in any case be the same; Baxter's ⟨y⟩ and ⟨l⟩ also remain as they are. For the palatals (zhāngzǔ 章組) and retroflex stops (shéshǎng 舌上) the Indological equivalent is obvious, viz. ⟨c⟩ etc. in place of ⟨tsy⟩ etc. and ⟨ṭ⟩ etc. in place of ⟨tr⟩ etc.4 For the retroflex sibilants (zhuāngzǔ 莊組), one might be tempted to write either ⟨c̣⟩ or ⟨tṣ⟩. The second option has a number of advantages. First, there is no typographically feasible way of putting a dot under a ⟨j⟩ to stand as equivalent to ⟨dẓ⟩. Second, the letter ⟨c̣⟩ does not occur in Indological transcription, so its interpretation is less obvious than that of ⟨tṣ⟩, which consists of two letters that do appear in Indological transcription.5 A third point against ⟨c̣⟩, albeit a small one, is that there is not a Unicode code point for this character.
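The Unicode point can be checked directly. The following is a minimal sketch, assuming Python's standard unicodedata module; the helper name has_precomposed is hypothetical. It composes a base letter with U+0323 COMBINING DOT BELOW under NFC normalization and reports whether the result collapses to a single code point.

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW


def has_precomposed(base: str, combining: str = DOT_BELOW) -> bool:
    """True if base + combining mark has a single precomposed code point."""
    composed = unicodedata.normalize("NFC", base + combining)
    return len(composed) == 1


# Letters relevant to the retroflex series discussed above.
results = {letter: has_precomposed(letter) for letter in "tdsznjc"}
```

Under this check ⟨ṭ⟩, ⟨ḍ⟩, ⟨ṣ⟩, ⟨ẓ⟩, and ⟨ṇ⟩ each compose to one code point, whereas ⟨c⟩ and ⟨j⟩ with a dot below remain sequences of two code points.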
The Middle Chinese laryngeals (喉 hóu) require somewhat more comment. Here we confront 影 yǐng [ʔ-], 曉 xiǎo [x-], and 匣 xiá [ɣ-]. Indological conventions do not offer a solution for transcribing the glottal stop; I propose ⟨・⟩.6 In Sanskrit ⟨h⟩ represents a voiced glottal fricative and ⟨ḥ⟩, the visarga, its voiceless counterpart. One might therefore write ⟨ḥ⟩ for 曉 xiǎo [x-] and ⟨h⟩ for 匣 xiá [ɣ-], but there are good reasons for not doing so. First, Tibetan ཧ and Burmese ဟ, the structural equivalents of Indic ⟨h⟩, represent the voiceless glottal fricative [h]. Second, in some systems of Tibetan transliteration ⟨ḥ⟩ represents the infamous 23rd letter འ, which represents a voiced fricative, and in Burmese the visarga ◌း is a marker of the high tone. Thus, neither ⟨h⟩ nor ⟨ḥ⟩ has its Sanskrit meaning in the major written languages of the Sino-Tibetan family. Since the Tibetans and the Burmans decided to associate Indic ⟨h⟩ with their voiceless glottal, it is best to defer to their choice. Similarly, the letter ⟨ḥ⟩ should be left for representing those phenomena that, in particular writing systems, have a structural or graphic tie with the Indic visarga. Following Li (1974-1975, 226) and the IPA, one could write 曉 xiǎo as ⟨x⟩ and 匣 xiá as ⟨γ⟩, since ⟨x⟩ is a standard Roman character and ⟨γ⟩, although Greek, figures in other Romanization systems, such as the standard system for Mongolian. Still, in order to keep our Romanization Roman and to avoid the frequent association of ⟨x⟩ with [ks], I do not find ⟨x⟩ and ⟨γ⟩ good choices. For lack of a better solution and in keeping with Hill's (2019)

3.2.1 Treatment of division-iii syllables
The divisions of Middle Chinese are too complex to introduce here (Baxter 1992, 42-43, Hill 2019, 95-99). It suffices to say that the distinction between division-iii (type B) and non-division-iii (type A) is a major cleavage in the phonological system of both Old and Middle Chinese. It is traditional to associate division-iii with a medial -y- [j], although the exact phonetics of the distinction is controversial in both periods. Baxter writes division-iii with -j-; this is clearly not a good option for us, since we already use ⟨j⟩ for the voiced palatal affricate (Baxter's dzy-). I believe that -i- is a good choice for indexing division-iii. One advantage is that with rounded syllables we get easy-to-read forms like 誑 kiwanH (Baxter's kjwanH). Also, we get very Chinese-looking forms like 是 jieX (Baxter's dzyeX). A risk of using -i- to mark division-iii is that it would be ugly and confusing before the vowel -i-, since this would yield a double -ii- in a word like 稹 ciin. However, in the same way that Baxter writes 稹 tsyin and not 稹 tsyjin, we can write 稹 cin instead of ciin, since initial c- itself already indexes division-iii.7

3.2.2 Treatment of 合口 hékǒu syllables
The Song dynasty rhyme tables allow for the identification of rounded (合口 hékǒu) versus unrounded (開口 kāikǒu) syllables (Baxter 1992, 62, Hill 2019, 99-100, §84). I follow Baxter in writing rounded syllables with -w-. Indologically speaking -v- would be the better choice. Nonetheless, since -w- is typically used in the transcription of Tibetan and Burmese, this letter perhaps has more to speak in its favor.

3.2.3 The 重紐 chóngniǔ problem
Eight rimes of the Qièyùn (viz. zhī 支, zhī 脂, jì 祭, xiāo 宵, qīn 侵, yán 鹽, zhēn 真, and xiān 仙) contain a pair of homophone groups that have incommensurate chains of fǎnqiè rime spellers and cannot be distinguished on the basis of hékǒu versus kāikǒu (Baxter 1977, 56, 60-64). In the rime tables, one homophone group of each chóngniǔ pair is put into rank-iii and the other into rank-iv. As a matter of terminology, characters of a relevant Qièyùn homophone group that is put in rank-iii are called 'chóngniǔ rank-iii' characters (重紐三等 chóngniǔ sānděng) and characters of the other Qièyùn homophone group, the one put in rank-iv, are called 'chóngniǔ rank-iv' characters (重紐四等 chóngniǔ sìděng). In Baxter's transcription, chóngniǔ rank-iv syllables are marked with an additional i or j. For instance, the chóngniǔ rank-iii word 碑 he transcribes as pje while the chóngniǔ rank-iv word 卑 he transcribes as pjie. I propose to write these respectively as pie and pyie. A merit of this solution is that medial -y- in the proposed system immediately and uniquely indexes chóngniǔ rank-iv syllables.
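The medial conventions of §3.2.1 and §3.2.3 can be tried out on the five forms cited in those sections. The following is only a sketch under stated assumptions, not a full converter: the function name convert_medial and the table PALATALS are hypothetical, only the two palatal initials that occur in the cited examples are listed, and the rule that Baxter's -ji- always corresponds to -yi- is inferred from the single pair pjie : pyie rather than stated in general.

```python
import re

# Only the palatal initials attested in the examples cited in the text.
PALATALS = {"tsy": "c", "dzy": "j"}

# A non-vocalic initial, Baxter's medial -j-, and the remainder.
MEDIAL_J = re.compile(r"^([^aeiouɨjw]+)j(.*)$")


def convert_medial(syllable: str) -> str:
    """Rewrite Baxter's initial-plus-medial notation for the cited examples."""
    for baxter, indological in PALATALS.items():
        if syllable.startswith(baxter):
            rest = syllable[len(baxter):]
            # A palatal initial already indexes division-iii,
            # so no -i- is written before the vowel -i- (cin, not ciin).
            medial = "" if rest.startswith("i") else "i"
            return indological + medial + rest
    match = MEDIAL_J.match(syllable)
    if match is None:
        return syllable  # nothing to rewrite in this sketch
    initial, rest = match.groups()
    # Assumption: Baxter's -ji- marks chóngniǔ rank-iv, written -yi- here.
    medial = "y" if rest.startswith("i") else "i"
    return initial + medial + rest


converted = {s: convert_medial(s) for s in ["kjwanH", "dzyeX", "tsyin", "pje", "pjie"]}
```

The sketch reproduces the pairs given in the text (kjwanH : kiwanH, dzyeX : jieX, tsyin : cin, pje : pie, pjie : pyie); its behavior on any other syllable is untested guesswork.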

3.3 Vowels
The vowels -ae-, -ea- and -ɨ- appear in Baxter's system, but not in the IAST.8 There is no reason to write them any differently in an Indological transcription. In contrast, Baxter's use of -o- to represent [ʌ] causes much confusion among students; a better solution should be sought. I do not find ⟨ʌ⟩ itself a good solution. Although one is presumably meant to see here an ⟨A⟩ without the crossbar, in my experience students do not recognize here a vowel symbol at all, but instead the 'wedge' ⟨∧⟩ of mathematics. A better option is the 'schwa' ⟨ə⟩, which is known even to those who know no other IPA character; it is quite obviously a vowel, and represents more or less the correct phonetic value.

3.4 Codas
Baxter's system of finals can be adopted as is, with one exception: the coda that Baxter writes with the letter ⟨j⟩ is here written with the letter ⟨y⟩.

3.5 Tones
Middle Chinese has four tones: the level (平聲 píngshēng), rising (上聲 shǎngshēng), departing (去聲 qùshēng), and entering (入聲 rùshēng) tones. In Baxter's system the capital letters -X and -H represent the 'rising' and 'departing' tones respectively. Both the 'level' and 'entering' tones are represented with no final capital letter, but a syllable in the 'entering' tone ends with a final stop whereas a syllable in the 'level' tone is either open or ends with a nasal. Baxter's excellent notation for the tones can be adopted as is in an Indological transcription.
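Because the tone is fully determined by the end of the transcribed syllable, it can be read off mechanically. The following is a minimal sketch, assuming that the final stops are written -p, -t, and -k; the function name tone_of is hypothetical.

```python
def tone_of(syllable: str) -> str:
    """Read the Middle Chinese tone off the end of a transcribed syllable."""
    if syllable.endswith("X"):
        return "rising"      # 上聲 shǎngshēng
    if syllable.endswith("H"):
        return "departing"   # 去聲 qùshēng
    if syllable.endswith(("p", "t", "k")):
        return "entering"    # 入聲 rùshēng: the syllable ends in a stop
    return "level"           # 平聲 píngshēng: open syllable or nasal coda


tones = {s: tone_of(s) for s in ["jieX", "kiwanH", "hjwot", "ḍien", "pie"]}
```

The forms used here are those cited elsewhere in this article; the same function applies unchanged to Baxter's system and to the proposed one, since the tone notation is shared.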

Conclusion
Table 2 gives two samples of the Indological system proposed here, paired with Baxter's system for comparison. Most striking is how little difference there is. This fact is itself an advantage of the proposed Indological system. This system will benefit those who are unfamiliar with Baxter's system without burdening those who are already used to it.