The evidence one can draw from the rhyming behavior of Old Chinese words plays a crucial role in the reconstruction of Old Chinese, and is particularly relevant to recent proposals. Some of these proposals are no longer based solely on the intuition of scholars but are also substantiated by statistical arguments that help to assess the probability with which a given set of rhyming instances can be assigned to an established rhyme group. So far, however, quantitative methods have only been used to confirm given hypotheses regarding rhyme groups in Old Chinese, and no exploratory analyses that would create hypotheses regarding rhyme groups in a corpus have been carried out. This paper presents a new method that models rhyme data as weighted undirected networks. By representing rhyme words as nodes in a network and the frequency of rhymes in a given corpus as weighted links between nodes, rhyme groups can be inferred with the help of standard algorithms originally designed for social network analysis. This is illustrated by constructing a rhyme network from the Book of Odes and comparing the automatically inferred rhyme groups with rhyme groups proposed in the literature. Apart from revealing interesting general properties of rhyme networks in Chinese historical phonology, the analysis provides strong evidence for a coda *-r in Old Chinese. The results of the analysis and the rhyme network of the Book of Odes can be inspected in the form of an interactive online application or downloaded directly.
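The rhymes reflected in Old Chinese words are extremely important for reconstructing the Old Chinese sound system, especially for some recent reconstruction systems of Old Chinese. Some of these systems no longer rely solely on the intuition of scholars but also use statistical parameters to assess the probability of rhyme division and rhyme assignment. So far, however, quantitative methods have only been used to confirm hypotheses about Old Chinese rhyme groups, and no exploratory data analysis has been carried out to generate initial hypotheses about rhyme division. This paper proposes a new method that models rhyme data as weighted undirected networks. The method models rhyme words as vertices in a network and the rhyming frequency in a given corpus as edges connecting the vertices, and uses standard algorithms from social network analysis to infer the rhyme groups reflected in the corpus. To illustrate the method more concretely, the paper builds a rhyme network from the Book of Odes (詩經) and compares the automatically inferred rhyme groups with those proposed by scholars. Besides revealing some interesting properties of Old Chinese rhyme networks, the analysis of the Book of Odes rhyme network provides new evidence supporting a coda *-r in Old Chinese. The Book of Odes rhyme network and the results of the analysis can be accessed through an interactive online application and downloaded. (This article is in English.)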
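The network model described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the word labels (`a1`, `b1`, ...) and the rhyme sets are invented placeholders rather than data from the Book of Odes, and weighted label propagation is used here only as one simple stand-in for the "standard algorithms" of social network analysis, which the abstract does not name.

```python
from collections import defaultdict
from itertools import combinations


def build_rhyme_network(rhyme_sets):
    """Build a weighted undirected network as {word: {neighbour: weight}}.

    Each rhyme set lists words that rhyme with each other in one stanza;
    the edge weight counts how often two words rhyme together.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for words in rhyme_sets:
        for a, b in combinations(sorted(set(words)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_rounds=100):
    """Infer groups by weighted label propagation (an assumed stand-in).

    Nodes are visited in sorted order and ties are broken deterministically
    (keep the current label if it is among the best, else take the smallest).
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            scores = defaultdict(int)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            best_score = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == best_score)
            if labels[node] not in candidates:
                labels[node] = candidates[0]
                changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for node, label in labels.items():
        groups[label].append(node)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy corpus: two rhyme groups linked by one irregular rhyme.
rhyme_sets = [
    ["a1", "a2", "a3"], ["a1", "a2"], ["a2", "a3"],
    ["b1", "b2", "b3"], ["b1", "b2"], ["b2", "b3"],
    ["a3", "b1"],
]
network = build_rhyme_network(rhyme_sets)
groups = label_propagation(network)
print(groups)  # [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
```

In this toy case the single cross-group rhyme (`a3` with `b1`) is outweighed by the more frequent rhymes inside each group, so the two groups are recovered separately.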
By reviewing a recent quantitative study of rhyme patterns in Mandarin Chinese, this study shows how data handling and data analysis in the study of rhyme patterns can be improved. Suggestions for improvement include (a) consistent and exhaustive annotation of rhyme data, which facilitates data reuse, and (b) automated approaches for exploratory data analysis, which can help to analyze rhyme data more effectively before statistical frameworks for hypothesis testing are applied.
This paper proposes the use of network techniques in the exploration of Old Chinese phonology as reflected in the phonophoric determinatives of xiéshēng 諧聲 characters. We use the approach to examine five specific proposals in Chinese historical phonology and to test whether the distinctions suggested by these proposals can be recovered on the basis of phonophoric choice. The major finding is that the type A versus type B distinction is in some cases encoded in the choice of phonophoric determinative, while other distinctions are only spuriously, if at all, reflected in the phonophoric subseries.
The idea that language history is best visualized by a branching tree has long been controversial in linguistics, and many alternative theories have been proposed. The reluctance of many scholars to accept the tree as the natural metaphor for language history was due to conflicting signals in linguistic data: many resemblances would simply not point to a unique tree. Despite these observations, the majority of automatic approaches applied to language data have been based on the tree model, while network approaches have rarely been applied. Due to the specific sociolinguistic situation in China, where very divergent varieties have been developing under the roof of a common culture and writing system, the history of the Chinese dialects is complex and intertwined. The Chinese dialects are therefore a good test case for methods that no longer take the family tree as their primary model. Here we use a network approach to study the lexical history of 40 Chinese dialects. In contrast to previous approaches, our method is character-based and captures both vertical and horizontal aspects of language history. According to our results, the majority of characters in our data (about 54%) cannot be readily explained with the help of a given tree model. The borrowing events inferred by our method not only reflect general uncertainties of Chinese dialect classification but also reveal the strong influence of the standard language on Chinese dialect history.
Current efforts in computational historical linguistics are predominantly concerned with phylogenetic inference. Methods for ancestral state reconstruction have only been applied sporadically. In contrast to phylogenetic algorithms, automatic reconstruction methods presuppose phylogenetic information in order to explain what has evolved when and where. Here we report a pilot study exploring how well automatic methods for ancestral state reconstruction perform in the task of onomasiological reconstruction in multilingual word lists, where algorithms are used to infer how the words evolved along a given phylogeny and to reconstruct which cognate classes were used to express a given meaning in the ancestral languages. Comparing three different methods (Maximum Parsimony, Minimal Lateral Networks, and Maximum Likelihood) on three different test sets (Indo-European, Austronesian, Chinese), using binary and multi-state coding of the data as well as single and sampled phylogenies, we find that Maximum Likelihood largely outperforms the other methods. At the same time, however, the general performance was disappointingly low, with F-scores ranging between 0.66 (Chinese) and 0.79 (Austronesian). A closer linguistic evaluation of the reconstructions proposed by the best method and the reconstructions given in the gold standards revealed that the majority of the cases where the algorithms failed can be attributed to problems of independent semantic shift (homoplasy), to morphological processes in lexical change, and to wrong reconstructions in the independently created test sets that we employed.
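To make the task concrete, the following sketch shows the bottom-up pass of Fitch's algorithm, a textbook form of Maximum Parsimony, applied to multi-state cognate-class data on a fixed binary tree, together with a generic set-based F-score. Everything here is an assumption for illustration: the tree, the language names (`L1` to `L4`), the cognate-class labels and the gold standard are hypothetical, and the abstract does not specify which parsimony variant or which F-score definition the study used.

```python
def fitch_sets(tree, leaf_states):
    """Bottom-up Fitch pass on a binary tree.

    `tree` is either a leaf name (str) or a tuple (left, right).
    Returns (candidate states at this node, minimal number of changes).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_sets(left, leaf_states)
    right_set, right_cost = fitch_sets(right, leaf_states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: keep all candidates and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def f_score(predicted, gold):
    """Harmonic mean of precision and recall for two sets of labels."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: which cognate class expressed one meaning
# in the ancestor of four languages?
tree = (("L1", "L2"), ("L3", "L4"))
leaf_states = {"L1": "A", "L2": "A", "L3": "B", "L4": "A"}
root_states, changes = fitch_sets(tree, leaf_states)
print(root_states, changes)  # {'A'} 1

# Hypothetical gold standard listing two classes for the ancestor.
score = f_score(root_states, {"A", "B"})
```

With these toy inputs the reconstruction proposes class `A` for the root at the cost of one change, and the F-score against the invented gold standard is 2/3 (precision 1.0, recall 0.5).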