Using Phylogenetic Networks to Model Chinese Dialect History

in Language Dynamics and Change

The idea that language history is best visualized by a branching tree has long been controversial in linguistics, and many alternative theories have been proposed. Many scholars have been reluctant to accept the tree as the natural metaphor for language history because of conflicting signals in linguistic data: many resemblances simply do not point to a unique tree. Despite these observations, the majority of automatic approaches applied to language data have been based on the tree model, while network approaches have rarely been applied. Due to the specific sociolinguistic situation in China, where very divergent varieties have developed under the roof of a common culture and writing system, the history of the Chinese dialects is complex and intertwined. The dialects are therefore a good test case for methods that no longer take the family tree as their primary model. Here we use a network approach to study the lexical history of 40 Chinese dialects. In contrast to previous approaches, our method is character-based and captures both vertical and horizontal aspects of language history. According to our results, the majority of characters in our data (about 54%) cannot be readily explained with the help of a given tree model. The borrowing events inferred by our method not only reflect general uncertainties of Chinese dialect classification but also reveal the strong influence of the standard language on Chinese dialect history.


  • 4

    (2005) distinguishes different contexts in which the split of voiced into voiceless unaspirated and voiceless aspirated plosives occurred in order to distinguish Mǐn, Cantonese, and Mandarin.

  • 6

    We follow Mirkin et al. (2003) in counting the presence of a character in the root as a normal gain event.
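
The gain-loss bookkeeping described in this note can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tree, the node names, and the optional weights in `scenario_cost` are assumptions made purely for illustration. The only convention taken from the text is that a character present at the root is counted as an ordinary gain event.

```python
from typing import Dict, List, Set

# Hypothetical toy tree (parent -> children); not taken from the paper.
TREE: Dict[str, List[str]] = {
    "root": ["X", "Y"],
    "X": ["a", "b"],
    "Y": ["c", "d"],
}


def leaf_states(tree: Dict[str, List[str]], root: str,
                gains: Set[str], losses: Set[str]) -> Set[str]:
    """Return the leaves where the character is present under a scenario.

    Presence is propagated from the root downwards: a gain event on a node
    switches the character on, a loss event switches it off.
    """
    present: Set[str] = set()
    stack = [(root, False)]  # the character is absent above the root
    while stack:
        node, state = stack.pop()
        if node in gains:
            state = True
        if node in losses:
            state = False
        children = tree.get(node, [])
        if not children and state:
            present.add(node)
        for child in children:
            stack.append((child, state))
    return present


def scenario_cost(gains: Set[str], losses: Set[str],
                  gain_weight: float = 1.0, loss_weight: float = 1.0) -> float:
    """Weighted event count; a gain at the root counts like any other gain.

    The weights are an illustrative assumption, not values from the paper.
    """
    return gain_weight * len(gains) + loss_weight * len(losses)


observed = {"a", "c"}  # character attested in leaves a and c only

# One gain at the root plus two losses, versus two independent gains.
one_origin = ({"root"}, {"b", "d"})
two_origins = ({"a", "c"}, set())

for gains, losses in (one_origin, two_origins):
    assert leaf_states(TREE, "root", gains, losses) == observed
```

Both scenarios reproduce the same pattern at the leaves. With equal weights the single-origin scenario costs three events and the two-origin scenario two, so which one is preferred depends on how gains are weighted against losses.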


  • Figure 1

    Reference trees of the major groups for the Southern Chinese (a) and the Common Chinese (b) hypotheses. The reference trees are broadly based on the classifications of Norman (1988 and 2003) and Karlgren (1954), respectively, with the topologies expanded and adapted to accommodate the present sample (see text).

  • Figure 2

    Comparing alternative gain-loss scenarios. White nodes indicate the presence of a character, black nodes its absence. Large nodes indicate the respective event (gain or loss). In a, no scenario is inferred, b assumes one gain and two loss events, and c assumes two gain events and no loss event.

  • Figure 3

    Minimal lateral network reconstruction. If more than one origin is inferred for a given phyletic pattern, the nodes where the characters originate are connected by lateral edges (a–c). In the MLN (d), the edges inferred for all patterns are combined, with edge weights (visualized as differences in line width) reflecting the number of occurrences.

  • Figure 4

    Removing redundant lateral edges in the minimal lateral network. a shows the initial stage. b shows the intermediate stage after edge weights have been inferred for all lateral edges. c shows the resulting minimum spanning tree.

  • Figure 5

    The minimal lateral network of the Southern Chinese reference tree. The node size reflects the inferred number of cognate sets in each language variety. The links reflect the minimal number of lateral transfer events that is required to minimize the differences between the ancestral and the contemporary vocabulary size distribution.

  • Figure 6

    The minimal spatial network of the Southern Chinese reference tree. The links reflect the external and the internal edges between all contemporary language varieties as inferred in the minimal lateral network.
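
The edge-combination step described in the caption of Figure 3 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `lateral_edge_weights`, the pattern identifiers, and the node labels are hypothetical. The sketch connects all origins of a pattern pairwise and omits the later pruning of redundant edges shown in Figure 4.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def lateral_edge_weights(
    origins_by_pattern: Dict[str, List[str]]
) -> Counter:
    """Combine lateral edges over all phyletic patterns.

    For every pattern with more than one inferred origin, each pair of
    origin nodes is linked by a lateral edge. The weight of an edge is the
    number of patterns in which that edge occurs.
    """
    weights: Counter = Counter()
    for origins in origins_by_pattern.values():
        unique = sorted(set(origins))  # undirected edges, stable ordering
        if len(unique) < 2:
            continue  # a single origin needs no lateral edge
        for edge in combinations(unique, 2):
            weights[edge] += 1
    return weights


# Hypothetical input: pattern id -> nodes of the reference tree where the
# character is inferred to originate.
example: Dict[str, List[str]] = {
    "pattern_1": ["A", "B"],
    "pattern_2": ["A", "B", "C"],
    "pattern_3": ["D"],
}

weights = lateral_edge_weights(example)
heaviest: Tuple[str, str] = max(weights, key=weights.get)
```

In this toy input the edge between `A` and `B` occurs in two patterns and therefore receives the largest weight, which corresponds to the thickest line in a drawing of the network.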
