Applying Population Genetic Approaches within Languages

Finnish Dialects as Linguistic Populations

in Language Dynamics and Change

The adoption of evolutionary approaches to study language change as a type of non-biological evolution has gained increasing interest and introduced a variety of quantitative tools to linguistics. The focus has thus far mainly been on language families, or ‘linguistic macroevolution,’ and taken the shape of linguistic phylogenetics. Here we explore whether evolutionary methods could be applicable for studying intra-lingual variation (‘linguistic microevolution’) by testing a population genetic clustering method for analyzing the ‘population structure’ of Finnish dialects. We compare the results with traditional dialect divisions established in the literature and with K-medoids clustering, which is free from biological assumptions. The results are encouragingly similar to each other and agree with traditional views, suggesting that population genetic tools could be a useful addition to the dialectological toolkit. We also show how the results of the model-based clustering could serve as a basis for further study.
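
To make the comparison method concrete, the following is a minimal sketch of K-medoids clustering on dialect-style data. It is not the authors' implementation: the toy sites, the feature variants, the Hamming distance, the farthest-first initialization, and the simple alternating update (rather than a full PAM swap search) are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy data: each site (municipality) is a tuple of the variants
# it uses for four dialect features. The values are invented for illustration.
SITES: List[Tuple[str, ...]] = [
    ("ts", "a", "d", "k"),
    ("ts", "a", "d", "k"),
    ("ts", "a", "d", "t"),
    ("ht", "o", "r", "p"),
    ("ht", "o", "r", "p"),
    ("ht", "o", "l", "p"),
]


def hamming(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of features on which two sites use different variants."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def k_medoids(sites: Sequence[Sequence[str]], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating K-medoids; returns (medoid indices, cluster label per site)."""
    n = len(sites)
    dist = [[hamming(p, q) for q in sites] for p in sites]

    # Assumed initialization: start from the most central site, then
    # repeatedly add the site farthest from all medoids chosen so far.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        rest = [i for i in range(n) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(dist[i][m] for m in medoids)))

    def assign(meds: List[int]) -> List[int]:
        # Each site joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if members:
                # New medoid: the member with the smallest total distance
                # to the other members of its cluster.
                updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
            else:
                updated.append(medoids[c])  # keep the old medoid if a cluster empties
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(medoids)


medoids, labels = k_medoids(SITES, k=2)
```

Because the medoids are actual sites and only pairwise distances are needed, the method makes no assumptions about inheritance or admixture, which is what makes it a useful contrast to the model-based clustering.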

6. Most of these maps can be found in Wiik (2004).

Figures

    Figure 1

    The ‘gold standard’ of Finnish dialect divisions, suggested by Terho Itkonen (Itkonen, 1964). The main areas are: Southwest (1a–b), Southwest transitional (2a–e), Häme (3a–f), South Ostrobothnia (4), Middle/North Ostrobothnia (5a–b), Far North (6a–e), Savo (7a–h), and Southeast (8a–d). The primary division of these dialects is between western dialects (1–6) and eastern dialects (7–8).

    Figure 2

    An example page from the Dialect Atlas of Finnish (Kettunen, 1940a). The legend in the upper right lists the variants of the dialect feature that the map covers. The depicted page 8 documents morphophonological variation within the word metsä (‘forest’).

    Figure 3

    Two visualization styles for a division of Finnish dialects into 3 populations using Structure. Municipalities marked in white have not been studied. a) Traditional Structure barplot output. Each vertical line represents one of the 525 studied municipalities, and the colors represent the dialect admixture proportions within that municipality (the frequencies of the three clusters). b) Frequency data plotted on a map, with the frequencies of each inferred cluster (IC) divided into two classes: more saturated colors represent the core areas of the dialects, where the IC value is high (0.75–1); less saturated colors show the transitional areas, with IC values between 0.5 and 0.75. c) Like b but with five frequency classes, showing the dialect transitions more accurately.

    Figure 4

    A close-up of South Ostrobothnia and the surrounding areas with K=8, using three visualizations: a) Dialects represented with two frequency classes. White municipalities along the coast are areas without data; white municipalities between dialect areas indicate strong admixture, i.e., all IC values below 0.5. b) The same result shown as frequency bars, which reveal the dialect admixture more clearly. c) A small part of the map with percentages shown enlarged for better visibility.

    Figure 5

    a) Estimated mean log likelihood of the diploid data for K=1–20 (outliers excluded). b) ΔK of the same data for K=2–19.

    Figure 6

    Average silhouette widths with K=2–20

    Figure 7

    Dialect divisions K=2–8, with Structure diploid on the top row and K-medoids results on the bottom row. Structure diploid results use two shades of color to differentiate between core areas (more saturated colors, IC values 0.75–1) and transitional areas (less saturated colors, IC values 0.5–0.75). White municipalities in the peripheral areas are undocumented, whereas white municipalities in central areas indicate strong admixture (IC values under 0.50). The area shown separate from the rest of the map indicates Värmland in Sweden, where people from eastern Finland migrated in the 16th century. The colors for K=8 correspond with the following dialects: red = Southwest; purple = West Häme; brown = Southeast Häme + Päijät-Häme; orange = South Ostrobothnia; blue = Middle / North Ostrobothnia + North Kainuu + Kemijoki; olive green = Far North; green = Savo; gray = Southeast. A more detailed explanation of the areas is given in Section 4.2.1.

    Figure 8

    Dialect divisions K=9–14, with Structure diploid results presented in the top row and K-medoids results in the bottom row. Areas with the same color do not necessarily represent identical dialect areas across the maps. Other details are as described for Figure 7. A more detailed explanation of the areas is given in Section 4.2.2.

    Figure 9

    Dialect divisions K=2–14 visualized with CLUMPP after excluding outliers. The color pairs below the maps are listed in order of appearance, to help track the newly appearing clusters and their frequencies.

    Figure 10

    Heat map and histogram for the municipality pair comparisons for each map sheet. The data points along the horizontal and vertical axes correspond to the map pages of the atlas. The color scale represents the level of linkage, with red (1.0) representing a high linkage percentage, and yellow a low linkage percentage (0.0).

    Figure 11

    Shannon-Wiener indices (SWI) calculated for each municipality after dividing the data into seven populations. The SWI values are divided into ten equal-sized classes, from the class of the smallest SWI, indicating the lowest linguistic diversity (municipalities colored white), to the class of the largest SWI, indicating the highest linguistic diversity (municipalities colored black).

    Figure 12

    Core areas identified from a K=14 Structure run using an IC value threshold of 0.75

    Figure 13

    Pairwise F_ST values, indicating linguistic differences between the populations presented in Fig. 12. The color codes in Fig. 12 match the ones in Fig. 13.

    Figure 14

    A linkage test heat map filtered by removing data points where the potential linkage (Lp) value was less than 25% of the highest Lp value in the results. This illustrates one way of identifying more reliable estimates.
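
Several of the captions above (Figures 3, 4, 7, and 12) sort municipalities into core, transitional, and strongly admixed areas by their inferred cluster (IC) values. The sketch below shows one way that rule could be written down. The thresholds 0.75 and 0.5 come from the captions; the function name, the input format, and the treatment of values exactly on a threshold are assumptions.

```python
from typing import Optional, Sequence, Tuple

CORE_MIN = 0.75          # core area: IC value 0.75-1 (from the captions)
TRANSITIONAL_MIN = 0.5   # transitional area: IC value 0.5-0.75 (from the captions)


def classify_municipality(ic_values: Sequence[float]) -> Tuple[Optional[int], str]:
    """Return (dominant cluster index, area class) for one municipality.

    ic_values holds the municipality's membership proportion in each
    inferred cluster. Assumption: a value exactly on a threshold is
    counted in the higher class.
    """
    top = max(range(len(ic_values)), key=lambda c: ic_values[c])
    value = ic_values[top]
    if value >= CORE_MIN:
        return top, "core"
    if value >= TRANSITIONAL_MIN:
        return top, "transitional"
    # All IC values are below 0.5: no single cluster dominates.
    return None, "admixed"


# Hypothetical membership proportions for three municipalities (K=3).
examples = [
    [0.90, 0.05, 0.05],
    [0.30, 0.60, 0.10],
    [0.40, 0.35, 0.25],
]
classes = [classify_municipality(ic) for ic in examples]
```

In the maps, the first two classes correspond to the more and less saturated shades of a cluster's color, and the third to municipalities left white inside the studied area.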
