Johannes Dellert and Armin Buch

Abstract

Using a recently published large-scale lexicostatistical database, we rank 1,016 concepts by their suitability for inclusion in Swadesh-style lists of basic stable concepts. To this end, we define separate measures of basicness and stability. Basicness, in the sense of morphological simplicity, is measured by information content, a generalization of word length that corrects for the distorting effects of phoneme inventory size, phonotactics, and non-stem morphemes in dictionary forms. Stability against replacement by semantic shift or borrowing is measured by sampling independent language pairs and correlating the distances between the forms for each concept with the overall language distances. To determine the relative importance of basicness and stability, we optimize the combination of the two partial measures for similarity to existing lists. A comparison with and among existing rankings suggests that concept rankings are highly data-dependent and therefore less well-grounded than previously assumed. To explore this issue, we evaluate the robustness of our ranking against language-pair resampling. This lets us assess how much volatility can be expected, and shows that only about half of the concepts on a list based on our ranking can safely be assumed to belong on it independently of the data.
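The stability measure described in this abstract can be illustrated with a small sketch. It assumes that, for every language pair, a distance between the two forms for a concept and an overall language distance are already available as numbers. It also reads "independent language pairs" as pairs that share no language. The function names (`stability_score`, `sample_disjoint_pairs`), the use of Pearson correlation, and averaging over repeated samples are illustrative assumptions, not necessarily the authors' implementation.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined correlation treated as 0 (an assumption)
    return cov / (sx * sy)


def sample_disjoint_pairs(languages, rng):
    """Hypothetical helper: shuffle and pair off languages so none occurs twice."""
    langs = list(languages)
    rng.shuffle(langs)
    return [(langs[i], langs[i + 1]) for i in range(0, len(langs) - 1, 2)]


def stability_score(form_dist, lang_dist, languages, n_samples=100, seed=0):
    """Hypothetical measure: mean correlation between a concept's form distances
    and the overall language distances over repeated disjoint-pair samples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        pairs = [frozenset(p) for p in sample_disjoint_pairs(languages, rng)]
        xs = [form_dist[p] for p in pairs]
        ys = [lang_dist[p] for p in pairs]
        scores.append(pearson(xs, ys))
    return sum(scores) / len(scores)


# Toy data: six hypothetical languages with distinct overall distances.
languages = ["L1", "L2", "L3", "L4", "L5", "L6"]
lang_dist = {frozenset(p): (i + 1) / 20 for i, p in enumerate(combinations(languages, 2))}

# A concept whose form distances track the language distances exactly...
stable = stability_score(dict(lang_dist), lang_dist, languages)
# ...and one whose form distances run against them.
flipped = stability_score({k: 1 - v for k, v in lang_dist.items()}, lang_dist, languages)
```

Under this reading, a concept whose forms diverge in step with the languages themselves scores close to 1, while borrowing or semantic shift would break the correlation and lower the score. With only six toy languages each sample contains three pairs, so a realistic run would need many more languages for the correlations to be meaningful.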

Gerhard Jäger and Johann-Mattis List

Abstract

Current efforts in computational historical linguistics are predominantly concerned with phylogenetic inference, while methods for ancestral state reconstruction have been applied only sporadically. Unlike phylogenetic algorithms, automatic reconstruction methods presuppose phylogenetic information in order to explain what evolved when and where. Here we report a pilot study exploring how well automatic methods for ancestral state reconstruction perform in onomasiological reconstruction from multilingual word lists. In this task, algorithms infer how words evolved along a given phylogeny and reconstruct which cognate classes were used to express a given meaning in the ancestral languages. We compare three methods (Maximum Parsimony, Minimal Lateral Networks, and Maximum Likelihood) on three test sets (Indo-European, Austronesian, Chinese), using both binary and multi-state coding of the data and both single and sampled phylogenies, and find that Maximum Likelihood largely outperforms the other methods. At the same time, overall performance was disappointingly low, with F-scores ranging from 0.66 (Chinese) to 0.79 (Austronesian). A closer linguistic evaluation of the reconstructions proposed by the best method and those given in the gold standards revealed that most of the cases where the algorithms failed can be attributed to independent semantic shift (homoplasy), to morphological processes in lexical change, and to erroneous reconstructions in the independently created test sets that we employed.
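To show what ancestral state reconstruction on a given phylogeny involves, here is a minimal sketch of Fitch's algorithm. This is a standard way to compute a Maximum Parsimony reconstruction for one multi-state character, here the cognate class expressing a single concept, on a rooted binary tree. It is not necessarily the implementation used in the study, and it does not cover Minimal Lateral Networks or Maximum Likelihood. The tree encoding as nested tuples, the language names `L1` to `L4`, the cognate labels, and the alphabetical tie-break are all hypothetical.

```python
def fitch_up(tree, leaf_states, sets):
    """Bottom-up pass: store candidate state sets per node, return change count."""
    if isinstance(tree, str):  # a leaf is a language name
        sets[tree] = {leaf_states[tree]}
        return 0
    left, right = tree
    cost = fitch_up(left, leaf_states, sets) + fitch_up(right, leaf_states, sets)
    common = sets[left] & sets[right]
    if common:
        sets[tree] = common
    else:
        sets[tree] = sets[left] | sets[right]
        cost += 1  # one cognate-class replacement is required here
    return cost


def fitch_down(tree, sets, parent_state, assignment):
    """Top-down pass: choose one state per node, keeping the parent's if possible."""
    candidates = sets[tree]
    if parent_state in candidates:
        state = parent_state
    else:
        state = min(candidates)  # deterministic tie-break (an assumption)
    assignment[tree] = state
    if not isinstance(tree, str):
        for child in tree:
            fitch_down(child, sets, state, assignment)


def reconstruct(tree, leaf_states):
    """Return the reconstructed root cognate class and the minimal number of changes."""
    sets = {}
    cost = fitch_up(tree, leaf_states, sets)
    assignment = {}
    fitch_down(tree, sets, None, assignment)
    return assignment[tree], cost


# Hypothetical phylogeny ((L1, L2), (L3, L4)) with cognate classes for one concept.
tree = (("L1", "L2"), ("L3", "L4"))
leaf_states = {"L1": "A", "L2": "A", "L3": "B", "L4": "A"}
root_state, changes = reconstruct(tree, leaf_states)
```

Here the root is reconstructed as class `A` with a single replacement on the branch leading to `L3`. The same mechanism also shows why homoplasy is hard for such methods: if unrelated branches independently shift to the same cognate class, parsimony will tend to explain this with fewer changes by pushing that class further up the tree.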