Gaming artificial phylogenies

In: Language Dynamics and Change
  • 1 University of Cologne, Institute for Theoretical Physics, Köln, Germany
  • 2 SONY Computer Science Laboratories, Paris, France
  • 3 Sapienza University of Rome, Physics Department, Rome, Italy
  • 4 Complexity Science Hub Vienna, Vienna, Austria

The reconstruction of phylogenies of cultural artefacts is an open problem that combines theoretical and computational challenges. Existing benchmarks rely either on simulated phylogenies, where hypotheses about the underlying evolutionary mechanisms are unavoidable, or on real-data phylogenies, for which no true evolutionary history is known. Here we introduce a web-based game, Copystree, where users create phylogenies of manuscripts through successive copying actions in a fully monitored setup. While players enjoy the experience, Copystree makes it possible to build artificial phylogenies whose evolutionary processes do not obey any predefined theoretical mechanism and are instead generated with the unpredictability of human creativity. We present the analysis of the data gathered during the first set of experiments and use the resulting artificial phylogenies for a first test of existing phylogenetic algorithms.

