From Data to Evidence in English Language Research draws on diverse digital data sources alongside more traditional linguistic corpora to offer new insights into the ways in which they can be used to extend and re-evaluate research questions in English linguistics. This is achieved, for example, by increasing data size, adding multi-layered contextual analyses, applying methods from adjacent fields, and adapting existing data sets to new uses. Making innovative contributions to digital linguistics, the chapters in the volume apply a combination of methods to the increasing amount of digital data available to researchers to show how this data – both established and newly available – can be utilized, enriched and rethought to provide new evidence for developments in the English language.
Carla Suhr, Ph.D., University of Helsinki, is a Senior Lecturer in English Philology at that university. She is a co-compiler of the Corpus of Early English Medical Writing and has published on corpus linguistics and historical pragmatics.
Terttu Nevalainen, Ph.D., University of Helsinki, is Professor of English Philology, the Director of the VARIENG Research Unit, and a co-compiler of the historical Helsinki Corpus and the Corpus of Early English Correspondence, with well over 100 related publications.
Irma Taavitsainen, Ph.D., University of Helsinki, Professor Emerita of English Philology, Deputy Director of VARIENG, and a co-compiler of the Helsinki Corpus and the Corpus of Early English Medical Writing, has published extensively on corpus linguistics and historical pragmatics.
Contributors are: Lieselotte Anderwald, Helen Baker, David Brett, Mark Davies, Stefania Degaetano-Ortlieb, Turo Hiltunen, Mark Kaunisto, Hannah Kermes, Ashraf Khamis, Thomas Kohnen, Mikko Laitinen, Alexander Lakaw, Daniela Landert, Magnus Levin, Tony McEnery, Terttu Nevalainen, Antonio Pinna, Antoinette Renouf, Juhani Rudanko, Tanja Rütten, Gerold Schneider, Carla Suhr, Irma Taavitsainen, Elke Teich, Jukka Tyrkkö.
Table of contents
Preface Editors Notes on Contributors
Corpus Linguistics as Digital Scholarship: Big Data, Rich Data and Uncharted Data Terttu Nevalainen, Carla Suhr and Irma Taavitsainen
Part 1: Evidence from “Big Data”
Big Data: Opportunities and Challenges for English Corpus Linguistics Antoinette Renouf
Corpus-based Studies of Lexical and Semantic Variation: The Importance of Both Corpus Size and Corpus Design Mark Davies
Empirically Charting the Success of Prescriptivism: Some Case Studies of Nineteenth-century English Lieselotte Anderwald
Warn Against -ing: Exceptions to Bach’s Generalization in Four Varieties of English Mark Kaunisto and Juhani Rudanko
Part 2: Evidence from “Rich Data”
Commonplace Books: Charting and Enriching Complex Data Thomas Kohnen
Mining Big Data: A Philologist’s Perspective Tanja Rütten
Function-to-form Mapping in Corpora: Historical Corpus Pragmatics and the Study of Stance Expressions Daniela Landert
Scholastic Argumentation in Early English Medical Writing and Its Afterlife: New Corpus Evidence Irma Taavitsainen and Gerold Schneider
Part 3: Evidence from Uncharted Data and Rethinking Old Data
Language Surrounding Poverty in Early Modern England: A Corpus-based Investigation of How People Living in the Seventeenth-century Perceived the Criminalised Poor Tony McEnery and Helen Baker
An Information-Theoretic Approach to Modeling Diachronic Change in Scientific English Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis and Elke Teich
Academic Vocabulary in Wikipedia Articles: Frequency and Dispersion in Uneven Datasets Turo Hiltunen and Jukka Tyrkkö
Words (don’t come easy): The Automatic Retrieval and Analysis of Popular Song Lyrics David Brett and Antonio Pinna
Charting New Sources of Data: A Multi-genre Corpus Approach Mikko Laitinen, Magnus Levin and Alexander Lakaw
Index
Experts and learners interested in English corpus linguistics and related digital data sources, and in how their recent developments can be applied to old and new linguistic research questions.