Editor: Christian Mair
The complex politics of English as a world language provides the backdrop both for linguistic studies of varieties of English around the world and for postcolonial literary criticism. The present volume offers contributions from linguists and literary scholars that explore this common ground in a spirit of open interdisciplinary dialogue.
Leading authorities assess the state of the art to suggest directions for further research, with substantial case studies ranging over a wide variety of topics - from the legitimacy of language norms of lingua franca communication to the recognition of newer post-colonial varieties of English in the online OED. Four regional sections treat the Caribbean (including the diaspora), Africa, the Indian subcontinent, and Australasia and the Pacific Rim.
Each section maintains a careful balance between linguistics and literature, and external and indigenous perspectives on issues. The book is the most balanced, complete and up-to-date treatment of the topic to date.
Author: Christian Mair


The paper is a plea for closer cooperation between two traditions in corpus linguistics which have tended to develop in mutual isolation and, occasionally, in some hostility, namely (1) a “small-and-tidy” approach which emphasises detailed philological analysis of clean corpora, and (2) a “big-and-messy” one which stresses the advantages to be gained from the computer-assisted analysis of vast quantities of dirty data. Taking the familiar study example of the get-passive as a starting point, I argue that there are aspects of this well-studied and fairly common construction which cannot be investigated even in a very large closed corpus such as the BNC. Subsequently, I discuss cautionary procedures which need to be followed when mining for data on the Web. In spite of its obvious shortcomings as a corpus, the Web is an indispensable source of data for the study of infrequent and recent linguistic phenomena and, in addition, often provides high-quality data on badly documented “New Englishes”.

In: The Changing Face of Corpus Linguistics
Author: Christian Mair


The past two decades have seen considerable advances in the corpus-based “real-time” investigation of linguistic change in English, both in older stages of the language and in progress now. Inevitably, given our present resources, most claims about changes in the language as a whole have been based on written data. Against this backdrop, the present paper seeks to define the potential and limitations of the corpus-based “real-time” study of change in the spoken language, where even for a well documented language such as English the major problem is the paucity of corpus data.

In the absence of recordings of suitable quality, the study of real speech in real time will never be pushed back further than the early 20th century, but as I will make clear with the example of the WW I Phonographische Kommission recordings, a number of interesting resources may well deserve more corpus-linguistic attention than they have received so far. Considerable progress is also likely in the study of the history of the spoken language “by proxy”, i.e. through speech-based genres, of which vast amounts have recently been made available for corpus-linguistic study (Old Bailey, Literature Online, Google N-grams). Particularly with regard to grammar, though, more attention needs to be paid to the question of what is really speech-like in supposedly speech-based genres and which features of spoken syntax are likely to be edited out of the written rendering. Cleft constructions, present both in written and spoken English, but structurally and statistically more richly represented in the latter, will serve as an illustration of this point.

In: English Corpus Linguistics: Variation in Time, Space and Genre
Author: Christian Mair


Working styles in corpus-linguistic research are changing fast. One traditional constellation, close(d) communities of researchers forming around a specific corpus or set of corpora (the “Brown / LOB community”, “the BNC community”), is becoming increasingly problematical – particularly in the study of ongoing linguistic change and recent and current usage. The present contribution argues that whenever the possibilities of closed corpora are exhausted, it is advisable to turn to the digitised texts which – at least for a language such as English – are supplied in practically unlimited quantity on the world wide web. Web material is most suitable for studies for which large quantities of text and/or very recent texts are required. Specialised chat-rooms and discussion forums may additionally provide an unexpected wealth of material on highly specific registers or varieties not previously documented in corpora to a sufficient extent. On the basis of selected study examples it will be shown that, contrary to widespread scepticism in the field, web texts are appropriate data for variationist studies of medium degrees of delicacy – provided that a few cautionary procedures are followed in the interpretation of the results.

In: Corpus Linguistics and the Web
Author: Christian Mair


The contribution opens with a general discussion of the relationship between sociolinguistics and corpus linguistics. The point is made that while the concerns of these two traditions in the study of linguistic variability and variation were rather different at the outset, they have meanwhile developed in such a way as to make co-operation fruitful and, indeed, necessary. This point is illustrated from the author’s own work on the recently completed Jamaican component of the International Corpus of English. The variables analysed are the use of person(s) as a synonym for people, the presence or absence of subject-verb inversion in questions, the modals of obligation and necessity, negative and auxiliary contraction and, finally, the use of the “new” quotative be like.

In: Corpus Linguistics
Author: Christian Mair


This paper takes as its theoretical framework an approach to corpus-aided discovery learning in which the central role of corpora is seen as that of providing rich sources of autonomous learning activities of a serendipitous kind. Here the suggestion is put forward that the availability of different corpora and software tools and the ability to combine these in different ways depending on the purpose of the activity may help learners develop an understanding of the patterned quality of language (probability, strength of co-occurrence restrictions, levels of contextual appropriateness), and be conducive to more appropriate use, as learners are guided not just to observe patterns, but also to develop hypotheses as to their variability. A learning experience is described, in which learners are introduced to a number of corpus tools (larger and smaller, general and specific, monolingual and bilingual corpora; two different software programmes for corpus analysis), and guided to progress from more convergent activities to autonomous browsing. Positive and negative sides of the approach are discussed, also in the light of learners' comments, and suggestions for improving the methodology and the tools currently available to learners are put forward.

In: Teaching and Learning by Doing Corpus Analysis
Papers from the Twentieth International Conference on English Language Research on Computerized Corpora (ICAME 20) Freiburg im Breisgau 1999
From being the occupation of a marginal (and frequently marginalised) group of researchers, the linguistic analysis of machine-readable language corpora has moved to the mainstream of research on the English language. In this process an impressive body of results has accumulated which, over and above the intrinsic descriptive interest it holds for students of the English language, forces a major and systematic re-thinking of foundational issues in linguistic theory. “Corpus linguistics and linguistic theory” was accordingly chosen as the motto for the twentieth annual gathering of ICAME, the International Computer Archive of Modern and Medieval English, which was hosted by the University of Freiburg (Germany) in 1999. The present volume, which presents selected papers from this conference, thus builds on previous successful work in the computer-aided description of English and at the same time represents an attempt at stock-taking and methodological reflection in a linguistic subdiscipline that has clearly come of age.
Contributions cover all levels of linguistic description - from phonology/prosody, through grammar and semantics, to discourse-analytical issues such as genre or gender-specific linguistic usage. They are united by a desire to further the dialogue between the corpus-linguistic community and researchers working in other traditions. The atmosphere ranges from undisguised skepticism (as expressed by Noam Chomsky in an interview which is part of the opening contribution by Bas Aarts) to empirically substantiated optimism (as, for example, in Bernadette Vine's significantly titled contribution “Getting things done”).
Studies in Digital Linguistics
The new-media revolution has led to a comprehensive digitization of our textual universe and the pervasive incorporation of the media into our everyday lives (from mobile telephony to social media). This calls for a concerted research effort uniting linguistics and other disciplines involved in language-related research. The massive growth in the amount, diversity and availability of textual and multimodal language data for many of the world’s languages poses several challenges. In terms of theory and methods, it forces us to rethink traditional notions of what linguistic corpora are and what role they play in linguistic description. Established corpus-linguistic methods such as concordancing and textual statistics are increasingly being complemented by visualization and geolocation of digital language data. Empirically, there is a growing need to document and analyse what people do with language in the increasingly technologized communicative ecology of the 21st century.

Language and Computers - Studies in Digital Linguistics invites contributions which
- explore innovative, intelligent and creative ways of using digital language data, resources and infrastructure for linguistic description
- contribute to the development and refinement of usage-based models in linguistics, using both quantitative and qualitative methods
- analyse all aspects of digitally mediated communication, from orthography to pragmatics and sociolinguistics