By the early 1980s, corpus linguists were still considered mavericks and were still pushing at the boundaries of language-processing technology. A culture was nevertheless slowly bootstrapping itself into place: successive research results (e.g. the Collins-Cobuild Dictionary) encouraged the sense that empirical data analysis was a sine qua non for linguists, and a terminology of corpus linguistics was emerging that allowed ideas to take form. This paper reviews the evolution of text corpora over the period 1980 to the present day, focussing on three milestones as a means of illustrating changing definitions of ‘corpus’ as well as some contemporary theoretical and methodological issues. The first milestone is the 20-million-word Birmingham Corpus (1980-1986); the second is the ‘dynamic’ corpus (1990-2004); the third is the ‘Web as corpus’ (1998-2004).