The past two decades have seen considerable advances in the corpus-based “real-time” investigation of linguistic change in English, both in older stages of the language and in progress now. Inevitably, given our present resources, most claims about changes in the language as a whole have been based on written data. Against this backdrop, the present paper seeks to define the potential and limitations of the corpus-based “real-time” study of change in the spoken language, where, even for a well-documented language such as English, the major problem is the paucity of corpus data.
In the absence of recordings of suitable quality, the study of real speech in real time will never be pushed back further than the early 20th century, but, as I will make clear with the example of the WW I Phonographische Kommission recordings, a number of interesting resources may well deserve more corpus-linguistic attention than they have received so far. Considerable progress is also likely in the study of the history of the spoken language “by proxy”, i.e. through speech-based genres, of which vast amounts have recently been made available for corpus-linguistic study (Old Bailey, Literature Online, Google N-grams). Particularly with regard to grammar, though, more attention needs to be paid to the question of what is really speech-like in supposedly speech-based genres and which features of spoken syntax are likely to be edited out of the written rendering. Cleft constructions, present in both written and spoken English but structurally and statistically more richly represented in the latter, will serve to illustrate this point.