The Coll Corpus: towards a corpus of web-based college student newspapers

In: New Frontiers of Corpus Research


Unlike major English-language corpora hitherto released, on-line college student newspapers provide an unexplored record from much younger writers. In these newspapers, 20-year-olds address their peers in a situation that largely parallels standard newspaper writing as regards formal correctness and time pressure. Nearly unconstrained by outside intervention or house style sheets, they deal with a range of university student interests, including creative writing. This preliminary version of the Coll Corpus consists of one issue each of nearly all 300-plus college and university newspapers available on the Web as of spring 1999, with a total of 3.88 million words. Although American English (AmE) dominates, the resultant geographical distribution is relatively well matched to actual population ratios. In its present form, the corpus already allows exploration of numerous lexical and semantic features along temporal and geographic dimensions. Given its on-line accessibility, future versions should be easily expandable by several orders of magnitude.

Papers from the Twenty First International Conference on English Language Research on Computerized Corpora Sydney 2000


