This paper runs counter to most of the papers in this volume by arguing that, while we should welcome opportunities to use new resources and methods, we should not neglect to improve and refine the resources and methods we already have.
The path of progress in corpus linguistics is strewn with unfinished business. Because no other realistic course is available, corpus linguists have understandably been following the path of practicality, pragmatism and opportunism. By and large, we have built up the resources and techniques of the present generation by taking advantage of what is already available and what can be relatively easily obtained. Our research efforts have consequently been limited and skewed by what resources we have been able to lay our hands on.
In this paper, I illustrate the skewing effect with reference to corpus design and composition, focusing on the desiderata of representativeness, ‘balancedness’ and comparability. After arguing that we need to give more consideration to these basic requirements, I briefly address the issue of representativity (a term used to mean ‘the degree to which a corpus is representative’) in relation to the use of the world-wide web as a source of corpus data, both with respect to ‘the web as corpus’ and with respect to ‘corpus building from www-material’.