From web page to mega-corpus: the CNN transcripts

in Corpus Linguistics and the Web


This paper focuses on the technical and methodological issues involved in using data available on the internet as a basis for quantitative analyses of Present-day English. For this purpose, I concentrate on the creation of a specialized corpus of spoken data and outline the steps necessary to convert a large number of publicly available CNN transcripts into a format which is compatible with standard corpus tools. As an illustration of potential uses of such data, the second part of my paper then presents a sample analysis of the intensifier so. The paper concludes with a brief discussion of the advantages and limitations of this type of internet-derived data for corpus linguistic analysis.
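
The abstract does not spell out the conversion pipeline, so the following is only a minimal sketch of the two steps it mentions. The first step is turning a saved HTML transcript page into plain text that a concordancer can read. The second is pulling out every occurrence of "so" for a first look at the intensifier. It assumes the transcripts have been downloaded as local HTML files whose spoken text sits in ordinary markup. The class and function names (TranscriptTextExtractor, html_to_text, kwic) are hypothetical and are not the author's tools. Only Python's standard html.parser and re modules are used.

```python
import re
from html.parser import HTMLParser


class TranscriptTextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text, skipping <script>/<style> content."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &amp; for us.
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("p", "br", "div"):
            # Treat block-level tags as line breaks (an assumption about page layout).
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        raw = "".join(self._chunks)
        # Collapse runs of spaces/tabs inside each line and drop empty lines.
        lines = (re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Convert one HTML page into plain text, one block per line."""
    parser = TranscriptTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


def kwic(text, node="so", width=3):
    """Hypothetical helper: simple key-word-in-context lines for every `node` token."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


page = ("<html><head><style>p{}</style></head><body>"
        "<p>KING: It was so good.</p><p>GUEST: So, what now?</p>"
        "<script>var x=1;</script></body></html>")
text = html_to_text(page)
print(text)
for line in kwic(text):
    print(line)
```

The kwic helper matches every "so", including discourse-marker uses such as "So, what now?". The hits would still have to be classified (manually or with further heuristics) before any claims are made about the intensifier. The sketch also ignores speaker turns and does not attempt the speaker annotation a real corpus would likely need.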
