Search Results

Restricted Access

Andrew Kehoe

Abstract

The WebCorp project has demonstrated how the Web may be used as a source of linguistic data. One feature of standard corpus analysis tools hitherto missing in WebCorp is the ability to filter and sort results by date. This paper discusses the dating mechanisms available on the Web and the date query facilities offered by standard Web search engines. The new date heuristics built into WebCorp are then discussed and illustrated with a case study.
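As a rough illustration of what a date heuristic of this kind might involve, the sketch below estimates a page's date from two commonly available signals: a date embedded in the URL path and the HTTP Last-Modified header. It is not WebCorp's actual implementation. The function names, the URL pattern and the order of precedence are assumptions made for this example.

```python
import re
from datetime import date
from email.utils import parsedate_to_datetime
from typing import Iterable, List, Optional, Tuple

# Assumed pattern: a /YYYY/MM/DD/ path segment, as often seen in news URLs.
URL_DATE = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})(?:/|$)")


def date_from_url(url: str) -> Optional[date]:
    """Return the date embedded in the URL path, if there is a valid one."""
    match = URL_DATE.search(url)
    if match is None:
        return None
    try:
        return date(int(match[1]), int(match[2]), int(match[3]))
    except ValueError:  # e.g. month 13 or day 40
        return None


def date_from_header(last_modified: Optional[str]) -> Optional[date]:
    """Parse an HTTP Last-Modified header value, if present and well formed."""
    if not last_modified:
        return None
    try:
        return parsedate_to_datetime(last_modified).date()
    except (TypeError, ValueError):
        return None


def estimate_date(url: str, last_modified: Optional[str] = None) -> Optional[date]:
    """Prefer a date in the URL; otherwise fall back to Last-Modified."""
    found = date_from_url(url)
    if found is None:
        found = date_from_header(last_modified)
    return found


def filter_and_sort(
    hits: Iterable[Tuple[str, Optional[str]]], start: date, end: date
) -> List[Tuple[date, str]]:
    """Keep hits dated within [start, end] and sort them chronologically."""
    dated = []
    for url, last_modified in hits:
        found = estimate_date(url, last_modified)
        if found is not None and start <= found <= end:
            dated.append((found, url))
    return sorted(dated)
```

A Last-Modified value records when a file last changed on the server, not necessarily when the text was written, which is why this sketch treats it as the weaker signal.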

Antoinette Renouf and Andrew Kehoe

Andrew Kehoe and Matt Gee

Abstract

This paper offers a reassessment of the role of web data in diachronic linguistic analysis. We introduce the diachronic search facilities provided by the WebCorp Linguist’s Search Engine, including the use of a new ‘heat map’ graph for the analysis of changes in collocational patterns over time. We illustrate how web data can be used to supplement data from standard corpora in lexicological studies. Our focus is on the vogue phrase ‘credit crunch’, and the paper compares examples from standard corpora (BNC, Brown, LOB, Frown, FLOB) with those found in web-accessible newspaper texts. Contrary to previous studies, we do not rely on the web solely for the most up-to-date usage examples. Instead, we show how web-accessible texts dating back to the beginning of the 20th century can be used to fill gaps in and sharpen the picture provided by standard corpora.
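The kind of data that could sit behind a collocational heat map can be sketched as a table of collocate counts per year. The example below is a minimal illustration only, not the WebCorp Linguist’s Search Engine method. The function name, the tokenisation rule, the four-word span and the sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def collocates_by_year(
    dated_texts: Iterable[Tuple[int, str]], node: str, span: int = 4
) -> Dict[int, Counter]:
    """Count words within `span` tokens of the phrase `node`, grouped by year."""
    node_tokens = node.lower().split()
    n = len(node_tokens)
    counts: Dict[int, Counter] = defaultdict(Counter)
    for year, text in dated_texts:
        # Crude tokenisation (an assumption): lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == node_tokens:
                left = tokens[max(0, i - span):i]
                right = tokens[i + n:i + n + span]
                counts[year].update(left + right)
    return counts


# Invented sample sentences, for illustration only.
sample = [
    (1967, "The credit crunch hit banks hard"),
    (2008, "Banks blamed the credit crunch"),
    (2008, "The global credit crunch hit banks"),
]
table = collocates_by_year(sample, "credit crunch")
```

Each year then maps to a counter of collocates, and plotting collocates against years, with the count setting the colour intensity, would give a heat-map style view of change over time.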

Edited by Antoinette Renouf and Andrew Kehoe

This volume is witness to a spirited and fruitful period in the evolution of corpus linguistics. In twenty-two articles written by established corpus linguists, members of the ICAME (International Computer Archive of Modern and Mediaeval English) association, this new volume brings the reader up to date with the cycle of activities which make up this field of study as it is today, dealing with corpus creation, language varieties, diachronic corpus study from the past to present, present-day synchronic corpus study, the web as corpus, and corpus linguistics and grammatical theory. It thus serves as a valuable guide to the state of the art for linguistic researchers, teachers and language learners of all persuasions.
After over twenty years of evolution, corpus linguistics has matured, incorporating nowadays not just small, medium and large primary corpus building but also specialised and multi-dimensional secondary corpus building; not just corpus analysis, but also corpus evaluation; not just an initial application of theory, but self-reflection and a new concern with theory in the light of experience.
The volume also highlights the growing emphasis on language as a changing phenomenon, both in terms of established historical study and the newer short-range diachronic study of 20th century and current English; and the growing area of overlap between these two.
Another section of the volume illustrates the recent changes in the definition of ‘corpus’ which have come about due to the emergence of new technologies and in particular of the availability of texts on the world wide web.
The volume culminates in the contributions by a group of corpus grammarians to a timely and novel discussion panel on the relationship between corpus linguistics and grammatical theory.

Corpus Linguistics

Refinements and Reassessments

Edited by Antoinette Renouf and Andrew Kehoe

Throughout history, linguists and literary scholars have been impelled by curiosity about particular linguistic or literary phenomena to seek to observe them in action in original texts. The fruits of each earlier enquiry in turn nourish the desire to continue to acquire knowledge, through further observation of newer linguistic facts.
As time goes by, the corpus linguist operates increasingly in the awareness of what has gone before. Corpus Linguistics, thirty years on, is less an innocent sortie into corpus territory on the basis of a hunch than an informed, critical reassessment of existing analytical orthodoxy, in the light of new data coming on stream.
This volume comprises twenty-two articles penned by members of the ICAME (International Computer Archive of Modern and Mediaeval English) association, which together provide a critical and informed reappraisal of the facts, data, methods and tools of Corpus Linguistics which are available today. Authors reconsider the boundaries of the discipline, exploring its areas of commonality with Sociolinguistics, Language Variation, Discourse Linguistics, and Lexical Statistics and showing how that commonality is potentially of immense benefit to practitioners in the fields concerned.
The volume culminates in the report of a timely and novel expert panel discussion on the role of Corpus Linguistics in the study of English as a global language. This encompasses issues such as English as an international lingua franca, ‘norms’ for global English, and the question of ‘ownership’, or who qualifies as a native speaker.

Antoinette Renouf, Andrew Kehoe and Jayeeta Banerjee

Abstract

The web has unique potential to yield large-volume data on up-to-date language use, obvious shortcomings notwithstanding. Since 1998, we have been developing a tool, WebCorp, to allow corpus linguists to retrieve raw and analysed linguistic output from the web. Based on internal trials and user feedback gleaned from our site (http://www.webcorp.org.uk/), we have established a working system which supports thousands of regular users world-wide. Many of the problems associated with the nature of web text have been accommodated, but problems remain, some due to the non-implementation of standards on the Internet, and others to reliance on commercial search engines, whose mediation slows WebCorp's average response time and places constraints on linguistic search. To improve WebCorp performance, we are in the process of creating a tailored search engine, an infrastructure in which WebCorp will play an integral and enhanced role.

In this paper, we shall give a brief description of WebCorp, the nature and level of its current functionality, the linguistic and procedural problems which remain in web text search, and the benefits of replacing the commercial search engine with a tailored web search architecture.

Antoinette Renouf, Andrew Kehoe and David Mezquiriz

Abstract

The Web is a text store which can potentially supplement traditional corpora as a source of up-to-date linguistic data. The WebCorp project investigates this potential, and in its second year tackles some residual problems inherent in the nature of Web text, thereby refining its retrieval and analysis tool for the facilitation of corpus linguistic study.