Working styles in corpus-linguistic research are changing fast. One traditional constellation, close(d) communities of researchers forming around a specific corpus or set of corpora (the “Brown / LOB community”, “the BNC community”), is becoming increasingly problematical – particularly in the study of ongoing linguistic change and recent and current usage. The present contribution argues that whenever the possibilities of closed corpora are exhausted, it is advisable to turn to the digitised texts which – at least for a language such as English – are supplied in practically unlimited quantity on the world wide web. Web material is most suitable for studies for which large quantities of text and/or very recent texts are required. Specialised chat-rooms and discussion forums may additionally provide an unexpected wealth of material on highly specific registers or varieties not previously documented in corpora to a sufficient extent. On the basis of selected study examples it will be shown that, contrary to widespread scepticism in the field, web texts are appropriate data for variationist studies of medium degrees of delicacy – provided that a few cautionary procedures are followed in the interpretation of the results.
The discursive representation of knowledge, the fundamental objective of scientific inquiry, reflects underlying epistemic conditions of scientific thought (Bates 1995). Knowledge is communicated in scientific writing by means of lexical choice, discourse conventions and the organization of information. Over the long history of vernacular medicine, the writers of each era – from scholasticism and empiricism to evidence-based medicine – have had their own perspectives on knowledge, revealed by the discursive practices they employed. Lexical items referring to the concept of knowledge (e.g. knowledge, information, doctrine) are investigated from the late Middle English period to Present-day English. We analyze variation and change in the lexicon of knowledge and examine the discursive contexts in which the terms appear, showing how these have changed over time in different subgenres within learned medicine. The study makes use of several medical corpora with a total word count of roughly one million words: the MEMT is used for the Middle English period, and a selection of texts from the EMEMT corpus (articles from the Philosophical Transactions and other contemporary medical texts) represents the Early Modern English period. For the PDE period, we use a selection of research articles from academic journals and texts from the Medicor corpus.
The name Great Britain or Britain occasionally occurs in the form the Britain. This paper discusses various parallel forms and their relation to the Britain, thereby attempting to explain its occurrence. There turns out to be a great deal of variation with regard to article usage among names of countries, some of which may, just possibly, have influenced the use of the article with Britain. An important element in this context, finally, is the articled and near-synonymous the UK.
This is a large-scale corpus study of relative constructions containing same in the antecedent. These differ from other relative constructions in that they permit the use of as as a relativizer and thus offer different possibilities of variation than other relative constructions, something that has not been well described in handbooks. We found that same-constructions occur much more frequently with relativizers having adverbial function, and that they also show a different semantic patterning than other adverbial relative constructions.
The most common relativizer in speech is as, at over 50%; in writing, as and that each account for about a third. A variable rule analysis showed that the factors independently favouring the choice of as were the function of same as antecedent head, the functions of as as adverbial or subject complement, and occurrence in speech. There are also some differences between speech and writing when as is the relative marker in adverbial function, in that the ranking is manner-temporal-locative in speech and temporal-manner-locative in writing.
We discuss our findings in the light of the pragmatics of same-constructions and consider the history of as as a relative marker in English.
Semantic change observable in isolated linguistic items is both frequent and interesting in itself. More interesting, perhaps, are cases of structural change, i.e. cases where one and the same tendency can be discerned in a related group of words. This paper uses modern corpus material in order to sketch the development of one such group, words meaning ‘frightening’, and suggests that they all follow the same trend in the direction of ‘impressive, overwhelming’ although they differ with respect to how far they have advanced along that route. The semantic changes of some 25 words in the chosen area are studied in detail, and their development is illustrated with corpus material. One of the conclusions of the study is that their rate of semantic progress is partly dependent on the time when they entered the semantic field. The paper deals with the adjectives in the group; the adverbs, although equally interesting, are left for a later investigation.
For over two decades Jan Aarts has been actively involved in corpus linguistic research. He was the instigator of a large number of projects, and he was responsible for what has become known as the Nijmegen approach to corpus linguistics. It is thanks to him that words like TOSCA and LDB have become household names in the corpus linguistic community.
The present volume has been collected in his honour. The contributions in it cover a wide range of topics in the field of corpus linguistic research, especially those in which Jan Aarts takes a keen interest: corpus encoding and tagging, parsing and databases, and the linguistic exploration of corpus data. The contributions in this volume discuss work done in this field outside Nijmegen, for the obvious reason that we do not wish to present him with a report on work in which he is himself involved.
This volume provides a selection of the papers which were presented at the thirteenth conference on Computational Linguistics in the Netherlands (held in Groningen in November 2002). The subjects covered in this book represent a cross-section of current research topics in computational linguistics ranging from theoretical to applied research and development. The target audience consists of students and scholars of computational linguistics as well as speech and language processing, both in academia and industry.
This volume provides a selection of the papers which were presented at the eleventh conference on Computational Linguistics in the Netherlands (Tilburg, 2000).
It gives an accurate and up-to-date picture of the lively scene of computational linguistics in the Netherlands and Flanders.
The volume covers the whole range from theoretical to applied research and development, and is hence of interest to both academia and industry.
The target audience consists of students and scholars of computational linguistics, and speech and language processing (Linguistics, Computer Science, Electrical Engineering).
This volume presents a systematic overview of current research on the issues that arise when recreating and translating dialogue in works of fiction (including narrative, drama and film scripts). The central concept is that of fictive orality, a situational linguistic variety differing from spontaneous speech in various respects. Speech in fiction is the product of stylised recreation or evocation by an author. While realism and authenticity may be the most celebrated qualities, ultimately, the literary functions and the semiotic dimension of dialogue place significant constraints on the decisions taken both by the source text authors and the translators. Moreover, the traditions and conventions of the target culture act as powerful sources of expectations that influence the final form of the text.
This collective volume is divided into three parts: Part 1 deals with the translators’ own reflections on the qualities of fictive dialogue. Part 2 discusses the interaction of fictive orality with other varieties such as dialects (geographical, chronological and social) and genres. Part 3 discusses a range of language resources present in fictive dialogue (syntax and sentence connection, information packaging, pragmatic markers and modalisers, appreciative morphology and phrasemes, spelling and typographical conventions, deictics, etc.). All chapters present research results in accessible language and are thoroughly illustrated with translations from and into various European languages (English, German, French, Spanish, Catalan, Romanian and Italian) and their varieties. The volume will be of interest to scholars in translation studies and contrastive linguistics, to graduate students, and to readers interested in the translation of style.
This volume is witness to a spirited and fruitful period in the evolution of corpus linguistics. In twenty-two articles written by established corpus linguists, members of the ICAME (International Computer Archive of Modern and Mediaeval English) association, this new volume brings the reader up to date with the cycle of activities which make up this field of study as it is today, dealing with corpus creation, language varieties, diachronic corpus study from the past to present, present-day synchronic corpus study, the web as corpus, and corpus linguistics and grammatical theory. It thus serves as a valuable guide to the state of the art for linguistic researchers, teachers and language learners of all persuasions.
After over twenty years of evolution, corpus linguistics has matured, incorporating nowadays not just small, medium and large primary corpus building but also specialised and multi-dimensional secondary corpus building; not just corpus analysis, but also corpus evaluation; not just an initial application of theory, but self-reflection and a new concern with theory in the light of experience.
The volume also highlights the growing emphasis on language as a changing phenomenon, both in terms of established historical study and the newer short-range diachronic study of 20th century and current English; and the growing area of overlap between these two.
Another section of the volume illustrates the recent changes in the definition of ‘corpus’ which have come about due to the emergence of new technologies and in particular of the availability of texts on the world wide web.
The volume culminates in the contributions by a group of corpus grammarians to a timely and novel discussion panel on the relationship between corpus linguistics and grammatical theory.