Working styles in corpus-linguistic research are changing fast. One traditional constellation, close(d) communities of researchers forming around a specific corpus or set of corpora (the “Brown / LOB community”, “the BNC community”), is becoming increasingly problematical – particularly in the study of ongoing linguistic change and recent and current usage. The present contribution argues that whenever the possibilities of closed corpora are exhausted, it is advisable to turn to the digitised texts which – at least for a language such as English – are supplied in practically unlimited quantity on the world wide web. Web material is most suitable for studies for which large quantities of text and/or very recent texts are required. Specialised chat-rooms and discussion forums may additionally provide an unexpected wealth of material on highly specific registers or varieties not previously documented in corpora to a sufficient extent. On the basis of selected study examples it will be shown that, contrary to widespread scepticism in the field, web texts are appropriate data for variationist studies of medium degrees of delicacy – provided that a few cautionary procedures are followed in the interpretation of the results.
The discursive representation of knowledge, the fundamental objective of scientific inquiry, reflects underlying epistemic conditions of scientific thought (Bates 1995). Knowledge is communicated in scientific writing by means of lexical choice, discourse conventions and the organization of information. Over the long history of vernacular medicine, the writers of each era – from scholasticism and empiricism to evidence-based medicine – have had their own perspectives on knowledge, revealed by the discursive practices they employed. Lexical items referring to the concept of knowledge (e.g. knowledge, information, doctrine) are investigated from the late Middle English period to Present-day English. We analyze variation and change in the lexicon of knowledge and examine the discursive contexts in which the terms appear, showing how these have changed over time in different subgenres within learned medicine. The study makes use of several medical corpora with a total word count of roughly one million words: the MEMT is used for the Middle English period, and a selection of texts from the EMEMT corpus (articles from the Philosophical Transactions and other contemporary medical texts) represents the Early Modern English period. For the PDE period, we use a selection of research articles from academic journals and texts from the Medicor.
For over two decades Jan Aarts has been actively involved in corpus linguistic research. He was the instigator of a large number of projects, and he was responsible for what has become known as the Nijmegen approach to corpus linguistics. It is thanks to him that words like TOSCA and LDB have become household names in the corpus linguistic community.
The present volume has been collected in his honour. The contributions in it cover a wide range of topics in the field of corpus linguistic research, especially those in which Jan Aarts takes a keen interest: corpus encoding and tagging, parsing and databases, and the linguistic exploration of corpus data. The contributions in this volume discuss work done in this field outside Nijmegen, for the obvious reason that we do not wish to present him with a report on work in which he is himself involved.
This volume provides a selection of the papers which were presented at the thirteenth conference on Computational Linguistics in the Netherlands (held in Groningen in November 2002). The subjects covered in this book represent a cross-section of current research topics in computational linguistics ranging from theoretical to applied research and development. The target audience consists of students and scholars of computational linguistics as well as speech and language processing, both in academia and industry.
This volume provides a selection of the papers which were presented at the eleventh conference on Computational Linguistics in the Netherlands (Tilburg, 2000).
It gives an accurate and up-to-date picture of the lively scene of computational linguistics in the Netherlands and Flanders.
The volume covers the whole range from theoretical to applied research and development, and is hence of interest to both academia and industry.
The target audience consists of students and scholars of computational linguistics, and speech and language processing (Linguistics, Computer Science, Electrical Engineering).
This volume presents a systematic overview of current research on the issues that arise when recreating and translating dialogue in works of fiction (including narrative, drama and film scripts). The central concept is that of fictive orality, a situational linguistic variety differing from spontaneous speech in various respects. Speech in fiction is the product of stylised recreation or evocation by an author. While realism and authenticity may be the most celebrated qualities, ultimately, the literary functions and the semiotic dimension of dialogue place significant constraints on the decisions taken both by the source text authors and the translators. Moreover, the traditions and conventions of the target culture act as powerful sources of expectations that influence the final form of the text.
This collective volume is divided into three parts: Part 1 deals with the translators’ own reflections on the qualities of fictive dialogue. Part 2 discusses the interaction of fictive orality with other varieties such as dialects (geographical, chronological and social) and genres. Part 3 discusses a range of language resources present in fictive dialogue (syntax and sentence connection, information packaging, pragmatic markers and modalisers, appreciative morphology and phrasemes, spelling and typographical conventions, deictics, etc.). All chapters present research results in accessible language and are thoroughly illustrated with translations from and into various European languages (English, German, French, Spanish, Catalan, Romanian and Italian) and their varieties. The volume will be of interest to scholars in translation studies and contrastive linguistics, to graduate students, and to readers interested in the translation of style.
This volume is witness to a spirited and fruitful period in the evolution of corpus linguistics. In twenty-two articles written by established corpus linguists, members of the ICAME (International Computer Archive of Modern and Mediaeval English) association, this new volume brings the reader up to date with the cycle of activities which make up this field of study as it is today, dealing with corpus creation, language varieties, diachronic corpus study from the past to present, present-day synchronic corpus study, the web as corpus, and corpus linguistics and grammatical theory. It thus serves as a valuable guide to the state of the art for linguistic researchers, teachers and language learners of all persuasions.
After over twenty years of evolution, corpus linguistics has matured, incorporating nowadays not just small, medium and large primary corpus building but also specialised and multi-dimensional secondary corpus building; not just corpus analysis, but also corpus evaluation; not just an initial application of theory, but self-reflection and a new concern with theory in the light of experience.
The volume also highlights the growing emphasis on language as a changing phenomenon, both in terms of established historical study and the newer short-range diachronic study of 20th century and current English; and the growing area of overlap between these two.
Another section of the volume illustrates the recent changes in the definition of ‘corpus’ which have come about through the emergence of new technologies, in particular the availability of texts on the world wide web.
The volume culminates in the contributions by a group of corpus grammarians to a timely and novel discussion panel on the relationship between corpus linguistics and grammatical theory.
The subjects found in this book represent a cross-section of current research topics in computational linguistics and are related to the fields of grammatical description, statistical modelling and natural language technology. Grammatical description is included in the form of work on HPSG, both the application of HPSG and investigation of the structure of HPSG itself. Another popular methodology, statistical modelling, is amply present as well: there are papers on the use of statistical models in such varied areas as language evolution, phonotactics, index term identification and bilingual dictionary creation. Finally, as can also be seen from the latter two subjects, computational linguistics is closely related to natural language technology systems and resources. This is shown by papers on document analysis, controlled languages, text generation and lexicon acquisition.
This volume provides a selection of the papers which were presented at the ninth conference on Computational Linguistics in the Netherlands (Leuven, 1998). It gives an accurate and up-to-date picture of the lively scene of computational linguistics in the Netherlands and Flanders. In terms of topics the contributions can be grouped under three headings: the use of statistical methods in speech and language processing (6 papers), the analysis of syntactic and semantic phenomena in the framework of computationally oriented formalisms, such as Head-driven Phrase Structure Grammar (5 papers), and the development of NLP applications, such as document processing, dialogue modelling and teaching (3 papers). The volume covers the whole range from theoretical to applied research and development, and is hence of interest to both academia and industry. The target audience consists of advanced students and scholars of computational linguistics, and speech and language processing (Linguistics, Computer Science, Electrical Engineering).
This volume brings together twenty-two of the world's leading translation and interpreting theorists to address the issue of sensitivity in translation. Whether in novels or legal documents, the Bible or travel brochures, in translating ancient texts or providing simultaneous interpretation, sensitive subject-matter, contentious modes of expression and the sensibilities of the target audience are the biggest obstacles to acceptance of the translator's work. The contributors bring to bear a wide variety of approaches – generative, cognitive, lexical and functional – in confronting this problem, and in negotiating the competing claims of source cultures and target cultures in the areas of cultural, political, religious and sexual sensitivity. All of the articles are presented here for the first time, and in his Introduction Karl Simms gives an overview of the philosophical and linguistic questions which have motivated translators of sensitive texts through the ages. This book will be of interest to all working translators and interpreters, and to teachers of translation theory and practice.