An Information-Theoretic Approach to Modeling Diachronic Change in Scientific English

in From Data to Evidence in English Language Research

We present an information-theoretic approach to investigating diachronic change in scientific English. Our main assumption is that scientific English has become increasingly dense over time, i.e. that linguistic constructions allowing dense packing of information are increasingly used.

So far, diachronic change in scientific writing has been investigated using frequency-based approaches (see e.g. Halliday (1988); Atkinson (1998); Biber (2006b, c); Biber and Gray (2016); Banks (2008); Taavitsainen and Pahta (2010)). We use information-theoretic measures (entropy, surprisal; Shannon (1949)) both to assess features previously reported to change over time and to discover new, latent features in the data itself that are involved in diachronic change.

For this, we use the Royal Society Corpus (RSC) (Kermes et al. (2016)), which covers the period from 1665 to 1869. We present three kinds of analyses: nominal compounding (typical of academic writing), modal verbs (shown to have changed in frequency over time), and an analysis based on part-of-speech trigrams to detect new features that change diachronically. We show how information-theoretic measures help to investigate, evaluate and detect features involved in diachronic change.
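
To make the two measures named above concrete, the following is a minimal sketch of how surprisal and entropy could be computed over part-of-speech trigrams. It is an illustration only and not the chapter's actual procedure: the function names `trigram_surprisal` and `entropy`, the toy tag sequence, and the use of unsmoothed relative frequencies are all assumptions made for this example.

```python
import math
from collections import Counter


def trigram_surprisal(tags):
    """Map each POS trigram to the surprisal (in bits) of its third tag.

    Surprisal is -log2 P(t3 | t1, t2), with the conditional probability
    estimated by plain relative frequency (assumption: no smoothing).
    """
    trigrams = list(zip(tags, tags[1:], tags[2:]))
    trigram_counts = Counter(trigrams)
    # Count how often each two-tag context opens a trigram.
    context_counts = Counter((t1, t2) for t1, t2, _ in trigrams)
    return {
        trigram: -math.log2(count / context_counts[trigram[:2]])
        for trigram, count in trigram_counts.items()
    }


def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical toy tag sequence, not taken from the corpus.
tags = ["DT", "NN", "NN", "VB", "DT", "NN", "IN", "DT", "NN", "NN"]
surprisal = trigram_surprisal(tags)
trigram_entropy = entropy(Counter(zip(tags, tags[1:], tags[2:])))
```

In this sketch a trigram whose third tag is fully predictable from its two-tag context receives a surprisal of zero, while rarer continuations receive higher values. Comparing such values across time slices of a corpus is one way the measures could be put to diachronic use.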
