We present an information-theoretic approach to investigating diachronic change in scientific English. Our main assumption is that scientific English has become increasingly dense over time, i.e. linguistic constructions that allow dense packing of information are used increasingly often.

So far, diachronic change in scientific writing has been investigated by means of frequency-based approaches (see e.g. Biber). We use information-theoretic measures (entropy, surprisal) both to assess features previously reported to change over time and to discover new, latent features in the data itself that are involved in diachronic change.

For this, we use the Royal Society Corpus (RSC), which spans the period 1665 to 1869. We present three kinds of analysis: nominal compounding (typical of academic writing), modal verbs (shown to have changed in frequency over time), and an analysis based on part-of-speech trigrams to detect new features that change diachronically. We show how information-theoretic measures help to investigate, evaluate and detect features involved in diachronic change.
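
To make the two measures concrete, the following minimal sketch computes the surprisal of the third tag in a part-of-speech trigram given the two preceding tags, and the entropy of a count distribution. It is an illustration under stated assumptions, not the model used in the study: probabilities are plain unsmoothed relative frequencies, and the function names, the Penn-style tags and the toy sequences are all hypothetical.

```python
import math
from collections import Counter


def trigram_surprisal(tag_sequences):
    """Map each trigram (t1, t2, t3) to -log2 P(t3 | t1, t2).

    Probabilities are unsmoothed relative frequencies (an assumption
    made for this sketch, not a claim about the original analysis).
    """
    trigram_counts = Counter()
    context_counts = Counter()
    for tags in tag_sequences:
        for i in range(len(tags) - 2):
            t1, t2, t3 = tags[i:i + 3]
            trigram_counts[(t1, t2, t3)] += 1
            context_counts[(t1, t2)] += 1
    return {
        tri: -math.log2(n / context_counts[tri[:2]])
        for tri, n in trigram_counts.items()
    }


def entropy(counts):
    """Shannon entropy in bits of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical toy data: two short tag sequences.
toy = [
    ["DT", "NN", "NN", "VBZ"],
    ["DT", "NN", "VBZ", "JJ"],
]
surprisal = trigram_surprisal(toy)
```

In this toy data the context `("DT", "NN")` is followed once by `NN` and once by `VBZ`, so each continuation has probability 0.5 and a surprisal of 1 bit. Computing such values separately for each time period would allow them to be compared across periods.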

In: From Data to Evidence in English Language Research