Measuring Text Similarity Between the Two Editions of John Fowles’s The Magus

In: The State of Stylistics

John Fowles’s The Magus was first published in 1966; Fowles revised it and republished it in 1977. My doctoral research is a comparative stylistic analysis of the two editions of the novel. In this paper, I explore (i) what differences between the two versions can be identified by applying corpus techniques to them and (ii) to what extent stylistic investigation and corpus techniques can be usefully combined.

I briefly introduce how I have used the TESAS/Crouch and WCopyfind software to detect and measure text similarity between the two versions of the novel in terms of ‘matched’ words, i.e. n-gram overlaps, shared word stems and synonyms. With the aid of these two corpus tools, I present the statistical results of a chapter-by-chapter comparison, which show in quantitative terms the general pattern of revision between the first and second editions. I then discuss the limitations of applying the corpus tools to my stylistic comparison. Textual examples illustrate the problems that arise when surface linguistic criteria are used to measure text similarity or text content computationally, especially the difficulty of measuring content in passages that have undergone extensive revision.
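To make the notion of n-gram overlap concrete, the following is a minimal sketch of one common surface measure: the proportion of distinct word trigrams in a first-edition passage that reappear in the revised passage. It is not the algorithm used by TESAS/Crouch or WCopyfind; the function names (`tokenize`, `ngrams`, `containment`, `chapter_scores`), the choice of trigrams and the sample sentences are all assumptions made for illustration, and the sentences are invented rather than quoted from the novel.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return the set of distinct contiguous n-token sequences."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(source, revised, n=3):
    """Share of the source's distinct n-grams that also occur in the revised text."""
    src = ngrams(tokenize(source), n)
    rev = ngrams(tokenize(revised), n)
    if not src:
        return 0.0  # source shorter than n tokens: no n-grams to match
    return len(src & rev) / len(src)


def chapter_scores(first_edition, second_edition, n=3):
    """Score each chapter present in both editions (dicts: chapter id -> text)."""
    shared = sorted(set(first_edition) & set(second_edition))
    return {ch: containment(first_edition[ch], second_edition[ch], n) for ch in shared}


# Invented sentences, used only to show the calculation.
old = "The old man walked slowly down to the sea"
new = "The old man went slowly down to the shore"
print(round(containment(old, new), 3))
```

Here only three of the seven trigrams in the first sentence survive two single-word substitutions, although the sentence has hardly changed in content. A score of this kind therefore registers how much of the wording has been altered, not how much of the meaning, which is the limitation of surface criteria discussed in this paper.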


