Electronic Transcriptions of New Testament Manuscripts and their Accuracy, Documentation and Publication

In: Ancient Manuscripts in Digital Culture
Author: H.A.G. Houghton
Open Access


The adoption of digital editing software has led to a significant change in the process of creating a critical edition of the New Testament, as embodied in the Novum Testamentum Graecum Editio Critica Maior. Data is no longer gathered as a collation of witnesses against a standard base text, but in the form of complete transcriptions of individual manuscripts which then form the basis of an automatically generated apparatus. This chapter outlines the procedures involved in creating a body of such electronic data. In particular, it considers the accuracy and transparency of the current transcription process for this edition, suggesting that proofreading is an important stage even if a double-blind approach has been used for the initial transcriptions and arguing for a fuller use of the TEI Header to describe the source and limitations of the transcription. It also addresses the publication and release of XML files, proposing that such scholarly work is best made available in the form of individual files consisting of a single biblical book and under a licence which only requires attribution to the original creators when the data is re-used rather than restricting data to non-commercial use or stipulating that derivatives must be released under the same terms (share-alike).

1 Introduction*

The adoption of digital tools to edit the Greek New Testament has fundamentally changed the methodology of creating such an edition. In the past, data was painstakingly gathered in the form of collations of manuscripts against a standard printed text, which were then combined to create an apparatus of readings.1 The base text used for collation was a fixed point against which everything was measured; once the apparatus was constructed, the individual collations were no longer required. In contrast, electronic editing software (in particular, the widely-adopted Collate program and its successors) is based not on a single apparatus but on multiple files, each of which consists of a complete electronic transcription of a single manuscript witness.2 The apparatus is compiled automatically from these files, using an algorithm to improve alignment and creating meta-files to assist with the normalisation of the data. This has at least four distinct advantages over the previous method: the performance of the mechanical task of compilation by a computer is much quicker, less susceptible to human error, reproducible and reconfigurable. A collation can be re-run from the same files with different settings or a different selection of witnesses. It is therefore the complete electronic transcriptions rather than collations of variants (and the apparatus created by collating these collations) which become the building blocks of editing a text.

The result is that the first generation of digital editors have a double task, as I have observed elsewhere:

First of all, they must edit the individual documents, creating an electronic archetype of each witness for the required biblical book. Only then can they proceed to use this information to edit the text itself.3

This procedure of making electronic transcriptions is fully integrated into the workflow of the Novum Testamentum Graecum Editio Critica Maior (ECM) and has also been adopted in other editorial projects relating to the New Testament, such as the Vetus Latina Iohannes and the Digital Codex Sinaiticus. In the light of the experience gained on these projects, it is now appropriate to reflect on the creation and use of electronic transcriptions of the New Testament and make some recommendations for good practice. This chapter will briefly outline the process of making electronic transcriptions and the ways in which they can be used, before turning to consider three areas in which further clarity or standardisation may be beneficial. These are, in turn: the accuracy of transcriptions; the documentation of transcription practice; and finally, the publication of electronic transcriptions, especially with regard to authority and availability.

2 Making and Using Electronic Transcriptions

The principle of making an electronic transcription of a New Testament manuscript is remarkably similar to creating a paper collation, even though the result is different.4 Because the textual agreement between almost all manuscripts and editorial reconstructions is around 90% (and even higher in many cases), the most efficient way for transcribers to proceed is to take an electronic file of an editorial text, compare it with the manuscript, and intervene at every point of variation, in this case by adjusting the file to match the reading of the manuscript.5 Selecting a base text close to that of the witness, such as the Textus Receptus for transcribing Byzantine manuscripts, means that the transcriber has to introduce fewer changes. The choice of base text should be unimportant, since the resultant transcription file should reproduce the text of the manuscript: it is only if the transcriber overlooks a discrepancy that a reading of the base text will persist unchanged.6 One instance where the base text is likely to affect the transcription is in the transcriber’s interpretation of unclear characters or treatment of damaged portions, so the use of a base text similar to that of the manuscript could assist with this.7

For an edition of the text of a particular book of the New Testament, an electronic transcription need only represent the biblical text copied by the original scribe and any subsequent corrections. Where this is absent or somehow doubtful, the relevant text should be correspondingly marked as lacunose, reconstructed or unclear. In practice, however, transcribers for the ECM also introduce basic information about the layout, recording page, column and line breaks: the benefits of this include the easy comparison of transcription and image, especially useful in proofreading, and ensuring that transcribers are constantly engaged with the manuscript through regular intervention in the file, rather than losing attention if the differences between the base text and manuscripts are scarce. The amount of information recorded in a transcription can easily be increased, such as the inclusion of abbreviations, punctuation, decoration or paratext.8 A balance must be struck in order to enable transcribers to work with maximum effectiveness and not become distracted from textual accuracy by recording additional features.9 It may also be noted in passing that the degree of engagement with a manuscript required to make a full electronic transcription places a researcher in a strong position to assess its textual evidence, given Hort’s maxim that “Knowledge of documents should precede final judgment on readings.”10

The first generation of electronic transcriptions, created for use with the original Collate software, were plain-text files with basic tags for markup, produced in a standard text editor.11 These were converted in a separate process to a more advanced format for publication (first SGML, then XML). The Workspace for Collaborative Editing project produced the browser-based Online Transcription Editor in 2013. This enabled transcribers to work directly on XML files in a display which matched the published transcriptions, the markup being hidden behind the scenes.12 Not only was the aim to standardise the markup and deliver formally correct files, but this procedure also meant that transcriptions could be published online and distributed immediately. One of the strengths of XML encoding corresponding to the TEI Guidelines is that each file is complete in itself, with a standard form of markup which is not only largely readable by humans but also actionable by machines. This is vital for the long-term sustainability of these files as well as their availability for re-use, as discussed below. The Online Transcription Editor supports a wide variety of TEI-compatible features which can be added as enhancements to standard transcriptions, such as formatting, annotations and other paratextual features.

Unlike printed transcriptions and collations, electronic files may be re-used or developed in a variety of ways. A transcription created as part of a study of an individual manuscript may be incorporated into an edition.13 A transcription created for one edition may be used in another.14 A transcription produced for an edition may be adopted by a holding institution and displayed alongside images of the manuscript, perhaps with the addition of further information.15 A transcription produced by a research project may be adapted by a commercial software provider and included on their platform.16 All these scenarios have taken place in recent years, and demonstrate how a single electronic file can be redeployed in ways which are impossible for printed texts. Electronic files may also be easily adjusted if errors are spotted, or improved as new images or processing techniques become available. When investigating the biblical text of a particular manuscript, my own practice has been to make a transcription as this requires little more effort than a collation: the file can then be used to generate a list of variants from a standard text or compare it with another manuscript, and the transcription is released through the Institutional Research Archive to complement the published study.17

3 The Accuracy of Electronic Transcriptions

The first area to be addressed more fully in this chapter consists of the measures taken to ensure the accuracy of electronic transcriptions. Given the key role these files play in the construction of scholarly editions, accuracy is paramount: as mentioned above, the apparatus is generated directly from these files and they can be used directly for various different types of analysis. In addition, the full transcriptions are normally incorporated into electronic editions, providing the user with the complete set of data on which the edition is based. It is worth remembering at the outset that electronic transcriptions are an abstraction, a translation of a calligraphic artefact into the standard tokens of digital text; what is more, the transcriber’s decisions regarding certain readings may remain open to interpretation, particularly if the original is damaged or hard to read.18 Nevertheless transcribers, like manuscript copyists, are human and perform at different levels: even those who are normally reliable have off-days, so it is important to have a rigorous checking process to ensure that errors at this initial stage do not persist into the final edition.

The procedure for ensuring accuracy will vary from project to project, according to the resources at the disposal of each and the amount of information which each project chooses to record in its transcriptions. The current practice for Greek manuscripts in the ECM is that two transcriptions are made independently, which are then automatically collated with each other and the differences are reconciled by an experienced scholar, who alters one of the files with reference to the images of the manuscript.19 Historically, this double-blind approach has been adopted by numerous projects for the creation of electronic text.20 The high element of redundancy seems to have been counterbalanced by the relatively low cost of non-specialist labour. In the case of manuscript transcriptions, however, the situation is more complicated than producing a digital surrogate for printed text. It has even been claimed in one standard manual that the method of double keyboarding “has nothing to offer the scholar who wants to create an edition from manuscript material”.21
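The automatic comparison of two independent transcriptions can be illustrated with a minimal sketch. It is not the software used by the ECM: it assumes that each transcription of a verse has been reduced to a plain string of words, and it uses the SequenceMatcher class from Python's standard difflib module to list the places where the two files disagree. The function name and the sample readings are invented for illustration.

```python
from difflib import SequenceMatcher


def differences(first, second):
    """List the places where two transcriptions of one verse disagree.

    Each item pairs the reading of the first transcription with that of
    the second; an empty string means that the words are absent there.
    """
    a, b = first.split(), second.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # only disagreements are shown to the reconciler
    ]


# Hypothetical transcriptions of the same verse by two transcribers.
first = "εν αρχη ην ο λογος και ο λογος ην προς τον θεον"
second = "εν αρχη ην λογος και ο λογος ην προς τον θν"

for reading_a, reading_b in differences(first, second):
    print(repr(reading_a), "|", repr(reading_b))
```

Each pair returned is a point at which the reconciler would consult the images before correcting one of the files; words on which the two transcriptions agree are never presented for review.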

Based on his experience with the International Greek New Testament Project (IGNTP), however, Parker states that:

The double transcription is an effective way of eliminating error, so long as both initial transcriptions are of a sufficiently high quality for the two transcribers to be unlikely to make the same mistake independently.22

What constitutes a sufficiently accurate initial transcription? In criticising Abbott’s collation of Codex Usserianus Secundus, Hoskier suggests that over the course of two gospels, “a good collator or copyist should make but half a dozen errors” rather than the one thousand he identifies in Abbott’s work.23 This seems overambitious, even when orthography is not taken into account. A figure which was informally suggested for postdoctoral transcribers working on the ECM of John was no more than two errors per biblical chapter. This would leave minimal work to be done at the point of reconciliation, but already represents an achievement comparable to many printed transcriptions.24 Often, however, the initial electronic transcriptions are made by students or volunteers who are still in the process of developing their skills.25 In terms of efficiency, the process would clearly be inadequate if it took an experienced reconciler more time to process a pair of transcriptions and reconcile the differences between them than to produce his or her own expert transcription.26 Setting an acceptable level of accuracy beyond this is somewhat arbitrary, as transcribers normally improve over time and manuscripts vary considerably in legibility. Nevertheless, the more mistakes there are in one initial transcription, the more likely it is to agree in error with the other transcription used for reconciliation. This is especially the case if the initial transcribers have not worked independently but compared notes as they went along. As reconciliation only addresses differences between the two transcriptions, if both transcribers fail to adjust their base text at the same place, the error will not be visible to the reconciler and will therefore be allowed to stand. Furthermore, the more interventions a reconciler has to make in a transcription file, the greater the likelihood of him or her overlooking a discrepancy. For instance, if verses are not correctly identified or appear on more than one occasion, the entire verse will be highlighted as a difference, obscuring any internal textual variation.
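The blind spot of reconciliation can be made concrete with a small sketch. It assumes, purely for simplicity, that the manuscript and both transcriptions are lists of words of equal length, so that no alignment is needed; the function name and the readings are hypothetical.

```python
def audit(manuscript, first, second):
    """Sort word positions by what a reconciler is able to see.

    Returns two lists of positions: those flagged because the two
    transcriptions disagree, and those hidden because both agree with
    each other but not with the manuscript.
    """
    flagged, hidden = [], []
    for position, (truth, a, b) in enumerate(zip(manuscript, first, second)):
        if a != b:
            flagged.append(position)  # visible during reconciliation
        elif a != truth:
            hidden.append(position)   # shared error, never displayed
    return flagged, hidden


# Invented example: the base text reads "ιησους ειπεν" where the
# manuscript has "κυριος λεγει".
base = "ο δε ιησους ειπεν αυτοις".split()
manuscript = "ο δε κυριος λεγει αυτοις".split()
first = "ο δε κυριος ειπεν αυτοις".split()  # one discrepancy overlooked
second = list(base)                          # both discrepancies overlooked

print(audit(manuscript, first, second))
```

Position 2 is flagged because only one transcriber altered the base text, while position 3 remains hidden because neither did: only a comparison of the complete text with the images could bring the second error to light.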

Procedures for ensuring accuracy should also attend to the activities of the reconciler, who has a responsibility not to introduce any new errors and also a key role in file management. The file in which the corrections have been entered needs to be clearly identified. If not, there is a risk that one of the two initial transcriptions may erroneously be treated as the reconciled file, or even that an unaltered copy of the base text may be treated as a transcription. A belt-and-braces approach of both altering the file name at this point and recording its reconciled status in the body of the file is most secure. Procedural flaws may be picked up when unexpected data is returned, such as 100% agreement with the base text in statistical comparisons or typographical errors and unusual readings appearing in the apparatus prepared for the edition. Indeed, the process of editing a collation of new files almost always involves returning to the transcriptions themselves to make adjustments, such as changes to verse- or word-division, the treatment of lacunae, or the reconstruction of supplied text in the light of wider tradition as well as verifying (and if necessary correcting) any textual errors.27
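A check of the kind mentioned here, in which complete agreement with the base text is treated as a warning sign, might be sketched as follows. The data structure (a dictionary from verse reference to text), the function names and the choice to compare whole verses are all assumptions made for the sake of the example rather than features of any existing project software.

```python
def agreement_with_base(transcription, base):
    """Return the proportion of shared verses identical to the base text."""
    shared = [ref for ref in transcription if ref in base]
    if not shared:
        return 0.0
    identical = sum(transcription[ref] == base[ref] for ref in shared)
    return identical / len(shared)


def looks_untouched(transcription, base):
    """Flag a file which may be an unaltered copy of the base text."""
    return agreement_with_base(transcription, base) == 1.0


# Invented sample data: two verses of a base text and a transcription
# which differs from it in the second verse.
base = {"John 1:1": "εν αρχη ην ο λογος", "John 1:2": "ουτος ην εν αρχη"}
transcription = {"John 1:1": "εν αρχη ην ο λογος", "John 1:2": "ουτος ην εν τη αρχη"}

print(agreement_with_base(transcription, base))
print(looks_untouched(dict(base), base))
```

A file flagged in this way is not necessarily faulty, since a short or fragmentary witness could genuinely match the base text throughout, but it would merit a second look before being admitted to a collation.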

A strong case may therefore be made for adding proofreading as a further stage in the transcription process, especially in cases where both transcriptions have been made by relatively inexperienced scholars or where one of the transcribers also served as reconciler. As mentioned above, a high number of differences between the transcriptions increases the probability that both transcribers may have made a similar mistake or that the reconciler might miss an alteration. The inclusion of page, column and line breaks in a transcription makes it a relatively straightforward task to compare it with the manuscript, and enables the proofreader to focus on the entire text rather than being restricted to the points of variation thrown up during reconciliation. Indeed, if the whole manuscript is not examined by an expert, there is the possibility that significant information may be overlooked, such as an unindicated lemma in a catena manuscript or a set of marginal corrections.

When an initial transcription has been made by an experienced scholar, however, simply proofreading it is likely to result in as accurate a transcription as the double-blind process, as well as being more economical of time. In this scenario, too, the entire manuscript will have been examined twice by experts, which is not the case for a reconciled and proofread file based on initial transcriptions made by inexperienced transcribers. This single-transcription approach was adopted in the COMPAUL project, and continued to result in improvements when compared with earlier published transcriptions.28 It has also been employed by other projects, such as the Piers Plowman Electronic Archive and the Coptic editions at the Institut für neutestamentliche Textforschung (INTF); it is also the only method which is practicable for scholars working on their own.29 Another advantage of a proofreading stage is that it promotes consistency across files, such as in the way that marginalia are recorded or editorial notes are added. Conforming such details to a standard format during the reconciliation process risks detracting from the focus on textual accuracy at this point.

One final observation on the accuracy of electronic transcriptions relates to the flexibility of electronic text and publication. The release of transcriptions on the internet enables a wide body of users to check them and provide comments. Feedback on both the Digital Codex Sinaiticus and the IGNTP transcriptions of the Gospel according to John has been received through a dedicated feedback page, emails, message-board posts and even published articles.30 In several instances, this has led to an alteration to the transcriptions; for an edition eventually to appear in print, corrections at this preliminary stage will result in even more reliable data for the final publication. This broader engagement demonstrates the importance that electronic transcriptions have already achieved within the scholarly community and underlines how a single file in the digital sphere can be used and improved to support further research.

4 Documentation of the Transcription Process

The second area to be considered in this chapter is how the transcription process is documented. One of the strengths of XML is that all markup is included within the file itself, so that a single file contains the transcribed text of each manuscript, indications of layout and other non-textual data, and even the transcriber’s own commentary.31 The multiple layers of textual history in a single document can thereby be included in its electronic surrogate, beginning with the work of the original scribe and subsequent correctors or annotators as recorded on the page; to these may be added the observations of the transcriber responsible for translating the text into electronic form and those of other editors or correctors of the digital file. The result is a considerable gain in transparency, coupled with the benefit of having all information at the relevant place: the practice in many printed transcriptions of relegating corrections or comments to an appendix (as well as lists of errata appearing elsewhere) can make them very unwieldy in this respect.32

Most importantly, the file should include information about the practices adopted for the creation of the transcription itself. This chapter has already noted that it is advisable to record the transcription status, such as the date it was reconciled or proofread, as part of the file. While the primary purpose of this is for the internal monitoring of the project, there are many more details which external users may need to know, such as the sources used by the transcriber, the treatment of abbreviations and punctuation, and other principles on which the transcription was made.33 Without this information, a certain amount of detective work would be required in order to work out the contents and scope of the transcription as well as reconstruct what may be known of the history of its production. The absence of these indications also compromises the value of the transcription as an authority, a topic to which we shall return shortly.

The TEI P5 Guidelines require that, to be properly formed, each XML file should have a header with information about the contents of the file and its encoding.34 The range of elements permissible within this header also enables the provision of extensive further information, if so desired. For example, in the “Source Description” section, a full bibliographic description of the manuscript can be given along with the sigla assigned to it in various catalogues, while in the “Declaration of Editorial Practices” section a free-text explanation can be given of the principles adopted for the transcription or a more structured description of how particular elements have been handled. Changes to the file can be logged individually in the “Revision Description” section, providing a full history of any later alterations. The TEI header is therefore the obvious place to document the creation and history of the following text, and should be considered obligatory for all electronic transcriptions when they are made available for further use.35
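By way of illustration, the three sections just named correspond to the TEI elements sourceDesc, editorialDecl and revisionDesc. The following sketch assembles a skeletal header containing them with Python's standard xml.etree.ElementTree module. The function names, the placeholder wording and the values of the when and who attributes are invented, and the output has not been validated against the schema of any project.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag):
    """Return a tag name qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_header(title, source_note, practice_note, changes):
    """Assemble a skeletal teiHeader with a free-text description in each
    section and one change element per logged revision."""
    header = ET.Element(tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    publication = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(publication, tei("p")).text = "Placeholder publication statement."
    source = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source, tei("p")).text = source_note
    encoding = ET.SubElement(header, tei("encodingDesc"))
    declaration = ET.SubElement(encoding, tei("editorialDecl"))
    ET.SubElement(declaration, tei("p")).text = practice_note
    revision = ET.SubElement(header, tei("revisionDesc"))
    for when, who, what in changes:
        change = ET.SubElement(revision, tei("change"), {"when": when, "who": who})
        change.text = what
    return header


# Hypothetical values throughout.
header = build_header(
    title="Transcription of John in a hypothetical manuscript",
    source_note="Transcribed from digitised monochrome microfilm.",
    practice_note="Abbreviations are preserved; punctuation is not recorded.",
    changes=[
        ("2017-01-10", "#reconciler", "Reconciled two initial transcriptions."),
        ("2017-02-01", "#proofreader", "Proofread against the images."),
    ],
)
print(ET.tostring(header, encoding="unicode"))
```

Recording reconciliation and proofreading as separate change elements means that the status of the file can be read from the file itself rather than inferred from its name.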

As part of the Workspace for Collaborative Editing project, an XML schema was developed for transcriptions of New Testament manuscripts.36 This included a version of the TEI header, to which some adjustment now seems appropriate. For a start, the transcription ought to include details of the images and any other sources used by the transcriber. A transcription based on digitised monochrome microfilm often has serious limitations, not least as it can be a challenge to identify corrections from such images. When new high-resolution colour digital images become available, these can enable much greater precision and even bring to light text obscured in the older photographic process, especially if the manuscript has been rebound in the interim.37 Information about the use of the editio princeps or any other editions should also be specified, as, indeed, should any consultation of the original in situ. This material can be added in the section on manuscript description, using the <additional> and <surrogates> elements. It is also worth noting as a matter of good practice that the more information which can be added in the <msIdentifier> and <altIdentifier> elements about the identifiers of the manuscript in different catalogues, the easier it will be for the transcription to be located and used by other projects or even by automatic resource aggregators. The inclusion of the Diktyon number among the keywords of journal articles relating to Greek manuscripts has been encouraged, and if recently-announced proposals to create an International Standard Manuscript Number (ISMSN) bear fruit this too should be included in the header.38

Secondly, the declaration of editorial principles should be expanded from a general reference to the project’s transcription guidelines to include specific information on the way in which the following aspects have been handled:

The identification of correctors; layout; abbreviations (and nomina sacra); punctuation; capitalisation; rubrication and ornamentation; word-division; marginalia; non-biblical text.

Some of this information used to be included in the header to plain-text transcriptions but was not converted when they were translated into XML, or was imported as a single free-text editorial note at the beginning of the transcription. Given that the same project may treat certain categories of manuscripts differently, such as preserving all abbreviations in majuscule manuscripts but expanding them in minuscules, the structured provision of this information means that it is recorded on a case-by-case basis and offers a clear guide to the principles and limitations of the present transcription. This information would also be helpful for the later enhancement of transcriptions, when features not recorded by the original transcriber can be systematically added. A number of the categories suggested above are already catered for in the TEI P5 Guidelines by elements such as <interpretation>, <normalization>, <segmentation> and <punctuation>, while others can be expressed in free-text form.39 The presence of this information within the header provides a clear statement about the scope of the following transcription, explaining the areas in which it claims to represent the manuscript and details which have not been consistently or fully recorded.
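A structured declaration of this kind might be generated along the following lines. The sketch assumes that each practice is supplied as a pair of topic and statement: where the topic is one of the four TEI element names mentioned above, the statement is wrapped in that element, and otherwise it is recorded as a free-text paragraph. The function name, the sample statements and the decision as to which of the categories listed earlier belongs under which element are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)

# TEI elements named in the text as available within editorialDecl.
DEDICATED = {"interpretation", "normalization", "segmentation", "punctuation"}


def editorial_decl(practices):
    """Build an editorialDecl element from a mapping of topic to statement."""
    declaration = ET.Element(f"{{{TEI_NS}}}editorialDecl")
    for topic, statement in practices.items():
        if topic in DEDICATED:
            part = ET.SubElement(declaration, f"{{{TEI_NS}}}{topic}")
            ET.SubElement(part, f"{{{TEI_NS}}}p").text = statement
        else:
            # No dedicated element: fall back to a labelled paragraph.
            paragraph = ET.SubElement(declaration, f"{{{TEI_NS}}}p")
            paragraph.text = f"{topic}: {statement}"
    return declaration


# Hypothetical statements for a single minuscule manuscript.
declaration = editorial_decl({
    "normalization": "Abbreviations other than nomina sacra are expanded.",
    "punctuation": "Punctuation is not recorded.",
    "marginalia": "Only marginal corrections to the biblical text are transcribed.",
})
print(ET.tostring(declaration, encoding="unicode"))
```

Because the declaration is built separately for each file, a project which treats majuscules and minuscules differently can state the practice which applies to the transcription in hand.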

Thirdly, a strong case may be made for identifying contributors to the transcriptions by name. To date, the practice of the IGNTP has been to list all transcribers by name at the beginning of a published volume rather than connect them with particular manuscripts.40 While this recognises the involvement of multiple people in each transcription, with the overall project taking responsibility for the accuracy of the data, it obscures any variation in the extent of the contributions made by each individual. Including details of transcribers in the TEI header when electronic transcriptions are published online provides immediate and demonstrable recognition, enabling transcribers to cite work in which they are expressly credited. This is especially important for students whose transcription forms part of an assessed portfolio, or who wish to show evidence of their wider involvement in the research field. At the same time, recording the names of those responsible for each stage of the process serves to confirm the status of the file within the workflow, indicating that it has been reconciled or proofread by an experienced scholar. Any errors remain a collective responsibility, and can easily be corrected once brought to the attention of the project: the driving force behind this proposal is to provide recognition and transparency, especially if the transcriptions produced for a particular project go on to be re-used elsewhere. In IGNTP work on John, individuals are already identified in the log of changes in each file; for transcriptions of the Pauline Epistles, contributors will be listed by name in the “Responsibility Statement” section which is part of the TEI header.41

5 The Publication of Electronic Transcriptions

The third section of this chapter deals with issues connected with the online publication of electronic files, in particular the authority they have and the manner in which they are made available. The matter of authority is highlighted by the many anonymous or inadequately documented biblical texts which are included in online portals: they are of no value for scholarly use until their provenance can be established.42 The problem is not a new one: the reprinting of editions of the Bible with different title pages, sometimes without permission, was not uncommon in the early days of printing. The implementation of the changes to the XML header which have just been suggested, providing full details of the transcription principles and those responsible for the file’s creation, will go some way towards ensuring that electronic transcriptions can be reused and cited in academic research, since their scope and origins will be expressly stated within the file. As indicated above, part of a transcription’s authority derives from the transparency of its documentation: the systematic use of the “Revision Description” section in the XML header to record all changes is good practice in this respect.

The question of the availability of electronic transcriptions may be approached on two levels, the legal and the practical. Both the IGNTP and INTF have sought to encourage the re-use of their transcriptions by releasing them under Creative Commons licences since 2010.43 This free general release of the data also acknowledges the contribution of public funds to their creation, a practice which has more recently been made obligatory by certain research agencies, including the European Research Council and UK Research Councils. A question remains as to whether the licences should restrict the re-use of these transcriptions to non-commercial activities. Until late 2017, this was the position of the IGNTP, due to a concern that profit should not be made from public-funded research; the re-use of the Codex Bezae and Codex Sinaiticus transcriptions on the commercial Logos platform was permitted on condition that they would be released without charge to users. In 2013, however, the INTF removed the non-commercial stipulation, specifying only that re-used files should have attribution to the original creator and be made available under the same licence (share-alike). This position has been endorsed in scholarly discussions about data sustainability, since the files will continue to be made freely available even if integrated into a commercial package.44 However, even the share-alike requirement can work against the re-use of data, since a single resource which combines files from multiple contributors released under differing licences cannot match the conditions set out for each one.45 The expectation for the re-use of material from printed scholarly publications is that the original source is acknowledged, without restriction on the manner in which the subsequent work is made available (within the bounds of copyright law and fair-use policy). If a subsequent user has incurred costs in the enhancement of transcriptions, it is reasonable to allow them to seek to offset this expense if they so desire when releasing their own files: the initial data remains available free of charge and the original creators do not suffer any financial disadvantage. Following the original presentation of this chapter, a proposal was tabled that the IGNTP and other creators of electronic transcriptions should follow INTF’s lead of removing the non-commercial stipulation from their licences and also dispense with the share-alike requirement, in order to allow for the widest possible re-use of this data. This was unanimously approved by the IGNTP committee in November 2017 and applied retrospectively with the release of 350 New Testament transcriptions under a Creative Commons 4.0 Attribution licence.

In reality, it is often practical measures for making transcriptions publicly available which can prove the stumbling block to their re-use. Earlier digital editions relied on a publishing model which served transcriptions as HTML generated from a database and provided no access to the original files: this is the case with editions of New Testament writings created with the Anastasia software as well as the transcription display in the Digital Codex Sinaiticus project, although the latter has the whole transcription file available as a separate download.46 The adoption of a standard XML format has made it much easier to provide direct access to raw transcription files, manuscript by manuscript, and establish repositories where these are made available. For example, all IGNTP transcriptions are published online as XML files once they have been reconciled, to enable their re-use and open them to public scrutiny.47 Similarly, although no explicit information about this currently seems to be available for non-technical users of the website, transcriptions in the NT.VMR can also be accessed as XML through a call to the application programming interface (API).48 Again, good practice calls for stable internet addresses and some form of version control, so that users can be clear that they are accessing the latest form of the file and are made aware of any differences from earlier versions through the log of changes.49

One aspect which has not been formally agreed is a default unit size for authoritative transcription files. In theory, this could encompass anything from a single page to a complete manuscript. The most practical and logical division, however, is by book. A book is a single, externally defined production unit, whereas the content of pages (and even of complete manuscripts) varies from document to document. The TEI header, too, is predicated at the level of the document or work rather than any smaller subdivision: attaching a full header to each individual page would not only double the size of the file but also result in partial information for many of the categories and make it very difficult to identify and link to a specific transcription. Conversely, it is straightforward to link individual page images to a transcription of the full book. The workflow for the ECM treats the book as a default unit, too, as the allocation of work to different teams in the project has been made on this basis. The main problem posed by this approach is how to join files when one book ends and another begins on the same page, but this is a matter of display rather than encoding.50 In terms of making transcriptions publicly available, each biblical book is the smallest intuitive unit and the most practicable in current project workflows, although there is no reason why these files cannot be joined together to create a single file per manuscript so long as the transcriptions are consistent and the header is suitably updated.
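The joining of book-level files into a single file per manuscript can be sketched in Python as follows. This is a minimal illustration rather than an account of any project's actual tooling: it assumes that each file is a TEI document whose body contains one or more top-level div elements, and the function names, the skeletal sample header and the wording of the change note are all hypothetical. The header of the first file is retained and a change entry is added to record the join; a real header would need fuller revision.

```python
import copy
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}
ET.register_namespace("", TEI_NS)  # serialise without an "ns0:" prefix


def join_books(book_documents):
    """Merge per-book TEI strings into one tree (hypothetical helper).

    Assumes each document has text/body containing top-level <div> elements.
    """
    roots = [ET.fromstring(doc) for doc in book_documents]
    combined = copy.deepcopy(roots[0])  # keep the first file's header and body
    body = combined.find("tei:text/tei:body", NS)
    for root in roots[1:]:
        for div in root.findall("tei:text/tei:body/tei:div", NS):
            body.append(copy.deepcopy(div))

    # Record the join in the header so that the combined file documents itself.
    header = combined.find("tei:teiHeader", NS)
    revision = header.find("tei:revisionDesc", NS)
    if revision is None:
        revision = ET.SubElement(header, f"{{{TEI_NS}}}revisionDesc")
    change = ET.SubElement(revision, f"{{{TEI_NS}}}change")
    change.text = f"Joined from {len(roots)} book-level files"
    return combined


def sample_book(book):
    """A deliberately skeletal TEI file; real headers contain far more."""
    return (
        f'<TEI xmlns="{TEI_NS}"><teiHeader><fileDesc><titleStmt>'
        f"<title>Transcription of {book}</title></titleStmt></fileDesc>"
        f'</teiHeader><text><body><div type="book" n="{book}"/></body>'
        "</text></TEI>"
    )


manuscript = join_books([sample_book("Mark"), sample_book("Luke")])
books = [d.get("n") for d in manuscript.findall("tei:text/tei:body/tei:div", NS)]
print(books)
```

The sketch leaves the question of a page shared between two books untouched, in keeping with the observation that this is a matter of display rather than encoding.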

Finally, the emphasis in this section thus far has been on publication as the release of transparent, authoritative electronic files, which can be cited according to scholarly norms. Yet, as has often been said, one of the innovations of digital transcriptions is the possibility for other users not connected with the original project to enhance them in some way. The problem with this is how to connect these updated files with their original sources and enable scholarship to develop in a cumulative way. Contributions by users through different forms of feedback have already been mentioned above. A more organic form of development, however, would be through the release of transcriptions in a public repository, such as the well-known GitHub site for software collaboration.51 This site has extensive versioning controls, so that (as in Wikipedia) one can see which users were responsible for which changes. It also allows users to ‘fork’ files, copying them into a particular branch for specific development while leaving the originals untouched. One could imagine, say, that a project adding information to transcriptions about paratextual features, or editors wanting to use a defined set of files to create an edition, would develop their own forks. The strength of this approach is that there would be a single place to locate files, and users themselves would have the ability to link their files back to earlier versions of the same transcription. Given the practical problems of managing users and files, however, if such an idea were considered worth adopting, it might initially have to be implemented in parallel with the current, more specific, project-based approach.

6 Conclusion

In conclusion, as stated at the beginning of this chapter, full-text electronic transcriptions are now firmly embedded in the production of scholarly editions of the New Testament, as well as those in other disciplines. What is more, a set of standards for the encoding of these files in TEI-compliant XML has been widely adopted, and there is also a user-friendly interface for the creation and alteration of these transcriptions in the form of the Online Transcription Editor. This situation is to be celebrated, as it promotes collaboration towards a long-term goal.

This chapter has sought to look beyond transcriptions as the initial stage of an edition to their role as files in their own right which can be re-used and enhanced outside their original context. While the procedures adopted by a specific project may seem self-evident to its members, they are not necessarily so transparent to other scholars or future generations. We do not know the uses to which these files may be put. Yet one of the particular benefits of electronic files is their potential to be redeployed, enabling others not to start from scratch but to build on the best existing resources. It is this concern which underpins the suggestions made here about accuracy, full documentation, authority and availability. David Parker’s comment that “part of the purpose of the electronic transcription is that it will not become obsolete” can only be justified if care is taken to ensure that transcriptions are created with wider usage in mind.52

Despite the proliferation of digital images of New Testament manuscripts, printed transcriptions and facsimiles from previous centuries continue to play a part in New Testament scholarship. Electronic transcriptions supersede these older publications in numerous ways, not least because of the way in which they can be processed, analysed and developed to inform a whole new generation of research questions. My hope is that, by encouraging full documentation in these files and clear standards for how they are made available, the work being undertaken today may prove to be as long-lasting as that produced by the earlier giants on whose shoulders we stand.


Berrie, Phill et al., “Authenticating Electronic Editions,” in: Electronic Textual Editing, ed. Burnard, Lou et al., New York: MLAA, 2006, 269-276.

Cayless, Hugh, Viglianti, Raffaele, “Publishing Editions on GitHub Pages with the Text Encoding Initiative,” presentation for a Society of Textual Scholarship workshop, 2017, <http://go.umd.edu/STS-TEI>.

Czmiel, Alexander, “Sustainable Publishing: Standardization Possibilities For Digital Scholarly Edition Technology,” <http://dixit.uni-koeln.de/convention-2-abstracts/#czmiel>.

Durusau, Patrick, “Why and How to Document your Markup Choices,” in: Electronic Textual Editing, ed. Burnard, Lou et al., New York: MLAA, 2006, 299-309.

Elliott, W.J., Parker, David C., The New Testament in Greek IV. The Gospel According to St John. Volume 1. The Papyri, Leiden: Brill, 1995.

Fenton, Eileen G., Duggan, Hoyt N., “Effective Methods of Producing Machine-Readable Text from Manuscript and Print Sources,” in: Electronic Textual Editing, ed. Burnard, Lou, O’Brien O’Keeffe, Katherine, Unsworth, John, New York: MLAA, 2006, 241-253.

Gilbert, Penny, “Automatic Collation: A Technique for Medieval Texts,” Computers and the Humanities 7.3, 1973, 139-146.

Hoskier, Herman C., The Text of Codex Usserianus 2. r2 (“Garland of Howth”). With Critical Notes to Supplement and Correct the Collation of the Late Thomas K. Abbott, London: Quaritch, 1919.

Houghton, H.A.G., “The Electronic Scriptorium: Markup for New Testament Manuscripts,” in: Digital Humanities in Biblical, Early Jewish and Early Christian Studies, ed. Clivaz, Claire, Gregory, Andrew, Hamidović, David, Leiden: Brill, 2014, 31-60.

Houghton, H.A.G., “The Gospel according to Mark in Two Latin Mixed-Text Manuscripts,” Revue Bénédictine 126.1, 2016, 16-58.

Houghton, H.A.G., Parker, David C., Robinson, Peter M., Wachtel, Klaus, “The Editio Critica Maior of the Greek New Testament: Twenty Years of Digital Collaboration,” Digital Philology (forthcoming).

Houghton, H.A.G., Sievers, Martin, Smith, Catherine J., “The Workspace for Collaborative Editing,” in: Digital Humanities 2014 Conference Abstracts, EPFL-UNIL, Lausanne, Switzerland, 8-12 July 2014, <http://dharchive.org/paper/DH2014/Paper-224.xml>.

Houghton, H.A.G., Smith, Catherine J., “Digital Editing and the Greek New Testament,” in: Ancient Worlds in Digital Culture (Digital Biblical Studies 1), ed. Clivaz, Claire, Dilley, Paul, Hamidović, David, Leiden: Brill, 2016, 110-127.

Krans, Jan, “Codex Boreelianus (F 09) and the IGNTP Edition of John,” TC: A Journal of Biblical Textual Criticism 15, 2010, <http://rosetta.reltech.org/TC/v15/Krans2010.pdf>.

Müller, Darius, “Zur elektronischen Transkription von Apokalypsehandschriften: Bericht zum Arbeitsstand,” in: Studien zum Text der Apokalypse II (ANTF 50), ed. Sigismund, Markus, Müller, Darius, Berlin: De Gruyter, 2017, 19-30.

Ott, Wilhelm, “Transcription and Correction of Texts on Paper Tape: Experiences in Preparing the Latin Bible Text for the Computer,” LASLA Revue 2, 1970, 51-66.

Parker, David C., An Introduction to the New Testament Manuscripts and their Texts, Cambridge: CUP, 2008, 95-100.

Parker, David C., Codex Sinaiticus: The Story of the World’s Oldest Bible, London: British Library, 2010.

Parker, David C., Textual Scholarship and the Making of the New Testament, Oxford: OUP, 2012, 115.

Robinson, Peter, Collate: Interactive Collation of Large Textual Traditions, Version 2, Computer Program distributed by the Oxford University Centre for Humanities Computing: Oxford, 1994.

Robinson, Peter, “Some Principles for Making Collaborative Scholarly Editions in Digital Form,” Digital Humanities Quarterly 11.2, 2017, <http://www.digitalhumanities.org/dhq/vol/11/2/000293/000293.html>.

Robinson, Peter, “The Collation and Textual Criticism of Icelandic Manuscripts. 1. Collation,” Literary and Linguistic Computing 4.2, 1989, 99-105.

Schmid, Ulrich B., Elliott, W.J., Parker, David C., The New Testament in Greek IV. The Gospel According to St John. Volume 2. The Majuscules, Leiden: Brill, 2007.

The International Greek New Testament Project, The New Testament in Greek. The Gospel According to St Luke. Part One. Chapters 1-12, Oxford: OUP, 1984.

Wachtel, Klaus, “Editing the Greek New Testament on the Threshold of the Twenty-First Century,” Literary and Linguistic Computing 15.1, 2000, 43-50.

Welsby, Alison, A Textual Study of Family 1 in the Gospel of John, Berlin & Boston: De Gruyter, 2014, 4-5.

Westcott, Brooke F., Hort, F.J.A., ed., The New Testament in the Original Greek. Introduction and Appendix, Cambridge: Macmillan, 1881, 31.

Wiley, David, “Noncommercial Isn’t the Problem, ShareAlike Is,” Open Content, 2007, <https://opencontent.org/blog/archives/347>.

List of Internet Resources

Codex Bezae:

Codex Sinaiticus:

Creative Commons licences:

IGNTP transcriptions:

Museum of the Bible Greek Paul Project:


The HumaReC project on a trilingual New Testament manuscript:



For a description of how to make a paper collation, see Parker, David C., An Introduction to the New Testament Manuscripts and their Texts, Cambridge: CUP, 2008, 95-100.


Robinson, Peter, Collate: Interactive Collation of Large Textual Traditions, Version 2, Computer Program distributed by the Oxford University Centre for Humanities Computing: Oxford, 1994; see also Houghton, H.A.G., and Smith, Catherine J., “Digital Editing and the Greek New Testament,” in: Ancient Worlds in Digital Culture (Digital Biblical Studies 1), ed. Clivaz, Claire, Dilley, Paul, Hamidović, David, Leiden: Brill, 2016, 110-127; especially 118-120.


Houghton and Smith, “Digital Editing”, 115.


A description of how to make an electronic transcription is given in Parker, An Introduction, 100-106. Parker’s comment that “the transcription process is very different from collating” (104) refers to the incorporation of layout information, as explained below.


This high agreement between manuscripts and the majority text is the main reason why few resources have so far been devoted to the development of optical character recognition methods for reading New Testament manuscripts: the complex systems of abbreviation, the challenge of interpreting corrections, and the presence of paratextual material also present significant obstacles, especially in the majority of manuscripts written in minuscule script. Nevertheless, the large body of scholarly transcriptions of New Testament manuscripts created for the ECM would provide an excellent set of training data for those wishing to develop such a system, which could also be extended to Greek manuscripts more broadly.


In practice, however, variants are often overlooked by transcribers: for example, careful review of the eight places of variation between the Textus Receptus and the majority text of John led to the correction of many transcriptions. For Galatians, the IGNTP has experimented with using different base texts for the two initial transcriptions, but this has not yet been evaluated.


The practice of the INTF, however, is that lacunae in electronic transcriptions should be filled with the reading of the Nestle-Aland base text unless this is clearly wrong (INTF, Dokumentation der Funktionen des Transkription Editors und Richtlinien zur Transkriptionen neutestamentlicher Handschriften, Version 1, August 2013; see especially 19).


For an illustration of the practices adopted for the ECM, see INTF, Dokumentation, and the equivalent IGNTP document, Guidelines for the Transcription of Manuscripts Using the Online Transcription Editor (2016), available at <http://epapers.bham.ac.uk/2161/>.


Experience in reconciling transcriptions shows that even the recording of a single correction may often lead transcribers to overlook other textual variations on the same line. Similarly, initial transcriptions of commentary manuscripts are frequently less accurate due to transcribers having to count the number of lines between sections of biblical text.


Westcott, Brooke F., Hort, F.J.A., ed. The New Testament in the Original Greek. Introduction and Appendix, Cambridge: Macmillan, 1881, 31.


For more on this markup and its subsequent development, see Houghton, H.A.G., “The Electronic Scriptorium: Markup for New Testament Manuscripts,” in: Digital Humanities in Biblical, Early Jewish and Early Christian Studies, Clivaz, Claire, Gregory, Andrew, Hamidović, David, Leiden: Brill, 2014, 31-60, especially 33-35.


The Online Transcription Editor was produced by Martin Sievers and Gan Yu at the Trier Center for Digital Humanities, and has been integrated into the New Testament Virtual Manuscript Room and the Workspace for Collaborative Editing. For further information, see Houghton, H.A.G., Sievers, Martin, Smith, Catherine J., “The Workspace for Collaborative Editing,” in: Digital Humanities 2014 Conference Abstracts, EPFL-UNIL, Lausanne, Switzerland, 8-12 July 2014, 210-211 (online at <http://dharchive.org/paper/DH2014/Paper-224.xml>), and Houghton, “Electronic Scriptorium”, 36-37.


For example, the redeployment of transcriptions of Family 1 in John produced by Alison Welsby in the ECM of John: see further Welsby, Alison, A Textual Study of Family 1 in the Gospel of John, Berlin & Boston: De Gruyter, 2014, 4-5.


A good example of this is the transcriptions shared between the United Bible Societies’ Gospel according to John in the Byzantine Tradition and the IGNTP volume of The Majuscule Manuscripts of John (see further Parker, An Introduction, 220-221).


As in the case of the Digital Codex Sinaiticus (www.codexsinaiticus.org; see further Parker, David C., Textual Scholarship and the Making of the New Testament, Oxford: OUP, 2012, 115, which refers to transcriptions “which have been used on four different websites, each in a different format”) and the presentation of Codex Bezae in the Cambridge University Digital Library (<http://cudl.lib.cam.ac.uk/view/MS-NN-00002-00041/>).


As in the case of the Logos editions of Codex Bezae and Codex Sinaiticus (<https://www.logos.com/product/29619/codex-bezae-cantabrigiensis>; <https://www.logos.com/product/35581/codex-sinaiticus>).


See, for example, Houghton, H.A.G., “The Gospel according to Mark in Two Latin Mixed-Text Manuscripts,” Revue Bénédictine 126.1, 2016, 16-58.


On transcription as an abstraction, see Parker, An Introduction, 104-105.


This is described in Parker, Textual Scholarship, 114-115, which also underlines the importance of workflow; see too Wachtel, Klaus, “Editing the Greek New Testament on the Threshold of the Twenty-First Century,” Literary and Linguistic Computing 15.1, 2000, 43-50, especially 47, and Müller, Darius, “Zur elektronischen Transkription von Apokalypsehandschriften: Bericht zum Arbeitsstand,” in: Studien zum Text der Apokalypse II (ANTF 50), ed. Sigismund, Markus, Müller, Darius, Berlin: De Gruyter, 2017, 19-30. In practice, with small project teams, it is often necessary for the reconciler to be one of the two initial transcribers.


For example, it was used by both Wilhelm Ott and Vinton Dearing in the 1960s (see Ott, Wilhelm, “Transcription and Correction of Texts on Paper Tape: Experiences in Preparing the Latin Bible Text for the Computer,” LASLA Revue 2, 1970, 51-66 and Gilbert, Penny, “Automatic Collation: A Technique for Medieval Texts,” Computers and the Humanities 7.3, 1973, 139-146). I am grateful to Catherine Smith for these references.


Fenton, Eileen G., Duggan, Hoyt N., “Effective Methods of Producing Machine-Readable Text from Manuscript and Print Sources,” in: Electronic Textual Editing, ed. Burnard, Lou, O’Brien O’Keeffe, Katherine, Unsworth, John, New York: MLAA, 2006, 241-253, quotation from 253.


Parker, An Introduction, 104. Elsewhere, Parker states that “the best way to achieve the greatest possible accuracy is by making two independent transcriptions, automatically generating a list of the differences, and then verifying the correct one.” Codex Sinaiticus: The Story of the World’s Oldest Bible, London: British Library, 2010, 177.


Hoskier, Herman C., The Text of Codex Usserianus 2. r2 (“Garland of Howth”). With Critical Notes to Supplement and Correct the Collation of the Late Thomas K. Abbott, London: Quaritch, 1919, iii.


For example, the Vetus Latina Iohannes edition identifies 29 textual inaccuracies in Tischendorf’s transcription of John in VL 2 and 37 textual inaccuracies in Buchanan’s transcription of John in VL 4, in addition to differences in format and punctuation; in contrast, there are only 6 textual errors noted in Vogels’ transcription of VL 6 (see the linked files on <http://www.iohannes.com/vetuslatina/manuscripts.htm>).


See further Houghton, H.A.G., Parker, David C., Robinson, Peter M., Wachtel, Klaus, “The Editio Critica Maior of the Greek New Testament: Twenty Years of Digital Collaboration,” Digital Philology (forthcoming). In addition, the Museum of the Bible Greek Paul Project trains students ab initio as part of an academic course (<http://ntvmr.uni-muenster.de/web/gsi-greek-paul-project>).


A spreadsheet prepared for the IGNTP in 2014 on the basis of previous work gave average rates of 600 words per hour for transcription and 750 words per hour for the tasks performed by the reconciler.


This may be illustrated by the fact that over half of the 254 Greek transcriptions prepared in conjunction with the ECM of John have been adjusted during subsequent work on the apparatus, even though few of these have involved a change to a reading: further details are available in the log of changes in the header to each of the files at <http://www.iohannes.com/transcriptions/>.


The XML files for this project are available at <http://www.epistulae.org/>, some of which include information about comparison with other editions. For instance, 8 textual errors in Tischendorf’s transcription of the Latin text of 2 Corinthians in VL 75 (Codex Claromontanus) are listed in the header of the file.


For the Piers Plowman Electronic Archive, see Fenton and Duggan, “Effective Methods”, 245-6. Robinson, Peter M., “The Collation and Textual Criticism of Icelandic Manuscripts. 1. Collation,” Literary and Linguistic Computing 4.2, 1989, 99-105 describes his transcription process as a single transcription which was checked “at least three times” resulting in a maximum of eight errors per manuscript (an accuracy rate of 99.8%). The checking was assisted by including details of layout and a font which resembled that of the scribal hand.


The most extensive example of such a publication is Krans, Jan, “Codex Boreelianus (F 09) and the IGNTP Edition of John,” TC: A Journal of Biblical Textual Criticism 15, 2010, <http://rosetta.reltech.org/TC/v15/Krans2010.pdf>.


This was not the case with transcriptions produced for Collate, where transcriber notes were recorded in a separate file and indicated by pointers within the transcription (see Parker, An Introduction, 105). Although some scholars advocate “stand-off markup” in which the text is in one file and all metadata is in another, this requires a robust file management system to ensure that the two are always connected (see further Berrie, Phill et al., “Authenticating Electronic Editions,” in: Electronic Textual Editing, ed. Burnard, Lou et al., New York, 2006, 269-276). On a procedural level, it might be suggested that the model of stand-off markup fails both to appreciate the complex interplay of text, presentation and use in textual artefacts and to recognise that a transcription itself is a work of interpretation (as already observed above).


Examples of such appendices may be seen in Tischendorf’s transcription of Codex Claromontanus and Scrivener’s transcription of Codex Bezae: these are almost the printed equivalent of stand-off markup described in the previous footnote.


For more on this subject, see Durusau, Patrick, “Why and How to Document your Markup Choices,” in: Electronic Textual Editing, ed. Burnard, Lou et al., New York, 2006, 299-309.


See The TEI Consortium, TEI P5: Guidelines for Electronic Text Encoding and Interchange. Version 3.1.0, December 2016 (<http://www.tei-c.org/Guidelines/P5/>), specifically <http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html>, accessed on 10.04.19.


The need for such documentation for digital scholarly editing projects was also set out by Alexander Czmiel in a paper entitled “Sustainable Publishing: Standardization Possibilities For Digital Scholarly Edition Technology” presented at the DIXIT conference in Cologne in March 2016: see <http://dixit.uni-koeln.de/convention-2-abstracts/#czmiel> (and also <http://dh2016.adho.org/abstracts/132>).


This is described in Houghton, “Electronic Scriptorium”, 39-41; for the latest version of the document, see <http://epapers.bham.ac.uk/1892/>. The subset of the TEI-P5 guidelines for transcribing New Testament manuscripts is set out in an ODD file created through the Roma tool, which is then used to generate RNG and XSD schemas for validation. It should be noted, however, that this customisation of the TEI only involves the selection of features, not the alteration of any elements or attributes.


This is exemplified by Krans, “Codex Boreelianus”.


For Diktyon numbers, see <http://pinakes.irht.cnrs.fr/>; the proposal for an ISMSN was put forward by the Biblissima project at a conference in Paris in April 2017. The current TEI header for the IGNTP includes a field for the identifiers in Trismegistos and the Leuven Database of Ancient Books.


See further <http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html#HD53>.


See The International Greek New Testament Project, The New Testament in Greek. The Gospel According to St Luke. Part One. Chapters 1-12, Oxford: OUP, 1984; Elliott, Bill W.J., Parker, David C., The New Testament in Greek IV. The Gospel According to St John. Volume 1. The Papyri, Leiden: Brill, 1995; Schmid, Ulrich B., Elliott, Bill W.J., Parker, David C., The New Testament in Greek IV. The Gospel According to St John. Volume 2. The Majuscules, Leiden: Brill, 2007; in addition, the following statement is found on the project website: “It is not IGNTP policy to attach names to individual transcriptions, since the editions are a collective effort worked on by a number of people.” (<http://www.iohannes.com/IGNTPtranscripts/transcribers.htm>).


This was first implemented for the transcriptions of Greek manuscripts of Galatians released in November 2017 at the website <http://www.epistulae.org>.


See, for example, Parker, An Introduction, 217.


See further <http://www.creativecommons.org>, and Parker, Textual Scholarship, 114-115.


See, for example, Robinson, Peter M., “Some Principles for Making Collaborative Scholarly Editions in Digital Form,” Digital Humanities Quarterly 11.2, 2017, §§36-37 and <http://freedomdefined.org/Licenses/NC>.


See Robinson, “Some Principles”, note 20, who also refers to Wiley, David, “Noncommercial Isn’t the Problem, ShareAlike Is,” Open Content, July 2007, <http://opencontent.org/blog/archives/347>. The HumaReC project on a trilingual New Testament manuscript, presented at the same seminar as the present chapter, does not specify share-alike in the licence of its transcription, <https://humarec.org/>.


On Anastasia, see further Houghton, “Electronic Scriptorium”, 34-35. The download link for the Codex Sinaiticus transcription is <http://www.codexsinaiticus.org/en/project/transcription_download.aspx>, although this has not satisfied all users, some of whom complained that they could not copy and paste overlines from the website while others wanted the download to be in Microsoft Word format.


See <http://www.iohannes.com/transcriptions> and <http://www.epistulae.org>.


The interface may be seen at <http://ntvmr.uni-muenster.de/community/vmr/api/transcript/get/>. For example, the XML transcription of Mark in Codex Alexandrinus may be retrieved at the following page: <http://ntvmr.uni-muenster.de/community/vmr/api/transcript/get/?docID=20002&biblicalContent=Mark&format=teiraw>.


The University of Birmingham Institutional Research Archive (UBIRA; <http://ubira.bham.ac.uk>), on which many IGNTP transcriptions have been deposited, indicates to users if an updated version of the file exists. This repository is also planning to assign digital object identifiers (DOIs) to electronic files with effect from late 2017, which would make it even easier to locate and cite each transcription.


One workround could be to use the numbering of lines on each page to avoid overlap, or duplicating the entire page in each file. Lectionaries and catena manuscripts, too, require special treatment for display, although they are arguably a production-unit in themselves.


One example of such a project is the Catalogue of Digital Editions maintained on GitHub by Greta Franzini since 2012 (<https://github.com/gfranzini/digEds_cat>). A presentation for a Society of Textual Scholarship workshop given in May 2017 by Hugh Cayless and Raffaele Viglianti, “Publishing Editions on GitHub Pages with the Text Encoding Initiative” can be downloaded from <https://go.umd.edu/STS-TEI>. The HumaReC project (see note 45 above) also uses GitHub.


Parker, David C., Textual Scholarship, 115.

Ancient Manuscripts in Digital Culture

Visualisation, Data Mining, Communication