The Scripta Qumranica Electronica (SQE) project provides a web-based platform for exploring the Dead Sea Scrolls and for creating critical digital editions of them.1 Among the many features of this online virtual research environment is the ability to mark the outline of each letter on an image of a manuscript. This is done by recording the exact vector coordinates (a series of x,y points) that enclose the ink of an individual letter and linking that outline to the corresponding letter in a digital transcription of the text. Isolating each letter and linking it to the transcription is currently done manually within the platform, but work to automate the process is underway.2 The resulting linked data form a large paleographical repository of information about extant letter forms and their relative placement, which can be used to answer many questions about scribal habits and fragmentary manuscript remains. Relevant to the present book, this information can also be used to generate fonts automatically that match the scribal practice of individual manuscripts.
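To make this data model concrete, a single letter outline might be represented as below. This is a minimal sketch, not SQE's actual schema: the class LetterOutline and its field names are hypothetical, and the outline is assumed to be a closed polygon in image pixel coordinates. The bounding box it computes is the kind of measurement later used for kerning.

```python
from dataclasses import dataclass


@dataclass
class LetterOutline:
    """Hypothetical record: one letter's outline linked to its transcription."""
    transcription_index: int             # assumed position of the letter in the transcription
    character: str                       # the transcribed letter
    points: list[tuple[float, float]]    # closed polygon of (x, y) image coordinates

    def bounding_box(self) -> tuple[float, float, float, float]:
        """Return (min_x, min_y, max_x, max_y) of the outline."""
        xs = [x for x, _ in self.points]
        ys = [y for _, y in self.points]
        return min(xs), min(ys), max(xs), max(ys)

    def area(self) -> float:
        """Area enclosed by the outline, via the shoelace formula."""
        total = 0.0
        n = len(self.points)
        for i in range(n):
            x1, y1 = self.points[i]
            x2, y2 = self.points[(i + 1) % n]  # wrap around to close the polygon
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0
```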
In a font, the shape used to display a given letter is called a “glyph.” The automated process for producing such fonts involves analytical routines that determine typical values for: 1) glyph shape, 2) glyph size, 3) vertical glyph positioning, 4) kerning of glyphs, 5) word spacing, and 6) line spacing. All of these script features vary over the course of a manuscript, but a model font aims to represent the average and the prototypical.
1 Glyph Shape
The glyphs in an SQE paleographical repository (figure 26) will often include many damaged or otherwise incomplete forms. Nevertheless, the presence of these incomplete forms alongside intact ones poses little difficulty for algorithmic analysis as long as there are enough examples (typically five or more good examples of each glyph). The prototypical shape for each character is determined using the Anna Karenina principle: all good character shapes are alike, but each bad one is bad in its own way. This is an oversimplification from a paleographical perspective, but it is nevertheless serviceable.3
This approach begins by gathering all forms of a given letter in a manuscript. Each form is geometrically simplified using the Douglas–Peucker algorithm to obtain a more basic shape (figure 27).4 The result is further processed by morphological thinning, which reduces the shape to a “skeleton” approximating its component strokes (figure 28). To find the single glyph form that is most similar to all others, the mathematical distance between every possible pair of simplified, thinned forms of a character is calculated using a set of values known as Hu moment invariants,5 which allow images to be compared independently of their position, rotation, or scale. Each glyph form then receives a score equal to the sum of its distances to every other form. Badly damaged forms will have a high score, since they tend to deviate from the normal shape in many different ways. The form with the lowest aggregate distance to all other forms of that character is selected as the most “prototypical” form in the set, being the most similar to all the others.
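The prototype selection described above can be sketched in Python as follows. This is a minimal, hypothetical implementation under stated assumptions: each glyph form is supplied as a list of (x, y) ink-pixel coordinates that have already been simplified and thinned (those steps are omitted here; OpenCV's cv2.approxPolyDP implements Douglas–Peucker simplification and scikit-image's skimage.morphology.skeletonize performs thinning). The distance between two forms is assumed to be the sum of absolute differences between sign-preserving log-scaled Hu invariants, similar to one of the comparison methods offered by OpenCV's cv2.matchShapes; it is not necessarily the formula preferred in note 5.

```python
import math
from itertools import combinations


def hu_moments(pixels):
    """Seven Hu moment invariants of a binary shape given as (x, y) ink pixels."""
    n = len(pixels)  # zeroth moment m00 of a binary image
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n

    def eta(p, q):
        # Central moment mu_pq normalized for scale: mu_pq / m00^(1 + (p + q) / 2).
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)
        return mu / n ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    s, t = n30 + n12, n21 + n03  # sums that recur in the higher invariants
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        s ** 2 + t ** 2,
        (n30 - 3 * n12) * s * (s ** 2 - 3 * t ** 2) + (3 * n21 - n03) * t * (3 * s ** 2 - t ** 2),
        (n20 - n02) * (s ** 2 - t ** 2) + 4 * n11 * s * t,
        (3 * n21 - n03) * s * (s ** 2 - 3 * t ** 2) - (n30 - 3 * n12) * t * (3 * s ** 2 - t ** 2),
    ]


def shape_distance(hu_a, hu_b, eps=1e-12):
    """Assumed metric: summed differences of sign-preserving log10-scaled invariants."""
    total = 0.0
    for a, b in zip(hu_a, hu_b):
        if abs(a) > eps and abs(b) > eps:  # skip invariants that vanish (e.g. symmetric shapes)
            la = math.log10(abs(a)) * (1 if a > 0 else -1)
            lb = math.log10(abs(b)) * (1 if b > 0 else -1)
            total += abs(la - lb)
    return total


def select_prototype(glyphs):
    """Return the index of the most typical form and every form's aggregate score."""
    hus = [hu_moments(g) for g in glyphs]
    scores = [0.0] * len(glyphs)
    for i, j in combinations(range(len(glyphs)), 2):
        d = shape_distance(hus[i], hus[j])
        scores[i] += d  # each form's score is its summed distance to all others
        scores[j] += d
    return min(range(len(glyphs)), key=scores.__getitem__), scores
```

Given three intact rectangular strokes of different sizes and one copy with half of its ink missing, for example, the damaged copy receives the highest aggregate score and one of the intact forms is selected.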
2 Glyph Size
The scores gathered in the previous step can be used not only to find a single exemplary form but also to define a group of good glyph forms, by selecting every form whose score falls below a specified distance threshold. Since all of these represent acceptable alternative shapes for that character, the form closest to the group’s mean height and width can be selected as the optimal glyph in terms of both shape and dimensions.
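A minimal sketch of this step, assuming that aggregate scores like those described above and the bounding-box size of each form are already available. The function name, the threshold value, the fallback used when no form passes the threshold, and the use of Euclidean distance in width–height space to define “closest to the mean” are all assumptions made for illustration.

```python
import math


def select_optimal_glyph(scores, sizes, threshold):
    """Choose the good glyph form whose size is closest to the group mean.

    scores:    aggregate shape distance per glyph form (lower = more typical)
    sizes:     (width, height) of each form's bounding box, in pixels
    threshold: hypothetical cut-off below which a form counts as "good"
    """
    good = [i for i, s in enumerate(scores) if s < threshold]
    if not good:
        # Assumed fallback: use the single most typical form.
        return min(range(len(scores)), key=scores.__getitem__)
    mean_w = sum(sizes[i][0] for i in good) / len(good)
    mean_h = sum(sizes[i][1] for i in good) / len(good)
    # Closeness measured as Euclidean distance in (width, height) space (an assumption).
    return min(good, key=lambda i: math.hypot(sizes[i][0] - mean_w, sizes[i][1] - mean_h))
```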
3 Vertical Glyph Position
When creating a font, it is also necessary to determine where each glyph sits relative to the base (or hanging) line of the writing. An abstract base (or hanging) line is derived from a control character: either the character with the highest number of distinct neighboring characters in the text, or a character chosen manually on the basis of specialist knowledge of the character set being analyzed. The topmost point of each form of that character is treated as lying directly on an imaginary hanging line, regardless of its actual vertical position. The vertical position of every other character form is then calculated from its position on the vertical (y) axis relative to the neighboring form of the control character (figure 29). The y-offsets for all such adjacent pairs are collected, and the offset of each character from the base (or hanging) line is set to the average of its offsets relative to the control character. For characters that never occur directly next to the control character, the offset is determined in the same way relative to other characters whose vertical position has already been established.
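The procedure can be sketched in Python as follows, assuming that each line of text is available as a list of (character, top_y) pairs in reading order, where top_y is the topmost point of a glyph form in image coordinates (y increasing downward). The function names are hypothetical. The control character is chosen by counting distinct neighbors unless one is supplied manually, and characters never connected, directly or indirectly, to the control character receive no offset.

```python
from collections import defaultdict


def choose_control(lines):
    """Return the character with the most distinct neighboring characters."""
    neighbors = defaultdict(set)
    for line in lines:
        for (a, _), (b, _) in zip(line, line[1:]):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return max(neighbors, key=lambda c: len(neighbors[c]))


def vertical_offsets(lines, control=None):
    """Average offset of each character's top from the hanging line.

    lines: list of text lines, each a list of (character, top_y) in reading order.
    Image y grows downward, so a positive offset means the glyph starts lower.
    """
    if control is None:
        control = choose_control(lines)
    offsets = {control: 0.0}  # the control character's top defines the hanging line
    while True:
        samples = defaultdict(list)
        for line in lines:
            for (a, ya), (b, yb) in zip(line, line[1:]):
                # Measure each unplaced character against an already placed neighbor.
                if a not in offsets and b in offsets:
                    samples[a].append(ya - yb + offsets[b])
                if b not in offsets and a in offsets:
                    samples[b].append(yb - ya + offsets[a])
        if not samples:
            # Characters never linked to the control character stay unplaced.
            return offsets
        for char, values in samples.items():
            offsets[char] = sum(values) / len(values)
```

Each pass of the loop places the characters adjacent to those already positioned, so offsets spread outward from the control character; passing the control argument corresponds to choosing the control character manually.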
4 Glyph Kerning
Horizontal kerning is an essential part of every font design and can have significant implications for the overall reconstruction of a manuscript, as described in chapter 10. The kerning for each pair of adjacent characters is calculated as the distance from the trailing edge of the first glyph’s bounding box to the leading edge of the second glyph’s bounding box along the base (or hanging) line of the text (figure 30). The average kerning value for each unique character pair is used in the font. When a character pair does not occur in the paleographical repository, the algorithm can fall back on the global average. A more plausible kerning value could also be obtained by defining classes of characters, manually or automatically, and borrowing values from attested pairs of glyphs belonging to the same respective classes.
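A minimal sketch of the kerning calculation, assuming that each word is available as a list of (character, min_x, max_x) bounding-box extents in reading order, so that spaces between words are not counted as kerning. For a right-to-left script such as Hebrew, the trailing edge of the first glyph is the left side of its box and the leading edge of the second glyph is the right side of its box. The function names are hypothetical, and the class-based refinement mentioned above is not implemented.

```python
from collections import defaultdict


def kerning_table(words, rtl=True):
    """Average gap between adjacent glyph bounding boxes for each character pair.

    words: list of words, each a list of (character, min_x, max_x) in reading order.
    Negative values mean the two bounding boxes overlap.
    """
    gaps = defaultdict(list)
    for word in words:
        for (a, a_min, a_max), (b, b_min, b_max) in zip(word, word[1:]):
            # Trailing edge of the first glyph minus leading edge of the second.
            gap = a_min - b_max if rtl else b_min - a_max
            gaps[(a, b)].append(gap)
    table = {pair: sum(v) / len(v) for pair, v in gaps.items()}
    all_gaps = [g for v in gaps.values() for g in v]
    # Global fallback: mean of every measured gap in the manuscript.
    global_mean = sum(all_gaps) / len(all_gaps) if all_gaps else 0.0
    return table, global_mean


def kerning(table, global_mean, a, b):
    """Kerning for the pair (a, b), falling back on the global average if unattested."""
    return table.get((a, b), global_mean)
```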
5 Word and Line Spacing
Word spacing can be calculated by measuring every space between words in the entire manuscript and averaging the results. Line spacing, the vertical distance from the hanging line of one line of text to that of the next, can be calculated by measuring this distance for each pair of consecutive lines in every column and averaging the results.
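Both averages reduce to a few lines of Python. In this hypothetical sketch, each line of text is given as a list of word extents (min_x, max_x) in reading order, and each column as a list of hanging-line y positions from top to bottom; at least one interword space and one pair of consecutive lines are assumed to be present.

```python
def mean_word_spacing(lines, rtl=True):
    """Average space between adjacent words.

    lines: list of text lines, each a list of word extents (min_x, max_x)
    in reading order (right to left by default).
    """
    spaces = []
    for words in lines:
        for (a_min, a_max), (b_min, b_max) in zip(words, words[1:]):
            spaces.append(a_min - b_max if rtl else b_min - a_max)
    return sum(spaces) / len(spaces)


def mean_line_spacing(columns):
    """Average distance between the hanging lines of consecutive lines.

    columns: list of columns, each a list of hanging-line y positions, top to bottom.
    """
    steps = [b - a for ys in columns for a, b in zip(ys, ys[1:])]
    return sum(steps) / len(steps)
```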
The caveat remains that in any given Dead Sea Scrolls manuscript, glyph size, shape, kerning, and word and line spacing vary, sometimes considerably. Reducing that variation to single values for use in a digital font necessarily yields an imperfect model of the scribal artifact, one that may reflect reality only minimally, or not at all. Nevertheless, such fonts are useful models in some contexts, and the process outlined above should be applicable, or adaptable, to any script written in a linear sequence, regardless of its direction. The degree of approximation that such a font can achieve vis-à-vis the actual manuscript is discussed in detail in chapter 10 and Appendix 1.
Brown-deVost, “Scripta Qumranica Electronica (2016–2021)”; Ratzon and Gayer, “Scripta Qumranica Electronica.”
See Daniel Stökl Ben Ezra, Bronson Brown-deVost, Nachum Dershowitz, Alexey Pechorin, and Benjamin Kiessling, “Transcription Alignment for Highly Fragmentary Historical Manuscripts: The Dead Sea Scrolls,” International Conference on Frontiers in Handwriting Recognition (ICFHR) 2020, 361–66.
A more formal expression of this concept can be found in Vladimir I. Arnold, “Principle of the Fragility of Good Things,” in Catastrophe Theory, trans. G.S. Wassermann (Berlin/Heidelberg: Springer, 2004), 31–32. It has been applied specifically to image analysis by Arjan Kuijper and Luc M.J. Florack, “Using Catastrophe Theory to Derive Trees from Images,” Journal of Mathematical Imaging and Vision 23 (2005): 219–38.
David Douglas and Thomas Peucker, “Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or its Caricature,” The Canadian Cartographer 10.2 (1973): 112–22.
See Zhihu Huang and Jinsong Leng, “Analysis of Hu’s Moment Invariants on Image Scaling and Rotation,” in 2010 Proceedings of 2nd International Conference on Computer Engineering and Technology, ICCET V7 (Chengdu, 2010), 476–80, doi: 10.1109/ICCET.2010.5485542. Out of the various possible formulae, I found the following to produce the most satisfying results: