It has been established that digitizing little-used, obscure materials can increase their usage, owing simply to their newfound accessibility. In the case of a collection of nineteenth-century American literature at Cornell, the increase in use was so dramatic that it was concluded that “In hard copy the material may have seemed obscure; when digitized it becomes a core resource.”1 Manuscript collections benefit even more from this effect, as their items are unique and may only be handled under strict regulations. Then again, mass digitization will only make a real impact if it goes along with mass metadata: a digital catalog of some sort, with information such as title, author, dating, and provenance.2 In this sense, creating digital catalogs should be a priority. For the most part, this is the domain of librarians and curators, who are experts at this, just as they know far better than we scholars do how to carry out digitization. But just as we digitize on a small scale ourselves when working on a research project, we will often find ourselves in a situation where it would be beneficial to create a small catalog for our private research needs. It should be noted that ‘catalog’ here can refer to anything, as long as it is a collection of items described under certain terms. So, just as we can catalog books, we can also catalog illuminations, glyphs, or abbreviations. In this chapter, I present a case study to explore how we can do fieldwork digitally, transfer the labor of that fieldwork into catalog data, and then turn that catalog data into a digital catalog accessible online, much like the online catalog of the collection of the German monastery of St. Matthias developed by a team at the University of Trier, but built by ourselves and without a budget.
Much of this work rests on web development technology, which is an especially potent part of computer technology for many parts of our work in the humanities.3 Importantly, web development is easy to learn, since there is a plethora of freely available resources. At the same time, one significant downside is that web development is undergoing an extraordinary evolution, with major innovations pushed out almost every year as new standard ways of working. Therefore, in this chapter, we shall focus on the fundamentals, in the understanding that with this knowledge you will be able to go out and adopt any new technology that might benefit you.
1 Field Research Workflow: From a Dusty Backroom to My Computer
In this section, I shall describe the workflow I settled on when I worked on cataloging a small collection of books, articles, and manuscripts. These artifacts are kept at Sankt Florian, a monastery near Linz, Austria. Sankt Florian, in its current form, was built in the seventeenth century in a baroque style. It is basically a monastery and palace in one: one part was for the Augustinian Canons Regular to live and pray, while the other was for the Habsburg monarchy to stay the night and conduct business. St. Florian is a center of music with a world-famous organ, but it also boasts a very large collection of books, part of which is kept in a dramatic baroque gallery. Their website states:
In 1930, the Monastery library bought the literary remains of Rudolf Geyer (1861–1929), a Viennese orientalist. Still 20 years later, this collection was considered the most comprehensive one in Arabic literature between Berlin and Rome. Meanwhile, about a third of Geyer’s books is indexed.
Naturally, I was intrigued. I wondered exactly how big a ‘comprehensive’ collection would be, and I figured that, given Rudolf Geyer’s life dates, there must be rare or otherwise valuable nineteenth-century books in this collection. Upon inquiry, the ‘index’ of one-third of the collection turned out to be a handwritten list of title, author, place, and year, which could only be consulted in the library itself. Further details about the collection could not be given. An on-site visit was unavoidable, and I made one in the summer of 2016. Arriving late on Monday and leaving early on Friday, I had only three full days to conduct a preliminary investigation. As a testament to the power of a digital work environment, although the catalog was only completed two years later, it was all based on my fieldwork of just those three days.
Upon arrival, I came to know that the Geyer collection was not in the beautiful gallery part of the library, but in a dusty back room. It covered perhaps about twelve bookcases, each with seven shelves, many of them containing double rows of books. Inspecting the hand list that had been drawn up, I noticed that the previous cataloger had looked only at certain sections containing European-language resources, most of which were articles, individually bound and shelved. Browsing for half an hour made it clear that the majority of the collection consisted of books. There were also a lot of articles as offprints, and a dozen or two manuscripts. In short, anything from a book review to a multi-volume primary source could be an item, each one bearing a seal, like a postage stamp, with an index number. Interestingly, inside virtually all items, Geyer placed an Ex Libris sticker, which also shows a number, different from the one on the seal; apparently, then, there are two numbering systems. In cataloging the collection, I only considered the index number on the seal, since this is on the outside of the item and is, therefore, easier to inspect while browsing the shelves. Next to this collection were boxes with notes, drafts, letters, and other things belonging to Rudolf Geyer. I did not investigate them in any detail, focusing instead on the collection proper.
Sitting down with each item of the collection and noting the catalog details would have been too time-consuming. Even if I had had much more time at the monastery, it seemed an ineffective workflow. Given the size of the collection (about 1,500 items) and the time left, I decided I could make a photographic index of all title pages, or at least of those items not mentioned in the hand list. I used my phone for this, an iPhone 6. Even with the phone in my hand, I could use both hands to pick up items and flip them open; keeping an item open with one hand, I snapped a picture with the other. For lower shelves, I used a trolley, loading part of a shelf onto it and returning each item after taking a picture. This was also necessary to reach the second, back row. On the top shelves, which had a single row, I simply stood on a ladder, using its top surface as a small table to keep the items straight while photographing them.
I did not use the stock camera application but Evernote.4 Evernote is a simple note-taking application. It is essentially a database for notes of all kinds (typed text, handwriting, audio, images, and PDFs), with a user-friendly interface built around it that is so polished and easy to use that you rarely think of it as a database, but simply as a note-taking app. With an eye towards possibly reusing your current labors, I would recommend using such an application. It keeps all your notes together and stores them in a way that will remain accessible and exportable for the foreseeable future. Notes are generally inserted into different notebooks, but with one search, one can find notes across all notebooks. It also ensures an offsite backup, as everything is stored on Evernote’s servers too. In the case of cataloging Geyer’s collection, it allowed me to store the photos of each item in a separate note, so that later on I would never have to doubt whether a photo belonged to one book or another. Also, if I want to export the photos to make them ready for another application, a rudimentary division is already baked in. If I ever wanted to bring all the photos together, that would be possible too. Originally, I wanted to give each note the item number, as written on the postage-stamp-like seal, as its title. After only a few items, I realized that typing them out was taking up too much time. Instead, I opted to simply take another photo, this time of the seal. In certain cases, for example when there were multiple title pages in different languages, when there were multiple volumes of one title, or when part of the catalog information was not on the title page but on the last page, I snapped additional pictures. In total, I took probably around 2,500–3,000 photos. If we assume three full days of eight hours’ work, that comes down to about half a minute per photo, or about a minute per item.
According to my notes in Evernote and comparing their time of creation, I did indeed spend a minute—sometimes even less—per item.
However, snapping so many photos in Evernote came at a cost. Uploading everything to Evernote’s servers took over a week. Every photo taken in Evernote is also saved in your Photos app, and since all applications on an iPhone are sandboxed, the photos are physically stored twice on the phone. The upside of having the photos in the Photos app as well is that I was able to quickly transfer them to my laptop: I only had to connect my phone, and the Photos app on my MacBook appeared, allowing me to select and download the images. The downside is that, afterward, having the photos twice on my phone was redundant, yet trying to delete them from the Photos app proved surprisingly difficult. Apparently, deleting two to three thousand photos at once is an incomprehensible task for the phone. I tried it several times but failed. I also tried to do it per day, in batches of about eight hundred, but that too came to nothing: the iPhone simply remained unresponsive and did not delete anything. I ended up deleting them in a hundred or so batches of about twenty photos. It is entirely possible that such issues will be resolved in the future, but you will undoubtedly encounter other, similarly odd behavior. It seems, then, that we are stretching the capabilities of consumer electronics and stock apps. At the same time, it is pleasant to notice that we can actually get by with these simple tools, and we do not need to acquire more professional hardware or software to do our job.
By then, I had the title pages of the Geyer collection right in my pocket, stored in Evernote. Considering the number of notes I ended up with, together with the written hand list, I estimated that the total number of items amounted to no more than 1,500. The next step was to extract the different elements of the title page (title, author, publisher, etc.) into plain text, collected in such a way that they could be constructed into a catalog. I considered whether I could make the jump from images straight to professional, library-quality cataloging. This, however, was unnecessary. Libraries use database systems that accommodate vast numbers of entries, constant updating, and write-access for multiple users. None of that applied to the Geyer collection. All I needed to end up with was a machine-readable list of all the items with their details, so that the list could then be reused in different ways: to create a printed catalog, to create an interactive online catalog, or to load it into a bigger catalog. This list would contain only a limited number of entries, at least from a computer-processing point of view. Furthermore, the list would require little to no updating afterward, so editing entries did not need to be fast and user-friendly. Lastly, I knew I was the only one who was going to put hours of work into this, so there was no need to allow multiple users.
To reduce the hours of work needed and make the actual work as painless as possible, I considered making a custom application in FileMaker. This is software with which you can create simple relational databases. Using its drag-and-drop elements, you can create forms to either display or enter records. A relational database is best visualized as several tables held together by relations. In each table, a row indicates a record, representing a unique object that is described by the values written in that row across several columns. For example, a table persons can have columns such as first name, last name, and age. Each row then represents a person, described by their name and age. This person is unique: it is defined only once in the table. It should be noted that a table is only a visualization. When we look at tables, there seems to be a specific order, with a top and a bottom for the rows and a left and a right for the columns. In databases, this kind of order is not actually there: all records are stored as though they are marbles in a bag into which you put your hand to reach for them blindly.
In this example, if you also wish to include a column books, to describe all the books the person wrote, you will encounter a problem. For some people, there will be no books; for some, one title; for others, multiple. We would ideally have a dynamic number of columns to fill the number of books per person. Let us assume for argument’s sake that each book has only one author. Then, a better way to write this down is to open a new table called books, listing each unique book as a row with columns for title and author. In the author field, we only need to put a referral to the correct row in the persons table. Such a referral is called a key, a foreign key to be exact since the key belongs to an entry that is foreign to the books table. For each table, an extra column may be created to store a unique ID. This is also called a key, but now a primary key.
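As a sketch of this arrangement, with invented data, the two tables can be mimicked in plain JavaScript, where the author column of books stores the primary key of a row in persons:

```javascript
// The persons table: `id` is the primary key.
const persons = [
  { id: 1, firstName: "Rudolf", lastName: "Geyer", age: 67 },
  { id: 2, firstName: "Anna", lastName: "Muster", age: 41 },
];

// The books table: `author` is a foreign key referring to persons.id.
const books = [
  { id: 1, title: "Example Book A", author: 1 },
  { id: 2, title: "Example Book B", author: 1 },
];

// Resolving the relation: look up each book's author via the foreign key.
const booksWithAuthors = books.map((book) => ({
  title: book.title,
  author: persons.find((person) => person.id === book.author).lastName,
}));

console.log(booksWithAuthors[0].author); // prints "Geyer"
```

Because each foreign key points to exactly one row, the author's details are stored only once, no matter how many books refer to that person.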
Storing information like this, in a relational database, has proven to be extremely useful in the digital world. Information can quickly be obtained, and very few pieces of information, if any, need to be stored twice. This is not only convenient in terms of file size; it also means that if information needs to be updated, you only need to change it in one place and, based on that edited record, it is updated everywhere else. FileMaker allows you to point and click your way through setting up such a database, and by making attractive forms, filling that database with information can be both quick and somewhat fun. A big advantage of FileMaker is that you can create forms that work on iPhones and iPads. This allows for entering information on these handheld, touch-screen devices, which then sync back to the main database on a computer.
What I had in mind was to load all the Evernote photos automatically into FileMaker, and then create different forms, each of which would add only a small piece of information. To start with, I wanted FileMaker to present me with a photo of a seal and a small text box in which to type the index number visible on the seal. This would have been a task I could do on my phone while waiting or traveling, and by accumulating all these small moments, I would have entered all the index numbers without truly having lost time over it. With the title pages, I wanted to do something similar, but with a twist. The first step was to get a photo of a title page on my screen; I would then press and drag to create a rectangle around the title. After releasing, the photo would stay visible for another two seconds to allow the user to cancel; otherwise, the area of the rectangle would be stored as the area where the title is, and a new photo would instantly appear. Thus, it would be a matter of endlessly drawing rectangles around titles. After that was done, a second step would be activated, in which only the part of each image defined by its rectangle would appear on the screen, along with a text box, to type out the title. A similar strategy would be applied for each element on the title page. By chopping up the work into these small, menial tasks, I intended to catalog the collection in small, spare moments. The only problem was that building that functionality in FileMaker would take a serious amount of time. Although I did have experience with FileMaker, I did not find the time to sit down and make it. I still think it is a good way to go about processing fieldwork, but it perhaps becomes more sensible when the corpus is bigger than a mere 1,500 entries and when there is a more immediate reason to get it done.
Instead, I fell back on a piece of software I had already been using for other parts of my research: Zotero. Zotero is a citation (or reference) manager, similar to EndNote and Mendeley, which syncs your references to its server. Zotero already provides the user-friendly interface I needed to type out the different details of the title pages, and it has a function to export all the entries to a machine-readable format such as XML or JSON (more on this later). It does not work on phones or tablets, since the interface of Zotero is designed to be used with a keyboard; after all, cataloging is generally a keyboard-reliant activity. In FileMaker, I had figured, I could diminish that reliance by creating custom inputs requiring only one or a few taps on the screen. Since Zotero and its third-party apps are not customizable, I had to change my workflow around the philosophy that cataloging should be done with a keyboard and in one go, collecting all the metadata of one item before moving on to the next.
As a first step, I went through all the photos in Evernote where I had stored them, hand-typing the index number into the title of each note. This was a fairly painless job, since activating a note would instantly display the photo of the seal. This was more luck than wisdom: had the photos of the seals come after the photos of the title pages, I would have had to scroll down in each note to reveal the index number. I figured it was worth the trouble of typing the index numbers into Evernote so that I could order the notes more easily and better keep track of all of them as I moved through the process of cataloging.
In the fall of 2017, I settled into a rhythm of working a little each night. Over two months, after some fifty hours,5 I had pretty much completed typing out the catalog details. For some quick math: fifty hours fit neatly into two months if we assume an hour of work each day, spending about two minutes per item. Both estimates seem reasonable. Little can be said about the use of Zotero, since it is self-explanatory. The only odd thing is that the field to enter the location in the archive, or the field to enter the call number, is very far down. To reach them, I would have had to hit the Tab key many times, which is both error-prone and time-consuming. Therefore, I ended up using the Language field to enter the index number, which is usually only two tabs down from the field for Year.
To understand either part, CSL or JSON, let us first introduce a third term, XML, which stands for Extensible Markup Language. An XML-file is like a plain text “.txt” file, which you can open with any text editor. However, you are not supposed to simply type whatever you want; you need to enter your information in a specific way for it to be a valid XML-file. This is because, while an XML-file is easy for us human beings to read, the regular patterns that XML-files must follow mean that computers can interpret them easily too. This specific way is rather simple: every piece of information should be surrounded by tags. For example:
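In its most minimal form, with a freely chosen tag name and content, this looks as follows:

```xml
<example>Some information</example>
```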
The word ‘example’ is called the tag, and it is written between the angle brackets so that a computer can know it is the tag. As soon as a computer sees an < and an >, it will remember the word in between and look for the same word, but this time between </ and >. The slash, then, indicates a closing tag. Anything in between the opening and closing tags is the information related to the tag. Tags can exist within tags. For example, a description of a book can look like the following:
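For instance, a record for this very book might be written as follows; the tag names here are our own choice:

```xml
<book>
  <author>
    <firstName>L.W.C.</firstName>
    <lastName>
      <lastNameProper>Lit</lastNameProper>
      <lastNamePrefix>van</lastNamePrefix>
    </lastName>
  </author>
  <title>Among Digitized Manuscripts</title>
  <place>Leiden</place>
  <publisher>Brill</publisher>
  <year>2020</year>
</book>
```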
This format is probably quite easy to read for a person, although it is not a very attractive format. We can write code to have a computer go through it and place all the different elements in the right order using the right styling. For example, it can be printed to the screen as “Lit, L.W.C. van, Among Digitized Manuscripts, Leiden: Brill (2020).” The order, the addition of spaces, commas, and other punctuation marks, and the italics, are all done automatically—a great relief if you need to do this for hundreds of records or if you wish to change the styling later on.
XML does not impose any restrictions on what the tags should be. As long as all the tags close, it is valid. Whether the first tag reads book or publication, whether the nested tag in lastName reads surNameProper or lastNameProper, that is up to us. This makes XML usable for basically any situation in which some ordered data needs to be stored for computer manipulation. The drawback is, of course, that if I use book and you use publication, then a computer will not recognize them as the same. So, if somebody writes code that instructs the computer to take all the elements called book, it will not do anything if you have prepared a file where all these elements are tagged as publication. A standard that is accepted by everyone is needed, with rules that we all abide by, so that we can rely on the regularity of the rules to automatically extract and manipulate information from the XML-file.
Citation Style Language is such a standard. It was devised by the companies behind three reference-management applications: Zotero, Papers, and Mendeley. CSL provides a way to combine all the different fields we used in Zotero into a list that can be read by any software that also uses CSL. For example, it is very easy to export from Zotero and import into Mendeley.
What Zotero produces is actually not an XML-file but a JSON-file. Opening a JSON-file in a text editor will demonstrate its similarity to XML: it has tags (called keys) that can be nested, containing information. The difference is that JSON does not need a closing tag and uses more compact punctuation, making a JSON-file much smaller. Let us consider the above example, this time in JSON format:
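With keys of our own choosing, the same book record might read:

```json
{
  "book": {
    "author": {
      "firstName": "L.W.C.",
      "lastName": {
        "lastNameProper": "Lit",
        "lastNamePrefix": "van"
      }
    },
    "title": "Among Digitized Manuscripts",
    "place": "Leiden",
    "publisher": "Brill",
    "year": 2020
  }
}
```

Note how each tag name appears only once, on the left of a colon, and how braces take over the role of opening and closing tags.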
2 Web Development: From My Computer to the World Wide Web
For students and scholars of the humanities, proficiency in web development is a desirable skill, since more and more of our fieldwork will take place in this sphere. Whether we want to use manuscripts or a catalog that a library has made available online, or whether we want to scrape information from Google Books, Facebook, Twitter, or Wikipedia in order to map out discussions that take place on the internet: since these resources are built on web technologies, we also have to investigate them using web technologies. The digital world, after all, is for a rather large part built on web development technology.
The good news is that, of all popular technologies, web development is among the easiest to learn, simply because of the vast amount of teaching resources available. You can get very far, in fact, without paying a penny. The bad news, however, is that web technologies are, at the time of writing, rapidly developing and changing, which suggests that learning should take a two-pronged strategy. There are the basics, which one simply has to know and which will likely remain useful for many years to come, and there are the actual, fully developed technologies, which should be seen as electives and learned on a just-in-time basis.
For the technology of the catalog, I first considered some off-the-shelf products like DataTables and Omeka, which only require you to input raw data. I concluded that they added unneeded features yet lacked certain aspects that were absolute requirements in our case. Rather than trying to hack them into the shape I wanted, I decided to build the catalog from the ground up, so that it would be exactly custom-fitted to our situation.
Because others have explained web technology much better and in more detail than I can, and because a lot of it undergoes rapid change, I shall not go over the code in detail. Instead, I will give a more conceptual overview of how and why I put together the code that I used.
First of all: what does web development mean? This term encompasses all the technology required to develop things that are transmitted and received over a network. Usually, this network is the world wide web, and these things are websites. As the complexity of a website grows and offers more functionality to a user, we can better speak of a web application. For example, our catalog has all the functions of a catalog but is wrapped as a website, and users will have to reach it through a web browser.
What we want to develop, then, is a website that loads the JSON-file with all the catalog entries and displays them. We further want a simple search and a simple sort function, and we would like the interface to be bilingual. It helps to sketch the layout of the website to understand what is needed.
It is useful to divide this type of development into four parts: one part of our code will govern the structure of our website, another will provide the content that goes into that structure, another piece of code will ensure how that content will be styled, and a final part will provide interactivity, not only between the user and the interface but also within the website itself. This division makes for flexible working. The structural level can ensure that there will be a Heading 1 title here or there, while the styling level can ensure that all those Heading 1’s get, for example, a much bigger font size than regular text. On the level of content, we can define an English or German sentence for that heading while the interactivity can either program a button for the user to change the language or leave that functionality out and decide automatically which language to display.
3 Structure: HTML
HyperText Markup Language was originally conceived to fulfill all four of the roles that I just separated. It is the foundational language with which to create webpages that a browser, the application with which you surf the internet, can read and display correctly. For example, if some text is placed within b-tags, you would not see <b>some text</b> in the web browser; you would see some text in bold: the browser reads the b-tags and knows that they are an instruction to display the text in between in bold. Your web browser, then, knows how to read and display an HTML-file, but would not know what to do with any of the other files. In this sense, the HTML-file is the gateway to the rest of your online product (whether it be a catalog or something else). In fact, currently, the only thing we need is an index.html, which is traditionally the name of a website’s landing page.
If you look over the code of the HTML-file, you will notice that it basically serves two functions. First, the head-tag is a shell that opens all the other JSON, CSS, and JS files. Second, within the body-tag, the HTML-file dictates a structure of where the different elements of our catalog should go.
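As a sketch of these two functions, assuming hypothetical file names and element IDs (not necessarily those of my repository), such an index.html might look like this:

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <!-- Shell: open the style sheets, Bootstrap first, then our own overrides -->
    <link rel="stylesheet" href="bootstrap.min.css" />
    <link rel="stylesheet" href="custom.css" />
    <!-- Shell: open the two JSON-files (stored as .js) and the interactivity code -->
    <script src="interface.js"></script>
    <script src="catalog.js"></script>
    <script src="main.js" defer></script>
  </head>
  <body>
    <!-- Structure: empty placeholders, to be filled in by the JavaScript code -->
    <header id="pageTitle"></header>
    <div id="searchArea"></div>
    <hr />
    <div id="catalogArea"></div>
  </body>
</html>
```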
As you may notice, all the images are SVG-images. These are images that do not store color values pixel per pixel, but rather store the coordinates of shapes and their colors in an XML-like manner, which browsers nowadays know how to turn into images. They are essentially connect-the-dots puzzles that the computer of the user will solve on the spot. This means that these images are vector-based and will look sharp no matter how small or big you make them. It also means you can edit them in the text editor where you edit your code (see the Productivity section below), for example, to change the aspect ratio or the color.
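A minimal SVG-file illustrates the idea: instead of pixel values, it stores the instruction to draw, say, a red circle with a given center and radius:

```xml
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">
  <circle cx="5" cy="5" r="4" fill="red" />
</svg>
```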
Below the welcome text, a text box and a button need to appear to give the user the opportunity to search. Under the search function, a horizontal line, indicated by <hr>, should be drawn to separate the interface from the actual catalog.
At the very top of the catalog, a title should appear. As we will see, this title changes frequently, for example to notify the user of the number of search results. Next to the title, preferably on the same line, three buttons should appear on the far right to sort the displayed entries (the entire catalog or just the search results) according to title, author, or year. I did not deem anything more than this necessary: if a place name or something else is important, one can simply search for it.
4 Content: JSON
If you have gone through my repository, you will have noticed that there are two JSON-files. In this case, they actually end in the extension .js, as will be explained later on. One handles the interface (for both languages); the other contains the catalog. I structured the interface-file in a simple manner. At the top level, I defined the keys (the strings on the left of the colons) to be the same as the IDs of the elements in the HTML-file to which the different texts belong. From a human-readability point of view, this should make it quite clear where each of these texts goes in the final website. In terms of the vertical structure, the order in which the keys are defined, it seemed to make the most sense to stay as faithful as possible to the structure of the website. Thus, the header texts are defined first, and the texts for the modal, the pop-up showing the full record details, appear last.
Each of these key:value pairs contains another object as its value, with the first key being ‘english’ and the second being ‘german.’ Their values are semantically the same, but they store the required text in two languages. This structure allows for easy deleting, swapping, or adding of languages. If we wanted to add an interface in Arabic, we would only have to go through this JSON-file, write a comma after each ‘german’ value, hit Enter, and type “arabic”: followed by the Arabic text between quotation marks. This is, I think, a better structure than, for example, having the keys of each language at the top level, each followed by an object containing all the interface elements, because then, if we wanted to adapt some part of the interface, say the text of the introduction, we would have to scroll to different places in the file to change the text for each language. In the present structure, we only need to go to one place, where we instantly see the same text in the different languages, making it easier to change the text accurately for each language.
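A fragment of such an interface-file, with hypothetical element IDs as keys, could look like this:

```json
{
  "welcomeText": {
    "english": "Welcome to the catalog of the Geyer collection.",
    "german": "Willkommen im Katalog der Sammlung Geyer."
  },
  "searchButton": {
    "english": "Search",
    "german": "Suchen"
  }
}
```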
It seemed a good idea to use the JSON-file created by Zotero with the smallest amount of tinkering, so that we could create a different catalog in a pinch by exporting a different set of entries from Zotero. There are, unfortunately, some drawbacks to this, owing to the particular structure of the JSON-files Zotero creates. The chief drawbacks are elements like id and event-place, which are pretty much useless for our purpose, but these files also have an overly complicated way of storing names and dates. Once again, we need to reassess the possibilities of the technology to find the best strategy for dealing with these drawbacks. One solution is to fiddle around with the JSON-file that Zotero produced and shape it into a much simpler form, a form exactly as we want it. We can do this with the search-and-replace function, using regular expressions to capture, in every entry, the part that we want deleted or changed.
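Alternatively, the simplification can be scripted. The sketch below, which is not the repository's actual code, flattens two of the complicated structures: CSL-JSON stores names as a list of objects with "family" and "given", and dates under "issued" as nested "date-parts". The shape of the simplified output is our own choice:

```javascript
// A sample entry in the shape Zotero's CSL-JSON export uses.
const zoteroEntries = [
  {
    id: "item_1",
    title: "Among Digitized Manuscripts",
    author: [{ family: "van Lit", given: "L.W.C." }],
    issued: { "date-parts": [[2020]] },
  },
];

// Flatten each entry into the simple shape our catalog needs.
const simplified = zoteroEntries.map((entry) => ({
  title: entry.title,
  author: (entry.author || [])
    .map((name) => name.family + ", " + name.given)
    .join("; "),
  year: entry.issued ? entry.issued["date-parts"][0][0] : "",
}));

console.log(simplified[0].author); // prints "van Lit, L.W.C."
```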
5 Style: CSS
CSS stands for Cascading Style Sheets. It is one or more sheets (files) that end in .css and that should be called at the beginning of the HTML-file with a <link rel="stylesheet" href="pathAndNameOfStylesheet.css"> command. Such a CSS-file is merely a list of instructions on how particular elements should look. Proper styling is especially necessary as there is an ever larger variety of screen sizes and browsers among the users of your product. Taking this into account is called ‘responsive web design,’ which ensures that your website will not shrivel to an unreadably small size because someone uses a four-year-old smartphone, but will instead respond by reshuffling and resizing the elements and the texts in them. Styling is also necessary to make the website actually attractive. A catalog entry stored in a JSON-file is sort of readable by an individual, but once it is presented in a table with ample white space around it, lined out neatly, with good background colors and a font and font style that fit the context, it will be much more readable and enjoyable. In fact, as more and more websites do look good, users are simply starting to expect a certain level of sharp and attractive design.
Whereas the structure of our website in HTML really is up to us, for CSS we can and should use already existing resources that are capable of instantly making our site reasonably usable and attractive. Styling controls the look and feel of every aspect of the website, so starting from scratch would be a tedious job, especially since many things, like how a button looks and behaves, should be roughly similar regardless of the online resource you are building. The responsive part of CSS, too, is the same no matter which project you are working on. Two templates that are currently popular are Bootstrap and Materialize. The first was created internally at Twitter and subsequently released as open source; the second was developed later by students of Carnegie Mellon University. I have used Bootstrap for my catalog. The changes that I wished to make are kept in an additional CSS-sheet, and I have instructed the HTML-file to load this additional file after loading the Bootstrap CSS-file. The ‘cascading’ in Cascading Style Sheets means that if there are two rules about the same thing, it is the last rule that is actually obeyed. Thus, by loading a custom-made CSS-file after Bootstrap, we can use Bootstrap as a foundation and make some overruling changes to it, defined in the other file. Knowing what exactly can be controlled and which commands to use is a matter of searching the internet, specifically websites like StackOverflow and the documentation for CSS and for Bootstrap. For beginners, it will be beneficial to watch a video or attend a workshop. In most cases, it is only when the need arises that you should look into exactly how to do something.
Functions four and five form the core of this code, and to understand the entirety of it, it is best to start there. The task of function four is to fill the interface with texts. Moreover, when the flag icon at the top-right of the website is clicked, it needs to change the texts to the next language. To keep the structure and content of the website strictly separated, I only declared an empty structure in the HTML-file, which means that when the user visits the website, this function needs to be called to populate all the fields. The function does not make many assumptions. For example, it does not specify the different fields by name but loads the names of those fields dynamically from the interface-JSON. This means we could add or delete certain interface elements by changing the HTML-file and the JSON-file while leaving this code intact. Similarly, the number or kind of languages is not specified. Adding another language to the interface-JSON will pose no problem, as long as one does not forget to also create a flag icon in SVG-format for that language. The language the interface is currently in is given to the function, and from there, the next language is established, both its index and its name. This index is used to access the correct text for all the fields, which is done by a simple for-loop going through all the available elements. Within this loop, some specificity was inescapable, as accessing the text property of all fields could not be done with one command. Some require the command placeholder, others innerHTML, and others title.
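A minimal sketch of what such a function could look like, under the assumption of a hypothetical interface-JSON here called uiData; the element IDs, field names, and texts are illustrative, not the actual code of the catalog.

```javascript
// Hypothetical interface-JSON: which languages exist, and, for every
// interface element, its ID, the attribute that holds its text, and
// the text per language.
const uiData = {
  languages: ["en", "de"],
  fields: {
    searchTip:    { attr: "placeholder", en: "Search…", de: "Suchen…" },
    titleHeader:  { attr: "innerHTML",   en: "Title",   de: "Titel" },
    searchButton: { attr: "title",       en: "Search",  de: "Suchen" }
  }
};

// Given the current language, establish the next one cyclically, so
// adding a third language to uiData needs no change here.
function nextLanguage(currentLang) {
  const i = (uiData.languages.indexOf(currentLang) + 1) % uiData.languages.length;
  return uiData.languages[i];
}

// Populate every field declared in the JSON; no field is hard-coded,
// so elements can be added or removed without touching this code.
function fillInterface(currentLang) {
  const lang = nextLanguage(currentLang);
  for (const id of Object.keys(uiData.fields)) {
    const field = uiData.fields[id];
    const el = document.getElementById(id);
    if (!el) continue;
    // Some elements need placeholder, others innerHTML or title.
    el[field.attr] = field[lang];
  }
  return lang;
}
```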
Function five, Render_Table, creates the catalog underneath the interface. It takes a couple of parameters. First, the catalog to be rendered, in JSON format. Then a specification regarding the type of heading the catalog should have. If you look into the interface-JSON, you will see several different headings such as beginTitleCatalog and noResultsTitleCatalog. This function takes one of them to give a more specific feel to the rendered catalog. Next, the function takes in the language. Lastly, it takes the number of entries to be rendered, which could have been calculated within the function, but it seemed more readable to pass it into the function. The logic of the function is rather simple: it will create a variable containing a string that represents the HTML-code to display the catalog, for which the table-tag is used, and once it has filled that variable with all entries, it will fill the div in the HTML-file called catalogGeyer with that variable. The last thing it does is set the heading for the catalog.
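As a sketch, with a separate helper so the string-building logic is visible; the element ID catalogHeading and the field names are my assumptions, while catalogGeyer is the div named above. The real Render_Table differs in detail.

```javascript
// Build the whole table as one HTML string; injecting it into the
// page in a single assignment is faster than appending row by row.
function buildTableHTML(data) {
  let html = '<table class="table">';
  for (const entry of data) {
    html += "<tr><td>" + (entry.title || "") + "</td><td>" + (entry.year || "") + "</td></tr>";
  }
  return html + "</table>";
}

// Fill the div called catalogGeyer and set the heading; headingText
// would come from the interface-JSON (e.g. the text stored under
// beginTitleCatalog or noResultsTitleCatalog).
function renderTable(data, headingText, numEntries) {
  document.getElementById("catalogGeyer").innerHTML = buildTableHTML(data);
  document.getElementById("catalogHeading").textContent =
    headingText + " (" + numEntries + ")";
}
```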
Print_Names, function number eight, takes a reference to a catalog entry and returns the names associated with that entry with appropriate formatting. The first thing the function does is to ensure the existence of names. If that is the case, a for-loop iterates over all the names and for each name checks if there is a first name, a last name, and a connecting particle (such as the German ‘von’ or the French ‘de’). Since Zotero stores these values in separate fields, we need to stitch them together. All names are separated by a comma (and a space, of course), and the last name is closed with a period.
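In code, such a function could look as follows. The field names given, family, and non-dropping-particle are those of Zotero's CSL-JSON export; the function body itself is a sketch following the description above, not the catalog's actual code.

```javascript
// Stitch Zotero's separate name fields back into readable names:
// first name, connecting particle ('von', 'de'), last name. Names
// are separated by ", " and the last one is closed with a period.
function printNames(entry) {
  if (!entry.author) return ""; // first, ensure names exist at all
  const formatted = [];
  for (const name of entry.author) {
    let full = "";
    if (name.given) full += name.given;
    if (name["non-dropping-particle"]) full += " " + name["non-dropping-particle"];
    if (name.family) full += " " + name.family;
    formatted.push(full.trim());
  }
  return formatted.join(", ") + ".";
}
```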
With the function Simplify_Term discussed, we can look at function two, Reducing_Catalogue. After experimentation, I found that the search function worked best if we created another version of the catalog that would only contain the entries in reduced form. catData is the object that contains the catalog, and searchData will now become the object with the reduced catalog, what I call a shadow catalog. At first, I used the simple commands Object.values, which gives you all the values of an entry, such as title, place, and year, and .join, which takes all those values and stores them together in one long string. Elegant as this solution was, it did not perform well because the date and names are stored in a more complicated form than this approach allows. I finally settled on filling a variable with individual values of an entry, making profitable use of the function Print_Names. Since I did not perform if-then checks to see if a particular value even exists, I got many ‘undefined’ in my string, which can be deleted at the end, after which the Simplify_Term function reduces the entire string to complete the entry for the shadow catalog.
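A sketch of the two functions under these assumptions: my version of Simplify_Term merely lower-cases a string and strips diacritics and punctuation, and only three example fields are concatenated; the real functions differ in detail.

```javascript
// Reduce a string to a forgiving search form: lower-case, diacritics
// stripped, punctuation turned into spaces.
function simplifyTerm(s) {
  return s
    .toLowerCase()
    .normalize("NFD")                // split letters from their accents
    .replace(/[\u0300-\u036f]/g, "") // drop the accents
    .replace(/[^a-z0-9 ]/g, " ");    // everything else becomes a space
}

// Build the shadow catalog: one reduced string per entry. Missing
// fields produce the literal text "undefined", which is deleted
// before simplifying.
function reduceCatalog(catData) {
  const searchData = [];
  for (const entry of catData) {
    let s = entry.title + " " + entry.year + " " + entry.place;
    s = s.replace(/undefined/g, "");
    searchData.push(simplifyTerm(s));
  }
  return searchData;
}
```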
We now arrive at function nine, Search_Catalog. Obviously, this function requires a string as its input, the search term, which it normally gets from the text box with the ID searchTip. The function is triggered by the user, who can click the search button or hit the Enter key when the focus is on the text box. This last bit is established with the method addEventListener, which is a little function that keeps an ear out for a specific type of event, and when it hears it go off, it will perform some code. In this case, this code is triggered every time the user releases a key when typing in the text box of the search bar. When the last key was Enter, the listener will engage the Search_Catalog-function. Additionally, I provided instructions that if the user has deleted all the text from the text box, the entire catalog should be rendered again. I experimented with letting the Search_Catalog-function fire off every time the user pressed a key to get the experience of a live update, so that with every additional letter the user types, the number of entries is restricted. With a catalog of over 1,500 entries, this proved to be too heavy on the calculation side; the code could not be executed fast enough to get a snappy feel. In its current state, there is still not an instant experience of the result, but since the user needs to press the Search button or hit Enter, there is a greater expectation on the user’s side that it can take a fraction of a second for the catalog to update.8 The search function itself is fairly straightforward. First, it reduces the search term the user provided by means of the Simplify_Term-function. Then, it runs through the shadow catalog and sees if the search term is contained in it. If so, that specific entry will be pushed from the catalog (catData) into a new object called searchData. This way, we can build a subset of the catalog containing only those entries in which the search term is present.
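The wiring could be sketched like this. searchCatalog stands in for Search_Catalog, a simple lower-casing function stands in for Simplify_Term, and renderCatalog is a placeholder for the rendering step; the browser part is guarded so the search logic itself can also run outside a browser.

```javascript
// Minimal stand-in for Simplify_Term (lower-casing only).
const simplify = (s) => s.toLowerCase();

// Run the reduced search term against the shadow catalog; whenever a
// reduced entry contains it, push the *full* entry into the result.
function searchCatalog(term, catData, searchData) {
  const reduced = simplify(term);
  const hits = [];
  for (let i = 0; i < searchData.length; i++) {
    if (searchData[i].includes(reduced)) hits.push(catData[i]);
  }
  return hits;
}

// In the browser: search on Enter, and re-render the full catalog
// when the user empties the text box.
if (typeof document !== "undefined") {
  document.getElementById("searchTip").addEventListener("keyup", (event) => {
    if (event.target.value === "") {
      renderCatalog(catData); // show everything again
    } else if (event.key === "Enter") {
      renderCatalog(searchCatalog(event.target.value, catData, searchData));
    }
  });
}
```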
Perhaps you wondered why Render_Table took as one of its parameters something called data. Now you can see that we can give Render_Table either the entire catalog (catData) or only a subset related to a search term (searchData).
The functions ten, eleven, and twelve are fairly similar. I created them by looking up examples of how to arrange objects alphabetically and adapting them to my own situation. For example, I could not be assured that every item had a title, so I built a check for that. Had I not done that, not all pairs would be compared and sorted, leaving the entries without a title scattered throughout the order of the other entries. Now, with this extra check, all entries without a title are sorted on top. When I discovered this flaw, I not only fixed the code but also decided to return to Zotero and try to give every entry a title, even if there was nothing on the cover. This interplay between coding the catalog and improving the catalog data is to be expected and might occur several times while setting up the catalog. At other times, you will run into issues for which cleaning up or improving the data will not help. For example, the sorting by author name is inherently messy, as multiple persons can be assigned to one item. In such cases, you can, at first, opt to make up rules: entries should first be sorted by the original author; then, if there is none, by the editor; and then, if there is no editor, by a translator.
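The title comparator with that extra check might look like this; a sketch, since the actual functions ten to twelve differ in detail.

```javascript
// Compare two entries by title, case-insensitively. Entries without
// a title get the empty string, so they sort together at the top
// instead of being scattered through the list.
function compareByTitle(a, b) {
  const ta = a.title ? a.title.toLowerCase() : "";
  const tb = b.title ? b.title.toLowerCase() : "";
  if (ta < tb) return -1;
  if (ta > tb) return 1;
  return 0;
}
```

Sorting is then a matter of catData.sort(compareByTitle), after which the result is rendered again.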
As a last note, you may be wondering what the code in the first function does. This controls a fancy tool tip functionality when you hover with your mouse over a button or an element. Including this is as easy as including scripts from Popper.js and TippyTip. This, of course, adds some file size to the website when somebody uses it, but in this case, I did not see an issue with that. The only thing that needed manual instruction is when somebody looks at the catalog from a touchscreen device. Since you cannot hover your mouse on a touchscreen device, TippyTip did not respond correctly, so it is better to simply disable the functionality for tablets and smartphones.
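If the tooltip library in question is Tippy.js, whose entry function is tippy(), disabling the tooltips on touchscreens could be sketched as follows; the detection heuristic is my own illustration, not the author's exact check.

```javascript
// Hover tooltips make no sense on a touchscreen, so skip
// initializing them there. This is a common detection heuristic.
function isTouchDevice(win) {
  return "ontouchstart" in win || Boolean(win.navigator && win.navigator.maxTouchPoints > 0);
}

if (typeof window !== "undefined" && !isTouchDevice(window)) {
  tippy("[data-tippy-content]"); // attach tooltips to annotated elements
}
```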
Much more functionality could be built. For example, right now, the sorting is very simple. We could imagine that the same button could sort in two directions (A–Z and Z–A), or that when sorting by year, given an equal year number, the entries are sorted alphabetically by title. Perhaps performance could be optimized to run all of this faster and smoother. This is always project-dependent. Given the relatively small scale of the catalog, I decided that the functionality as it stands is adequate.
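For instance, the two suggested refinements could be sketched as one comparator; again purely illustrative.

```javascript
// Sort by year; when two entries share a year, fall back to
// alphabetical order by title.
function compareByYearThenTitle(a, b) {
  const ya = a.year || 0;
  const yb = b.year || 0;
  if (ya !== yb) return ya - yb;
  return (a.title || "").localeCompare(b.title || "");
}

// Z–A is then simply the same comparator with its arguments swapped:
const compareDescending = (a, b) => compareByYearThenTitle(b, a);
```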
7 Productivity: Code Editor and Code Repository
The most important aspect of productivity is making sure you write your code with a good editor. Writing any of these files, HTML, CSS, JSON, or JS, could be done in a text editor as simple as Notepad (Windows) or TextEdit (macOS)—but that would be madness. I wrote my code for the catalog in Visual Studio Code, which is made by Microsoft and free to use. There are other fine applications out there, and in the future there will, no doubt, be new contenders that make life even easier. When in doubt, take the seemingly more popular choice. Several reasons make VS Code great. First of all, in VS Code, you do not just open one file, but rather a project, represented by a folder on your computer. All the files in the folder are shown in a list in VS Code, and by clicking on each file, you open that file in a tab. This makes it incredibly easy to code in HTML, CSS, JSON, and JS simultaneously. VS Code displays your code in a color scheme, using one color for each type of thing (a method, attribute, or variable) and setting the background color to something other than white. I prefer a very dark grey as a background color, which is easy on the eyes, especially at night. With one key combination, Shift+Alt+f, the program will reformat your code to make it look neat and tidy again, indenting lines of code to show that they belong to a function, if-statement, or for-loop. You can shrink or expand parts of the code; for example, if you write a long function, you can minimize it to its first line only so that you keep an overview of the surrounding code. Speaking of overview, on the right, there is a vertical visualization of the totality of your code, which you can use to browse through it. The real power of a program like VS Code comes in the assistance it gives in writing code.
For example, in an HTML-file, only typing the exclamation mark and hitting enter will give you a boilerplate HTML-file, which includes a declaration of the file being HTML, a header with some information pre-filled, and body-tags. Writing a period and a word and then hitting enter will create div-tags with that word as a class, and doing the same with a pound-sign, #, will make a division with the word as its ID, and so forth.9
Extensions can be downloaded for VS Code, which expand its functionality. For example, I use one called Live Server. When you open your index.html and click on Go Live in the bottom bar, your browser will be opened with the website you are creating. Every time you adjust any of the files in the project, the webpage reloads in the browser to give you an automatic update of what the website looks like and how it behaves under the adjusted code. This works even better when you have a second screen to place that browser window in, so that on one screen you code, and on the other you see the result.10 Such extensions and other advanced features like debugging make VS Code a really great choice. It is also easy to pick up, but as you progress and become more skilled with coding, you do not need to switch to more professional software: VS Code is a popular choice among many professionals.
The code you write yourself also needs version control, of course, for which currently one of the best options is the software Git, especially as it is offered through the free service GitHub. With GitHub, it should be noted, you do not only get version control but also a reliable way to distribute and collaborate on code. A free account on GitHub will only allow you to make public repositories, so be careful not to use it if you wish to keep your code to yourself. The upside is that with GitHub you also get free hosting and, by adjusting a few settings, your code is not just readable as code in the repository but can be seen working as the website it is supposed to be. This is convenient to demonstrate the project to others and get a reliable sense of how it performs online. You will do well to spend half an hour understanding the philosophy and mechanics behind Git and GitHub. VS Code supports Git right in the editor, allowing you to see at a glance how you have changed your code since the last save in the repository (which Git calls a ‘commit’; sending your commits to GitHub is called a ‘push’).
A note of caution needs to be made about GitHub and, in fact, about all software I refer to: at this moment of writing, they perform really well and are free to use, but both aspects could change over time. GitHub is a particularly good example of this, as its free use and absence of advertisements give off the impression of being part of a public space. The public space on the internet is, in fact, really small. Wikipedia and The Internet Archive are notable examples of stable, future-proof, non-profit organizations committed to keeping their websites open for free, but GitHub is in fact owned by a for-profit company. If it is bought up or pressured by shareholders, it could very well change its services. We have seen such a change of course with Academia.edu. This website purported to create a public sphere for scientists and scholars but shifted in one year, 2016, toward an aggressive strategy to shake money out of its users.11 With whatever we do, it seems to me best to think in terms of abstract ideals and needs and to consider which actual technical tool suits you best, keeping an especially close eye on the possibility of exiting a certain technology without losing data. In other words: if the company that produces a certain piece of software goes bankrupt, will I be able to port my data to other software? Or is it now trapped inside unstable, unsupported software? You should ideally have a positive answer to this question.
The last point on productivity has to do with knowledge acquisition. In the end, what will likely be the best way for us, students and scholars, to learn web development is to skim books and to apply and fiddle around with actual code. There is, however, a large amount of consumable media, such as videos on YouTube and podcasts, which are also helpful. We would not want to spend our working hours on them, but by subscribing to some handpicked channels, you may find yourself using them to fill time otherwise reserved for relaxing: for example, while exercising or while traveling on public transport. The best ones to choose, in my experience, aim for a beginner-to-intermediate level and do not cover the latest news but offer long-form discussions of best practices. Even if at first you do not understand what they are saying, there is merit in listening. Over time, you will slowly pick up the vocabulary currently in use by web developers, and you will get a sense of what is currently hot and what is not.
8 Quantitative Analysis of the Collection
In the current method of cataloging, a total of 1528 objects were registered. The vast majority of them, 93%, are books. The rest are mostly journal articles and, notably, a total of twenty-six manuscripts. To display this properly, we would need a pie chart that first divides 93% books to 7% other materials, with the ‘other materials’ clickable to show a new pie chart dividing up these other materials into 70% journal articles, 24% manuscripts, 4% book sections, and 2% other kinds of documents. Obtaining this information was as simple as sorting the items in Zotero according to kind and then selecting all of one kind to see the number of items.
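The same numbers could also be computed directly from the catalog JSON, assuming the export keeps each item's kind in CSL-JSON's "type" field; a small sketch:

```javascript
// Tally how many items there are of each kind (book, article-journal,
// manuscript, and so on), using the CSL-JSON "type" field.
function countByType(catData) {
  const counts = {};
  for (const entry of catData) {
    const kind = entry.type || "unknown";
    counts[kind] = (counts[kind] || 0) + 1;
  }
  return counts;
}
```

From such counts, the percentages for the pie chart follow by dividing each count by catData.length.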
One Excel sheet with one column and 1209 rows, each row indicating the originating city and the destination (Vienna), separated by a comma.
One Excel sheet with one column and 96 rows, each row containing a unique city name, followed by a comma, and then the longitude and latitude between quotation marks, themselves separated by a comma.
I copied this column into my code editor, Visual Studio Code, which has an advanced Find and Replace function. With the regular expressions option turned on, I landed, through trial and error, on the following Find expression:
And for the Replace expression I took:
In the last two chapters, we laid the foundations for working with digitized manuscripts, concluding that we can best store text in a plain text file format such as XML or JSON, store symbols or shapes in a vector image file format such as SVG, and, lastly, inform and connect all these matters by academic standards such as TEI and IIIF. In this chapter, we learned the basic skills to practically put all these files together to create a visual appearance—which we can call a website, web app, digital edition, or digital catalog. The two most important lessons to draw from this and the previous chapters are that technology is only as powerful as the user wielding it and that we do not need to know everything—just those aspects that help us build a solution. This is why it is very important to keep an open and creative mind. With knowledge of the fundamentals of different technologies, it will be relatively easy to understand how a certain problem can be solved and whether you can learn to implement that solution in a sufficiently short time. Now that we have acquired a few basic assets for our practical toolbox, let us, in the next chapter, reach for a higher level of technical skill by delving into the programming language Python and its application to codex images.
Hirtle, P.B. “The Impact of Digitization on Special Collections in Libraries.” pp. 42–52 in Libraries & Culture 37, no. 1 (2002), p. 43.
Nichols, S.G., “Materialities of the Manuscript: Codex and Court Culture in Fourteenth-Century Paris,” pp. 26–58 in Digital Philology: A Journal of Medieval Cultures, vol. 4, no. 1 (2015), p. 27; Ornato, E. “La Numérisation Du Patrimoine Livresque Médiéval : Avancée Décisive Ou Miroir Aux Alouettes ?” pp. 85–115 in Kodikologie Und Paläographie Im Digitalen Zeitalter 2, edited by F. Fischer, Chr. Fritze, and G. Vogeler. Norderstedt: BoD, 2010, p. 96; Riedel, D. “How Digitization Has Changed the Cataloging of Islamic Books.” Research Blog Islamic Books, August 14, 2012; Dahlström, M. “Critical Editing and Critical Digitisation.” pp. 79–98 in Text Comparison and Digital Creativity, edited by W. van Peursen, E.D. Thoutenhoofd, and A. van der Weel. Leiden: Brill, 2010, p. 90ff.
Susanne Kurz rightfully made it one of the pillars in her practical introduction to Digital humanities, see Kurz, S. Digital Humanities: Grundlagen Und Technologien Für Die Praxis. Wiesbaden: Springer Vieweg, 2015.
There are alternatives. For example, Tropy is free software specifically developed to take in the thousands of photos a researcher takes at an archive and provide the user with a friendly way of making sense of all the photos back at home. A more fully-fledged alternative to Evernote is OneNote.
I know because I had the TV show The Office on in the background, and by the end of my manual data entry, I had reached Season 8.
You can see the changes by looking through the commit history of my GitHub repository for this code.
I first considered filtering the generated table, but this caused a number of issues which I do not think are worth going into.
VS Code does this through integration of Emmet.
I used my iPad as a second screen with the app Duet Display, using a Mounty from Ten1Design to fasten my iPad on the side of my MacBook screen.
In early 2016, Academia.edu started charging money to have your new publication seen by others as ‘recommended’, a move so brazen it was mistaken at first for a scam, cf. Ruff, C. “Scholars Criticize Academia.edu Proposal to Charge Authors for Recommendations.” The Chronicle of Higher Education, January 29, 2016. Later that year, a ‘premium’ service was introduced, broadly criticized as predatory, cf. Bond, S. “Dear Scholars, Delete Your Account At Academia.Edu.” Forbes, January 23, 2017.