Operatic Productions in the Netherlands, 1886–1995: from Printed Annals to Searchable Performance Data

This data paper accompanies the database Operatic productions in the Netherlands, an open dataset containing details on over five thousand opera productions in the Netherlands between 1885 and 1995, extracted from the Annalen van de Nederlandse Operagezelschappen (Annals of the Opera Companies in the Netherlands), which appeared in book form in 1996. These data give an extremely rich account of the performance history of operatic works and the personnel involved in their production. Since the original publication lacks a critical introduction, the authors have attempted to reconstruct the origins and systematics of the collection. They also discuss the attributes of the data and the basic data structure in order to give users relevant information to use and restructure the data for their interests. The data structure and metadata classifications are based on an inventory of the classifications used in existing performing arts databases across Europe.


Introduction
In 1996 the Dutch Theatre Institute and Dutch National Opera published the Annalen van de Nederlandse Operagezelschappen (Annals of the Opera Companies in the Netherlands; from here on: the Annals), a fifteen-pound book of almost 1,300 pages with details on over five thousand opera productions in the Netherlands (Hulpusch, 1996). Now, more than twenty years later, this ambitious but unwieldy overview of Dutch opera history has finally been digitized and made available as open data as part of the CREATE research program of the University of Amsterdam. In the digitization process, the original indices were entered into a relational database that brings out the full research potential of the data. The database gives an extremely rich account of local operatic culture, the careers of singers, set and costume designers, and directors, and the local performance history of both famous and lesser-known operatic works.
In this data paper, we discuss the attributes of the data and, since the original publication lacks a critical introduction, attempt to reconstruct the collection's origins and systematics. Furthermore, we introduce the basic data structure in order to give users relevant information to use and restructure the data according to their interests. The data structure and metadata classifications used are based on the MEPAD inventory of the classifications used in existing performing arts databases across Europe (Baptist, Van Oort, & Noordegraaf, forthcoming). This should facilitate the connection to other performing arts databases.

Operatic Productions in the Netherlands, 1886-1995 research data journal for the humanities and social sciences 5 (2020) 79-90

Lexicons and Datasets
In the scholarly study of opera there is a long tradition of printed reference works that give a selected overview of works, composers, premiere dates and performers (e.g. Loewenberg, 1955; Gatti, 1964; Seeger, 1978; Sadie, 1992). In many ways, these printed volumes are the paper predecessors of digital databases: collections of entries connected to sets of descriptions (metadata). Focusing on canonical works and composers, such lexicons, annals and indices have long formed the backbone of music studies. In recent decades, however, new avenues of enquiry have opened up within musicology. Attention has shifted from the study of musical works as 'texts' to broader questions concerning the function of music in social and cultural life, and scholars have taken up a more bottom-up approach to music history involving socioeconomic aspects of concert life, patterns of taste, and local musical practices (e.g. Pasler, 1993; Johnson, 1995; Weber, 2001, 2009a, 2009b; Hall-Witt, 2007; Bashford, 2008). As a result, issues of performance and reception have assumed a prominent place in the field.
With these expanding interests, the types of sources used are becoming more diverse as well. An inventory by Vincent Baptist (2018) for MEPAD: Mapping European Performing Arts Data shows that all over Europe, relational databases are currently being created to connect traditional indices of works and composers to events, venues, programming practices, journalism and other relevant aspects of musical culture. Examples such as The Concert Programmes Project (Ridgewell, 2010) and the Concert Life in Nineteenth-Century London Database (Dix, Cowgill, Bashford, McVeigh, & Ridgewell, 2014) show the enormous advantages of these datasets for research into musical cultures and practices. Meanwhile, the performance history of various prominent opera houses such as the London Royal Opera House, Paris Opéra Comique, and Prague National Theatre has been made available in digital form. By transforming existing printed datasets into relational databases, their usefulness for future research can be secured.
The database presented here also opens up possibilities for expanding research beyond the realm of individual famous opera houses into opera practices at a regional level. The Annals list the operatic productions staged by hundreds of major and minor organizations that were active in the Netherlands in the period 1886-1995. This allows for a big-data approach addressing large-scale processes such as the canonization and dissemination of repertoires.
For each production, the Annals provide detailed information, identifying premiere dates per season as well as its contributing artists, musicians, roles, singers, directors, choreographers, set and costume designers, producers, light designers, etc. Furthermore, they contain information about the translation or adaptation of operas and, in some cases, even the languages of individual arias if different from the language of the rest of the performance. The main limitation of the dataset is that it focuses on productions (that is, the version of the opera produced by a certain director and artistic team) rather than individual performances. It only records the date of the season premiere for each production, without further performance dates. Therefore it cannot be used to collect statistics about the exact number of performances of an opera.

The Annals: Origins and Digitization
The Annals originate in a private card index of opera performances created by theatre history enthusiast Peter Hulpusch. He was encouraged to publish his material by professor of musicology Eduard Reeser and Hans Kerkhoff, founder of the 'Saturday Matinee' at the Amsterdam Concertgebouw. Hulpusch then sought collaboration with the Dutch Theatre Institute (TIN), which facilitated the publication (Hulpusch, 1996; for more details on the background of the original publication, see van Nieuwkerk, 2018a; Beer, 1996; Jansen, 1996). The original printed Annals give a chronological overview of productions per opera company per season, with information about the premiere date, the cast, the artistic team, etc. In the original indices the information is grouped into six main registers: titles, composers, librettists, companies, artistic and/or managing directors, and names. For each opera production, the work's original language as well as the language of the staging is indexed (fig. 1).
The indices were particularly helpful in the post-correction stage of the digitization process. After converting the printed text into 'machine-encoded' text with the optical character recognition (OCR) tool Tesseract 4.0, we used the indices as a dictionary to clean up the OCR output. By comparing the different indices with the OCR-scanned texts, errors could be recognized and corrected automatically or manually, depending on the scale of occurrence.
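The dictionary-based post-correction described above can be illustrated with a minimal sketch. It is not the procedure actually used for the Annals, whose details are not documented here: the function name `correct_with_index`, the similarity cutoff and the sample tokens are all assumptions made for illustration. The sketch uses Python's standard `difflib` module to replace an OCR token with its closest index entry and to set aside tokens without a close match for manual review.

```python
import difflib


def correct_with_index(tokens, index_terms, cutoff=0.75):
    """Replace each OCR token with its closest index entry, if one is close enough.

    Returns the corrected token list and the tokens left for manual review.
    The cutoff of 0.75 is an illustrative assumption, not a documented setting.
    """
    known = set(index_terms)
    corrected, uncertain = [], []
    for token in tokens:
        if token in known:
            # Token already matches an index entry exactly.
            corrected.append(token)
            continue
        matches = difflib.get_close_matches(token, index_terms, n=1, cutoff=cutoff)
        if matches:
            # Automatic correction: take the single best index entry.
            corrected.append(matches[0])
        else:
            # No sufficiently similar entry: keep as is and flag for review.
            corrected.append(token)
            uncertain.append(token)
    return corrected, uncertain


# Hypothetical index entries and OCR output, for illustration only.
index_terms = ["Hulpusch", "Verdi", "Wagner", "Rigoletto"]
ocr_tokens = ["Hulpusch", "Verdl", "Wagncr", "Rigolctto", "Xyzzy"]

corrected, uncertain = correct_with_index(ocr_tokens, index_terms)
print(corrected)
print(uncertain)
```

In practice the cutoff would need tuning: a low value risks merging distinct names, while a high value leaves more tokens for manual correction.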

Data Structure
- Operatic productions in the Netherlands, deposited at DANS
- DOI: https://www.doi.org/10.17026/dans-zcy-g3pt
- Temporal coverage: 1885-1995
The data in the relational database is structured following the general categorization of the book, with the indices preserved as the main tables in the database. This produced two main layers: 1) works, connected to composers and librettists; and 2) productions, connected to a premiere date and all its contributors. It is important to note that, while the Annals only list productions, every production accounts for a set of distinct events (individual performances of that production). In order to make the dataset compatible with analogous event-based performance datasets found in the MEPAD inventory, such as the Royal Opera House Performance Database (The Royal Opera House, n.d.) and AusStage (The AusStage Consortium, n.d.), a third layer denoting 3) performance events was added. In the current dataset, every production id is connected to a single event only, the premiere, but the extra layer allows users to add the dates and details of subsequent performances by creating new events and connecting them to the same production.
The final structure of the data is represented in figure 2: to each level of the performance (work, production and event) people (contributors) and organizations are connected using crosstabs. Works are connected to composers and librettists, productions to a production company and the artistic team involved in the production (directors, translators, set and costume designers, light designers, choreographers, etc.) and events to a venue (with a location) and the performing cast (singers, actors and musicians).
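To make the three-layer structure more concrete, the following sketch builds a heavily simplified version of it in an in-memory SQLite database. The table and column names (`work`, `production`, `event`, `person`, `production_person`) and all inserted values are placeholders chosen for illustration; they are assumptions and do not reproduce the actual schema or contents of the deposited dataset. The sketch shows one crosstab linking people to productions, and how a user could attach a later performance to an existing production.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE work (
    work_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
CREATE TABLE production (
    production_id INTEGER PRIMARY KEY,
    work_id       INTEGER NOT NULL REFERENCES work(work_id),
    company       TEXT
);
CREATE TABLE event (
    event_id      INTEGER PRIMARY KEY,
    production_id INTEGER NOT NULL REFERENCES production(production_id),
    event_date    TEXT NOT NULL,
    venue         TEXT,
    is_premiere   INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Crosstab: links a person to a production in a given role.
CREATE TABLE production_person (
    production_id INTEGER NOT NULL REFERENCES production(production_id),
    person_id     INTEGER NOT NULL REFERENCES person(person_id),
    role          TEXT NOT NULL
);
""")

# Placeholder records (invented for illustration, not taken from the Annals).
conn.execute("INSERT INTO work VALUES (1, 'Example Opera')")
conn.execute("INSERT INTO production VALUES (1, 1, 'Example Company')")
conn.execute("INSERT INTO person VALUES (1, 'Example Director')")
conn.execute("INSERT INTO production_person VALUES (1, 1, 'director')")

# As in the dataset, the production starts with a single event: the premiere.
conn.execute(
    "INSERT INTO event VALUES (1, 1, '1900-10-01', 'Example Venue', 1)"
)

# A user adds a subsequent performance by creating a new event
# that points to the same production.
conn.execute(
    "INSERT INTO event (production_id, event_date, venue) "
    "VALUES (1, '1900-10-08', 'Example Venue')"
)

rows = conn.execute("""
    SELECT w.title, COUNT(e.event_id)
    FROM work w
    JOIN production p ON p.work_id = w.work_id
    JOIN event e ON e.production_id = p.production_id
    GROUP BY p.production_id, w.title
""").fetchall()
print(rows)
```

Keeping events in their own table means that performance counts can grow over time without any change to the production or work records.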
As far as possible, the many annotations used in the book have been transformed from string text into structured tables with cities, production companies, person roles or organization roles, etc. to allow for systematic inquiry. For all of these tables, we have provided a column with an English translation of the essential information to make it accessible to an international audience. Some data, however, particularly the indications of the singers' roles (which may include translated names such as 'graaf Almaviva' or unnamed characters such as 'a notary' or 'an officer'), is provided only in the Dutch spelling as given in the Annals, since we lacked the capacity to harmonize and translate all 47,273 role names.

Coverage
The database currently includes information on 5,057 productions of 785 operas involving 623 organisations and 9,112 people (artists, composers, managers, etc.). In spite of the original ambition to include "all the productions, companies, composers, librettists, singers, conductors and directors […] of a century-long life of opera" (Hulpusch, 1996, p. 6), we cannot be certain about the coverage of the collection. The authors of the Annals did not define the scope of their ambitious project, nor did they specify the archival origins of the information used. In order to identify possible gaps and limitations in the data and to reconstruct the collection principles of its collectors, we have matched the data to analogous datasets. The Production Database of the former Dutch Theatre Institute (TIN) offers an interesting point of reference for analysing the scope of the Opera Annals (Theatercollectie, UvA, n.d.). Although both the Opera Annals and the Production Database were created under the auspices of the TIN, the two databases do not seem to have had a shared origin. After the digitization of the Annals was finished, the TIN began to harmonize its Production Database with the current opera dataset.
This Production Database has obvious similarities with the Annals in terms of structure and shows a significant overlap with it. However, it includes activities of all performing arts and is therefore much larger in scope. When we filter out the non-operatic productions from the Production Database, the extensiveness of the Annals becomes apparent. Figure 3 shows the number of events in the Production Database labelled as 'music theatre', 'opera' or 'operetta', compared to the number of events in the Annals. The differences in the amount of data collected per season are striking. The Annals collection is especially rich for the period 1886-1924: between 50 and 100 productions per season are indexed in the Annals in this period. Given that these dates are only the first performances (premieres) of every production, the total number of opera performances can be estimated to be at least ten times higher (see van Nieuwkerk, 2018a). This means that, according to the Annals, there would have been a yearly average of between 500 and 1,000 operatic performances in the Netherlands in this period. Figure 3 also shows that, beginning in 1984, 'opera' events in the TIN collection outnumber those in the Annals. It is worth noting here that, originally, Peter Hulpusch had wanted to cover only a century of opera, starting with the establishment of the first Dutch-language opera company in 1886 and ending with the opening of Het Muziektheater in 1986. Since the 1980s, there has been a proliferation of small-scale opera companies, and the editors of the Annals do not seem to have included all of these. Small, hybrid companies such as the Needcompagny, Taller Amsterdam and Theaterwerkplaats InDependence do not appear in the Annals.
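The arithmetic behind this estimate can be sketched as follows. The code counts premieres per season and scales the count by the factor of ten mentioned above, which the text presents as a lower bound. The season boundary (1 August), the function names and the sample dates are assumptions introduced for illustration; the Annals' own definition of a season may differ.

```python
from collections import Counter
from datetime import date

# Assumption: a season runs from August to July of the following year.
SEASON_START_MONTH = 8


def season_label(premiere):
    """Return a label such as '1890-1891' for the season containing the date."""
    start = premiere.year if premiere.month >= SEASON_START_MONTH else premiere.year - 1
    return f"{start}-{start + 1}"


def productions_per_season(premieres):
    """Count premieres (one per production) per season."""
    return Counter(season_label(d) for d in premieres)


def estimate_performances(n_productions, factor=10):
    """Scale a production count by the assumed performances-per-production factor."""
    return n_productions * factor


# Invented premiere dates, for illustration only.
premieres = [
    date(1890, 9, 15),
    date(1890, 12, 1),
    date(1891, 2, 20),
    date(1891, 10, 3),
]
counts = productions_per_season(premieres)
print(counts)

# The range given in the text: 50 to 100 productions per season.
low, high = estimate_performances(50), estimate_performances(100)
print(low, high)
```

Because the factor is a minimum, the resulting figures of 500 and 1,000 should be read as conservative estimates rather than exact counts.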
We also compared the data of the Annals to samples of analogous datasets that are more specialised than the Production Database, covering the repertoire of individual venues or companies. We matched the data to a sample of the repertoire of the Amsterdam City Theatre in the years 1886-1890 from ACT: Amsterdam City Theatre Database (Blom, n.d.), the repertoire of the Dutch Opera Foundation indexed in Coleman (1986) and the Dutch Opera Archive, a website with productions of the National Opera, Nederlandse Reisopera, Opera Zuid and their predecessors (Lever, n.d.). This process did not reveal any structural gaps, only some individual instances of productions that appear to have been overlooked by the collectors. Those instances have been added to the dataset (see van Nieuwkerk, 2018a, for an overview).
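One simple way to carry out such a comparison is to reduce each record to a normalized key and list the reference records that have no counterpart. The sketch below illustrates this idea only; it is not the matching procedure used for the dataset, and the key (title plus premiere date), the helper names and the sample records are all assumptions.

```python
import unicodedata


def normalize(title):
    """Lower-case a title, strip diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def missing_from_annals(annals, reference):
    """Return reference records (title, date) with no matching record in the Annals."""
    annals_keys = {(normalize(title), day) for title, day in annals}
    return [
        (title, day)
        for title, day in reference
        if (normalize(title), day) not in annals_keys
    ]


# Invented placeholder records, for illustration only.
annals = [
    ("Example Opera A", "1887-01-10"),
    ("Exámple Opera B", "1888-03-05"),
]
reference = [
    ("example opera a", "1887-01-10"),
    ("Example  Opera B", "1888-03-05"),
    ("Example Opera C", "1889-11-20"),
]

gaps = missing_from_annals(annals, reference)
print(gaps)
```

Exact matching on a normalized key misses translated or alternative titles, so in practice the candidates it returns would still need manual checking.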

Defining Opera
Figure 3. The number of events in the Annals, compared to the events in the Production Database with the general tag 'muziektheater' (music theatre) and the secondary tags 'opera' and 'operette' (operetta) from http://theatercollectie.uva.nl/search/advanced, retrieved January 23, 2019.
Any data collection can only be complete within the limits set to it by its collectors. When it comes to opera, setting these limits is no easy matter. Attempts at defining the term 'opera' invariably lead into a terminological minefield in which, as Tim Carter (2014) has observed, one is forced to choose between narrow definitions that exclude too many and broad ones that exclude too few. Many examples of "a drama in which the actors sing some or all of their parts" (Brown et al., 2001), such as 'musical' and 'operetta' pieces, are likely to provoke a heated discussion among operatic gatekeepers (Abbate & Parker, 2012). This might explain why the criteria for the inclusion of operas in annals and lexicons are often vague and subjective. Alfred Loewenberg, for example, writes in his introduction to his well-known Annals of Opera (1955, p. 4) that "those works have been selected that have obtained success or attracted attention outside their countries of origin. […] [T]he number of entries could easily have doubled; but the book had to be kept within reasonable limits." The Annals, which are explicitly titled Annals of the Opera Companies in the Netherlands, avoided this choice of works by turning to a selection of companies. The cross-referencing with external databases confirms that the limits set by the collectors of the Annals were in the first place institutional and geographical: the collection includes productions of opera companies performing within the Dutch geographical border. We found, therefore, that only productions by institutions identified as actual opera companies have been included: operatic performances by companies principally devoted to other genres such as spoken theatre, operetta, puppet theatre, or children's theatre were not.
On the work level, the term opera is generally used in a broad sense, covering works with either sung recitatives or spoken dialogue. Operettas, too, are included if performed by opera companies, as are other forms of music theatre with strong connections to the western operatic tradition, such as Ligeti's Aventures et nouvelles aventures, Weill's Dreigroschenoper and Stravinsky's Histoire du soldat. Other border cases such as puppet opera, opera-ballet, radio opera, rock opera and traditional Chinese opera, on the other hand, are not. Also excluded are plays with incidental music, such as Beethoven's Egmont, Grieg's Peer Gynt and plays advertised in the Amsterdam City Theatre as 'klucht met zang' (comedy, or farce, with song), as well as oratorios, cantatas and concert performances of opera in concert halls (see van Nieuwkerk, 2018a).

Research Potential
The Netherlands is a country that, historically, has had no strong reputation for being receptive to opera and has therefore attracted little scholarly interest. We can now begin to chart the ebb and flow in the popularity of particular works, composers, subjects, styles and traditions on various Dutch stages and involve them in central questions of quantitative repertoire research concerning the formation of operatic canons and the tensions between 'classical' and contemporary repertoire that have been extensively studied for concert music, but have only more recently become a topic in opera research (Weber, 1992, 2003, 2009a and 2009b; Hall-Witt, 2007; Weber & Newark, forthcoming). Crucially, the database has been designed to facilitate comparison and integration with other datasets, which we hope will eventually allow for systematic research in a variety of contexts and on a pan-European or worldwide scale. It allows for comparative studies of operatic cultures, studies into the diffusion and circulation of operas, and, given the detailed information on personnel supplied in the Annals, studies into the professional networks and cultural industry that enabled operatic performances.
The fact that the dataset includes information about performance language and alternative titles also makes it a helpful source for the study of translation and adaptation in musical practice; and the occasional notes about the language in which individual singers performed their parts offer a glimpse into the specific national traditions performers were trained in and the practice of so-called suitcase arias (Poriss, 2009).
Finally, opera, as the collaborative art form par excellence, offers various opportunities for interdisciplinary research. As an example, one of us has conducted a pilot study relating the data from the Annals to the programming of operatic excerpts on Amsterdam's concert stages, Felix Meritis and the Concertgebouw (van Nieuwkerk, 2018b). The possibilities for exploring the interrelations with data from other cultural disciplines such as spoken theatre, literature or film are manifold.
We hope, therefore, that the Operatic productions in the Netherlands database will prove to be a welcome contribution to the study of this art form that cuts across both artistic domains and national borders.