The Homeric Dependency Lexicon: What it is and how to use it

This paper presents the Homeric Dependency Lexicon (HoDeL), a new resource with a user-friendly interface facilitating the study of Homeric verbs and dependents. HoDeL was induced from the analytical layer of AGDT 2.0 by extracting all dependents tagged as SBJ, OBJ, PNOM, and OCOMP with a set of SQL queries. The paper illustrates HoDeL's functionalities and shows how they can be employed by researchers to answer specific research questions about the Homeric language. Introducing the uses of HoDeL offers the opportunity to reexamine some crucial, though frequently underestimated, methodological challenges concerning annotated corpora and resources derived from them that relate to the linguistic theories underlying annotations and error propagation. It is argued that the careful documentation of how linguistic resources were created, what data they contain, and how they can be queried through their dedicated interfaces is essential to lay the groundwork for users’ investigations.


Introduction
I present here HoDeL (The Homeric Dependency Lexicon), a new linguistic resource facilitating the investigation into Homeric verbs and their dependents. Introducing the uses of HoDeL also offers an opportunity to reexamine some crucial, though frequently underestimated, methodological challenges that arise when one uses electronic linguistic resources to study ancient and modern languages.
Thus, my aim here is two-fold: (a) to fully document the construction of HoDeL and explain its basic functionalities with concrete examples; (b) to thoroughly account for the ways in which methodological challenges raised by electronic linguistic resources may affect HoDeL users. On the one hand, a detailed explanation of the functionalities of these resources is crucial for users: as Anthony (2013: 142) has emphasized, data may yield different results owing both to the observers' subjectivity and to the idiosyncrasies of the tool or interface through which data is observed. On the other hand, it is important to acknowledge that (i) annotated corpora and linguistic resources induced from them inevitably contain errors and that (ii) annotations based on particular underlying linguistic theories should be accompanied by careful notes alerting users to the theoretical foundations that underlie the annotations. This is critical, as linguistic resources induced from annotated corpora distance the researcher from the researched data on at least three different levels: (a) the interface level, (b) the data-extraction level, and (c) the annotation level.
I structure my presentation as follows. Section 2 contains the background framing HoDeL: it discusses the notions of corpus, treebanks, and valency lexica, and reviews the linguistic resources available for the study of Ancient Greek (henceforth, AG). Section 3 presents HoDeL: after providing details on the architecture of dependency treebanks, it accounts for HoDeL's construction and describes its basic functionalities. Section 4 illustrates the types of theoretical and practical issues that HoDeL users may encounter with specific exemplars and it provides some suggestions for how the lexicon can be used productively. Section 5 contains concluding remarks.

Background: linguistic resources and valency lexica
Ancient languages such as AG enjoy a long tradition of literary, philological, and linguistic research, based on the manual collection of data from written texts, in many cases preserved up to the present by accidents of history (Joseph & Janda 2003: 15-19). Because the data on these languages derive from historical records, the languages themselves are known as Korpussprachen 'corpus languages' (see, among others, Mayrhofer 1980; Untermann 1983; Haug 2015: 187; Eckhoff et al. 2018b: 300); in this context, the term corpus is broadly understood as "a body of naturally occurring language" (McEnery et al. 2006; for a narrower definition of corpus, see, e.g., Sinclair 2005). In this long tradition of corpus studies, the digital turning point came with Father Busa's Index Thomisticus, a pioneering electronic collection of all words contained in Thomas Aquinas' opera omnia (Busa 1980; Nyhan & Passarotti 2019). From the second half of the 1960s, Father Busa undertook the task of systematically collecting Thomas Aquinas' words, initially on punched cards and later on magnetic tapes. The printed version of the Index appeared in 1980 and consisted of 56 volumes, and a CD-ROM version was released in 1989. Today, a morphosyntactically annotated version of Thomas Aquinas' corpus is maintained at Università Cattolica del Sacro Cuore of Milan and is being integrated with other linguistic resources for Latin within the LiLa Project. Since Father Busa's time, the efforts to digitize various linguistic and non-linguistic materials have increased, and they contribute to expanding the field of what has come to be known as Digital Humanities (see Schreibmann et al. 2004, among many others). Large-sized electronic corpora of AG texts are now available online.
The list of such corpora is long and includes, among others, AGDT 2.0 (Ancient Greek Dependency Treebank), The Chicago Homer (a database for Early Greek epic), DĀMOS (a database of Mycenaean; Aurora 2015), DFHG (Digital Fragmenta Historicorum Graecorum), The Diorisis Ancient Greek Corpus (Vatri & McGillivray 2018), EAGLE (Electronic Archive of Greek and Latin Epigraphy), the Perseus Digital Library (Bamman & Crane 2008) [...], TITUS (Thesaurus Indogermanischer Text- und Sprachmaterialien), and TLG (Thesaurus Linguae Graecae, fully available online for subscribers only).
In some of these corpora, the digitized versions of AG texts are also enriched with mark-up and/or annotation. The term "mark-up" pertains to the textual document as a whole and can be stylistic, philological and/or archaeological in nature; for example, dates and literary genres are annotated in The Diorisis Ancient Greek Corpus. Information concerning the place of discovery, editions, textual issues, and hand attribution is associated with the digitized version of the Mycenaean tablets contained in DĀMOS (on the necessity of including stylistic, philological and archaeological information in annotated corpora of ancient languages, see Eckhoff et al. 2018b: 302).
For the aims of this paper, the most relevant type of metadata is the multi-layered annotation that provides linguistic information on words and/or text chunks at different linguistic levels: (a) morphological annotation, including POS-tagging, lemmatization, and inflectional morphology (see e.g. The Chicago Homer; The Diorisis Ancient Greek Corpus; to some extent, derivational morphology is also annotated in the corpora of the PROIEL family); (b) syntactic annotation, according to which syntactically parsed sentences are represented and stored as syntactic trees in "treebanks" (see e.g. AGDT 2.0; PROIEL); (c) semantic and pragmatic annotation, i.e. metadata concerning semantic roles, other semantic information (e.g. animacy of event participants, semantic class of verbs), and information structure (cf. AGDT 2.0, documented in Celano & Crane 2015; PROIEL); and (d) genre-specific annotation (e.g. The Chicago Homer, in which formulas of the Early Greek epic texts are annotated).
Corpora that contain multi-layered linguistic annotation can be exploited to create other treebank-based linguistic resources, such as valency lexica. The term "valency" was first borrowed from chemistry into linguistics by Lucien Tesnière (see Ágel & Fischer 2015 for precursors), within the framework of Dependency Grammar (Tesnière 1959). Here, valency refers to the extent to which verbs combine with and determine the form of a fixed and predictable number of participants. These participants are called actants by Tesnière. In Tesnière (1959), actants are contrasted with circumstants, free modifiers that also depend on verbs but do so by virtue of their modifying potential. The English terms complement and adjunct are meant to replicate Tesnière's actant-circumstant distinction (Matthews 1981). Frequently, actants/complements are also called arguments, especially in the US linguistic tradition of the last quarter of the 20th century.
The latter term, originally borrowed from logic (Frege 1891), holds an ambivalent status between semantics and syntax: it can either indicate all inherent roles that occupy a place in the semantic relationality of a concept or be used to name constituents headed by other verbal constituents. This interaction, and often overlap, between semantic and syntactic valency is fundamental to grasping the theory of valency underlying the data of HoDeL (on which, see Section 4).
Valency theories occupy a core position within the study of human language, as human beings perceive events as consisting of entities, chiefly expressed by nouns, and relationships between entities, chiefly expressed by predicates. Thus, to speak about events, human beings create predicate-argument structures (e.g., Ágel & Fischer 2015). Accordingly, since Tesnière, several linguistic theories have put valency and argument structure at the very core of their research agenda but have disagreed about the very nature of these phenomena: valency has been variably regarded as a concept fundamentally syntactic, semantic, or both. However, until the relatively recent development of Construction Grammar (see Fried & Boas 2005 for an overview), there has been at least one point of substantial agreement: argument structure is a property of verbs, and as such, its information can be stored in (verbal) lexica.
Journal of Greek Linguistics 21 (2021) 263-297
Given its importance for linguistic expression, it is unsurprising that valency-related information is contained in traditional dictionaries of AG. For example, the Liddell-Scott-Jones dictionary (LSJ), under the entry manthánō 'learn', provides the different meanings of this verb, along with the form and semantic features of the participants associated with these meanings (i.e., manthánō+acc 'learn something', manthánō+dat 'understand somebody', manthánō+inf 'learn something', manthánō+apó/ek/pros/pará+gen 'learn from something/somebody'). Valency-dedicated lexicography began with the pioneering work of Helbig & Schenkel's (1991 [1969]) Wörterbuch zur Valenz und Distribution deutscher Verben (on German, see also Schumacher et al. 2004; on English, see Herbst et al. 2004; for a contrastive valency dictionary of Dutch-French-English, see Colleman et al. 2004). Non-automatic valency lexica are also available for ancient Indo-European languages; for instance, Happ (1976) contains a non-exhaustive list of Latin verbs and their valency patterns as evidenced in a corpus of 800 sentences sampled from Cicero's Orationes. In her dissertation, Frigione (2015) collected and published the materials for a verbal lexicon of Old Church Slavic.
Manually collected and thus intuition-based valency lexica are necessarily partial in coverage, do not contain comprehensive frequency information on the collected valency patterns, and are extremely time-consuming to build. For these reasons, scholars have increasingly exploited existing annotated corpora to automatically extract verbal valency patterns and their frequencies from texts. In principle, automatic valency lexica are able to capture not only verbal valency as a potential of a verbal lexeme, but also valency as the textual-situational realization of such potential. The first attempts in this direction were methodologically hybrid: for example, PropBank (Kingsbury & Palmer 2002), FrameNet (Ruppenhofer et al. 2010), and PDT-Vallex (Hajič et al. 2003) were initially built with an intuition-based method and later refined with corpus-driven data.
Primarily automatic approaches to valency lexica were first carried out for modern languages (e.g., VALEX, Korhonen et al. 2006 on English; LexShem, Messiant et al. 2008 on French). However, automatically derived valency lexica exist nowadays for ancient Indo-European languages as well. For example, Bamman & Crane (2008) [...] Passarotti (2009) and McGillivray (2013: 31-60) to induce a valency lexicon from AGDT 1.0 (to my knowledge, these lexica have never been made available online). Finally, a semantic valency lexicon of Latin, Latin Vallex, was derived from a semantically-annotated subset of LTD and IT-TB (Passarotti et al. 2016). Currently, the PROIEL-style treebanks, which can be consulted through the Syntacticus interface, also come with generated dictionaries that include a comprehensive list of valency frames extracted from treebanks. In the next section, I present HoDeL, a lexicon of Homeric Greek verbs that has also been built upon the IT-VaLex model, and some of its functionalities.

Presenting HoDeL
In this section, I first document how HoDeL was created and then illustrate its basic usage by showing a number of simple incorporated filters for queries with HoDeL's online interface.

3.1 The building of HoDeL
HoDeL is closely connected with the Homeric texts treebanked at AGDT 2.0. The structure of AGDT 2.0 is modelled on that of the Prague Dependency Treebank of Czech, the groundbreaking project in the field of Dependency Grammar (PDT 3.0; Hajič et al. 1999).
Like other dependency treebanks, PDT 3.0 and AGDT 2.0 are predicate-centred, contain the same number of nodes as the number of words and allow for non-binary-branching trees (on the differences underlying constituent vs. dependency treebanks and among different annotation styles of dependency treebanks, see the summary in Biagetti 2018: 9-37). PDT 3.0, and in principle AGDT 2.0, are multi-layered dependency treebanks: metadata is structured and stored in separated but interlinked morphological, analytical, and tectogrammatical layers. The analytical (i.e., syntactic) layer contains dependency syntactic trees and works as a basis for the tectogrammatical (i.e., semantic) layer, which stores complex semantic role-labelling, information structure, and anaphora/ellipsis resolution, annotated in the framework of the Praguian linguistic tradition of the Functional Generative Description (Sgall et al. 1986). At its first release, the Ancient Greek and Latin Dependency Treebank was the first treebank for AG and Latin (Bamman & Crane 2011) [...] (Celano 2019: 283-284, 288). Thus, no tectogrammatical annotation is available for the Homeric poems.
HoDeL has been automatically induced from the analytical layer of the Iliad and Odyssey treebanks at AGDT 2.0. The guidelines of the analytical layer of AGDT rely on those of PDT with some addenda aimed at increasing descriptiveness and precision by incorporating Smyth's Greek Grammar for Colleges (Smyth 1920; Celano 2019: 285-286). Figure 1 shows the dependency tree of Iliad 1.1-7, reported in (1).
(1) [...]
    'The wrath sing, goddess, of Peleus' son, Achilles, that destructive wrath which brought countless woes upon the Achaeans, and sent forth to Hades many valiant souls of heroes, and made them themselves spoil for dogs and every bird; thus, the plan of Zeus came to fulfillment, from the time when first they parted in strife Atreus' son, king of men, and brilliant Achilles.' (Il. 1.1-7)
From this layer, HoDeL incorporates queries extracting all verbal forms and their dependents labeled as:
- SBJ (Subject), as in hḕ murí' Akhaioîs álge' éthēke '… which brought countless woes upon the Achaeans' from (1);
- OCOMP (Object Complement), as in autoùs dè helṓria teûkhe kúnessin oiōnoîsí te pâsi 'and made them themselves spoil for dogs and every bird' from (1);
- PNOM (Predicate Nominal), as in hòs nûn pollòn áristos Akhaiôn eúkhetai eînai '… who now claims to be far (the best) of the Achaeans' (Il. 1.91);
- OBJ (Object), as in mênin áeide theà '(The) wrath sing, goddess', and in hḕ murí' Akhaioîs álge' éthēke 'which brought countless woes upon the Achaeans' from (1).
The OBJ label comprises all verbal arguments except SBJ and arguments labeled as OCOMP and PNOM and hence includes accusative, dative, genitive nouns or pronouns, prepositional phrases, infinitive verbs, accusative+infinitive constructions, and other types of subordinate clauses that can function as verbal objects (for details, see Celano 2019: 286-287 and https://github.com/PerseusDL/treebank_data/blob/master/AGDT2/guidelines/Greek_guidelines.md#obj).
All these dependents may either be direct child nodes of a verbal form or be attached to the verbal head via one of the bridge nodes, specifically, AUXP (preposition), AUXC (conjunction), COORD (coordinator, including coordinative conjunctions and particles), and APOS (apposing elements, such as commas). For example, the phrase Atreḯdēs te ánax andrôn kaì dîos Akhilleús 'Atreus' son, king of men, and brilliant Achilles' in Figure 1 is tagged with the bridge COORD (kaì) and contains two subjects, coordinated and hence tagged as SBJ_CO (Atreḯdēs and Akhilleús). All extracted dependents are considered part of verbal valency according to the guidelines of AGDT 2.0.
In contrast, we did not extract dependents tagged as ADV (adverbials, which provide the event with background information), ATR (NP modifiers), and ATV/ATVV (non-governed complements, i.e., predicative noun phrases / adjectives which may morphologically agree with their head noun, but qualify the whole event denoted by the verb), which the AGDT 2.0 guidelines do not consider to belong to verbal valency.
Argument dependents have been extracted using a series of SQL queries and then recorded in a spreadsheet, from which a relational database has been built. The relational database in turn interacts with the user interface. The original query algorithm and its implementation were conceived to build IT-VaLex (McGillivray & Passarotti 2009). To induce HoDeL, the queries have been adapted to the AGDT 2.0 tagset. An earlier version of HoDeL, released in 2016 (Zanchi et al. 2018), was based on a previous version of the treebank (AGDT 1.0) and lacked transliteration and English translation.
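The extraction step described above can be pictured with a small, self-contained sketch. It does not reproduce the queries actually used for HoDeL or IT-VaLex, which are not given in this paper: the table name `tokens`, its columns, the ASCII lemmas, and the simplified toy tree are all assumptions made for illustration. A recursive query starts from the direct child nodes of each verb and descends only through the bridge nodes (AUXP, AUXC, COORD, APOS), keeping the dependents labelled SBJ, OBJ, PNOM, or OCOMP, with or without the _CO suffix.

```python
import sqlite3

# Hypothetical layout: one row per treebank node
# (node id, lemma, part of speech, id of the head node, relation label).
TOKENS = [
    (1, "diistemi", "verb", 0, "PRED"),
    (2, "Atreides", "noun", 3, "SBJ_CO"),
    (3, "kai", "conj", 1, "COORD"),       # bridge node between verb and subjects
    (4, "Akhilleus", "noun", 3, "SBJ_CO"),
    (5, "dios", "adj", 4, "ATR"),         # NP modifier, not extracted
    (6, "aeido", "verb", 0, "PRED"),
    (7, "menis", "noun", 6, "OBJ"),       # direct child of the verb
]

QUERY = """
WITH RECURSIVE reach(verb_id, node_id) AS (
    -- direct child nodes of every verbal form
    SELECT v.id, c.id
    FROM tokens v JOIN tokens c ON c.head = v.id
    WHERE v.pos = 'verb'
  UNION ALL
    -- continue downwards only through bridge nodes
    SELECT r.verb_id, c.id
    FROM reach r
    JOIN tokens b ON b.id = r.node_id
    JOIN tokens c ON c.head = b.id
    WHERE b.relation IN ('AUXP', 'AUXC', 'COORD', 'APOS')
)
SELECT v.lemma, d.lemma, d.relation
FROM reach r
JOIN tokens v ON v.id = r.verb_id
JOIN tokens d ON d.id = r.node_id
WHERE replace(d.relation, '_CO', '') IN ('SBJ', 'OBJ', 'PNOM', 'OCOMP')
ORDER BY d.id
"""


def extract_arguments(tokens):
    """Return (verb lemma, dependent lemma, relation) rows for the toy table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tokens (id INTEGER PRIMARY KEY, lemma TEXT, pos TEXT, "
        "head INTEGER, relation TEXT)"
    )
    con.executemany("INSERT INTO tokens VALUES (?, ?, ?, ?, ?)", tokens)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in extract_arguments(TOKENS):
        print(row)
```

In this toy tree the two coordinated subjects are reached through the COORD node, whereas the ATR modifier is never returned, in line with the distinction between extracted and non-extracted dependents drawn above.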
Transliteration of the Greek script (conforming to the most common current academic standards) and links to English translation have been added to the current version. In order to type Greek characters, HoDeL users should employ Beta Code, as in the Perseus Project and in TLG. The correspondences between Greek fonts and Beta Code are reported in Table 1. The least intuitive Greek-Beta Code correspondences are highlighted in grey.
Table 1. Greek characters-Beta Code correspondences
Example (2) shows how the first line of the Odyssey looks in Greek characters (in 2a), Beta-Code (in 2b), and in transliteration (in 2c).
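For readers unfamiliar with Beta Code, the following minimal sketch shows how a lowercase Beta Code string can be turned into Greek script. It is an illustration only, not the conversion routine used by HoDeL, and the function name `beta_to_unicode` is hypothetical. It relies on the standard letter correspondences (e.g., q for theta, c for xi, w for omega) and on the usual diacritic signs typed after the vowel; capital letters (marked with * in Beta Code) are not handled, and diacritics are combined in the order in which they are typed.

```python
import re
import unicodedata

# Beta Code letters in Greek alphabetical order (alpha ... omega).
BETA_LETTERS = "abgdezhqiklmncoprstufxyw"
# Greek lowercase letters U+03B1..U+03C9, skipping final sigma (U+03C2).
GREEK_LETTERS = [chr(c) for c in range(0x03B1, 0x03CA) if c != 0x03C2]
LETTERS = dict(zip(BETA_LETTERS, GREEK_LETTERS))

# Diacritic signs mapped to Unicode combining characters.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute accent
    "\\": "\u0300",  # grave accent
    "=": "\u0342",   # circumflex
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}


def beta_to_unicode(beta):
    """Convert a lowercase Beta Code string to precomposed Greek characters."""
    pieces = []
    for ch in beta.lower():
        pieces.append(LETTERS.get(ch) or DIACRITICS.get(ch) or ch)
    text = unicodedata.normalize("NFC", "".join(pieces))
    # A sigma that is not followed by another letter is written as final sigma.
    return re.sub("\u03c3(?!\\w)", "\u03c2", text)


if __name__ == "__main__":
    print(beta_to_unicode("a)kou/w"))  # ἀκούω
    print(beta_to_unicode("qeo/s"))    # θεός
```

The sketch treats every parenthesis or slash as a diacritic, so it is only suitable for isolated lemmas of the kind typed into a query box, not for running text with punctuation.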

HoDeL basic incorporated queries and filters
The HoDeL home page shows a list of Homeric verbs ordered alphabetically. After each lemma, its frequency is provided (Figure 2). Note that users can choose to visualize either the Greek script or its transliteration by flagging 'greek' or 'trans' in the 'Display' box at the top of the HoDeL home page.
By default, HoDeL gives frequency information concerning verbal lemmas and their dependents tagged as SBJ, OBJ, PNOM, and OCOMP, and specifically:
- 2,482 = type frequency of verbal heads
- 40,693 = token frequency of verbal heads
- 4,219 = type frequency of dependent lemmas
- 49,137 = token frequency of dependent lemmas.
When users add filters to their queries, HoDeL always provides these and other frequency counts.
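The difference between type and token frequency in these counts can be made concrete with a short sketch. The data and the function name `frequency_summary` are invented for illustration and do not come from HoDeL; each item pairs one occurrence of a verbal head with the lemmas of its extracted dependents.

```python
def frequency_summary(occurrences):
    """Compute type and token frequencies of verbal heads and dependents.

    occurrences: list of (verb lemma, list of dependent lemmas),
    one entry per verb occurrence in the corpus.
    """
    dependents = [dep for _, deps in occurrences for dep in deps]
    return {
        "verb_types": len({verb for verb, _ in occurrences}),  # distinct lemmas
        "verb_tokens": len(occurrences),                       # all occurrences
        "dependent_types": len(set(dependents)),
        "dependent_tokens": len(dependents),
    }


if __name__ == "__main__":
    # Hypothetical toy data with ASCII lemmas.
    toy = [
        ("akouo", ["phone", "theos"]),
        ("horao", ["theos"]),
        ("akouo", []),
        ("akouo", ["phone"]),
    ]
    print(frequency_summary(toy))
```

Because a single verb occurrence can have several dependents or none, the token frequency of dependents need not match that of verbal heads, as in the figures reported above.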
Users can also visualize all lemmas that depend on Homeric verbal heads, as shown in Figure 3. A number of verbs appear in this list: these verbs function as main verbs in dependent SBJ or OBJ clauses.
Both lists in Figures 2 and 3 contain clickable lemmas. For example, by clicking on a verbal lemma in the list in Figure 2, e.g., akoúō 'hear', users obtain (i) all its forms in the Homeric poems, (ii) the ordered contexts of these occurrences (automatically chunked by an algorithm that exploits punctuation marks), and (iii) syntactic subtrees representing the queried verb and its dependents tagged as SBJ, OBJ, PNOM, and OCOMP (Figure 4). The lexicon also allows users to type in verbal and dependent lemmas directly. Clicking on the grey box 'Query' opens a window in which the requested lemma can be typed using Beta Code (Figure 5; cf. Table 1). In Figure 5, I typed the verb akoúō as a)kou/w in the 'Verbal Head Lemma' box. Its relative subtrees can be seen by clicking 'Submit', the button that launches queries.
arguments in principle could refer both to impersonal verbs and to intransitive verbs with null subjects. For the present, we have decided not to integrate empty nodes for null subjects: the absence of empty nodes is a structural feature of the analytical layer of dependency treebanks modelled on PDT. Moreover, AG also allows for null referential objects (Luraghi 2003;Haug 2012). Thus, if one decides to integrate null subjects by manually disambiguating impersonal verbs, they must also integrate null objects for the sake of consistency. Null referential objects are, however, far more difficult to detect than null subjects (Section 4). Without integrating null objects, an automatically induced valency frame with a single argument could in principle refer to intransitive usages of transitive verbs or to transitive verbs taking null referential objects.
In the output passages and subtrees (see Figure 4 and Figure 6), the verbal forms are circled in orange, whereas the dependents are highlighted in blue. By pointing at a word in the output contexts, users obtain morphological annotation as stored in the morphological layer of AGDT 2.0 (Figure 6). For example, Figure 6 shows that the form euxaménou is the genitive masculine singular of the aorist middle participle of the verb eúkhomai 'pray'. Furthermore, if users click on the blue folder after the Greek text, HoDeL provides the corresponding English translations. The latter have been automatically aligned with the Greek text using an algorithm that exploits punctuation marks and text chunks contained in the texts provided at the Perseus Digital Library. The automatic alignment has been manually checked and, when necessary, modified according to the translation available at The Chicago Homer (Figure 6).
The purple box 'Args Number' shows (i) frequency information concerning the number of arguments taken by verbal heads, and (ii) frequency information concerning the syntactic relations (SBJ, OBJ, PNOM, OCOMP) of arguments taken by verbal heads. For example, as shown in Figure 7, akoúō 'hear' can take from zero (37 occurrences) to three arguments (2 occurrences); when this verb is the head of two dependents, the latter can have different syntactic functions, called 'Subcat.(egories)' in the resource. By flagging one of these categories ('No. Args') and subcategories (i.e., argument number and functions), users obtain filtered passages and subtrees. The categories and subcategories suggested by the system are not pre-established but rather corpus-induced for each selected verb.
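How categories and subcategories of this kind can be induced from the data, rather than fixed in advance, is sketched below. The function name `args_number_table` and the toy occurrences are hypothetical and the counts are invented; the sketch only assumes that each verb occurrence comes with the list of relation labels of its extracted dependents.

```python
from collections import Counter, defaultdict


def args_number_table(occurrences):
    """Group verb occurrences by argument number and by relation combination.

    occurrences: list of lists of relation labels, one list per verb token.
    Returns (counts per number of arguments, counts per combination of labels).
    """
    by_number = Counter()
    by_subcat = defaultdict(Counter)
    for relations in occurrences:
        n = len(relations)
        by_number[n] += 1
        # Sorting makes the combination independent of word order.
        by_subcat[n][tuple(sorted(relations))] += 1
    return by_number, by_subcat


if __name__ == "__main__":
    # Hypothetical occurrences of a single verb.
    toy = [[], ["SBJ"], ["OBJ"], ["SBJ", "OBJ"], ["OBJ", "SBJ"], ["OBJ", "OBJ"]]
    by_number, by_subcat = args_number_table(toy)
    print(dict(by_number))
    print({n: dict(c) for n, c in by_subcat.items()})
```

Only combinations that actually occur in the input appear in the result, which mirrors the corpus-induced nature of the categories offered for each verb.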
Figure 6. Visualizing morphological metadata and English translations
The incorporated filter 'Args Order' allows users to investigate constituent order in Homeric Greek. As shown in Figure 8, at a lower level of granularity, attested relative orders of verbs (akoúō 'hear' in this case) and OBJ dependents, together with their frequencies (attested verb-OBJ orders are labelled as 'Cat.(egories)' in the interface), are provided for users. At a higher level of granularity, for each attested verb-OBJ order, the relative positioning of other argument dependents, such as SBJs, can be taken from the lexicon (in this case, attested orders are labelled as 'Subcat.(egories)'). As seen in the 'Args Number' example, these orders are given by the system based on patterns attested in the corpus. Both categories and subcategories of orders can be flagged to obtain filtered contexts and subtrees.
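The two levels of granularity can be illustrated with a brief sketch. The function name `order_pattern`, the label format, and the word positions are assumptions made for the example and need not match the labels shown in the HoDeL interface: the coarse pattern considers only the verb and its OBJ dependents, whereas the fine-grained pattern includes all extracted dependents.

```python
def order_pattern(verb_position, dependents, relations=None):
    """Return the linear order of a verb and its dependents as a string.

    dependents: list of (word position, relation label).
    relations: optional set of labels to keep; None keeps every dependent.
    """
    items = [(verb_position, "V")]
    items += [
        (position, relation)
        for position, relation in dependents
        if relations is None or relation in relations
    ]
    # Sorting by word position yields the surface order.
    return " ".join(label for _, label in sorted(items))


if __name__ == "__main__":
    # Hypothetical clause: subject at position 1, object at 2, verb at 3.
    deps = [(1, "SBJ"), (2, "OBJ")]
    print(order_pattern(3, deps, {"OBJ"}))  # OBJ V
    print(order_pattern(3, deps))           # SBJ OBJ V
```

Counting such strings over all occurrences of a verb would give the attested orders and their frequencies at either level.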
In addition, users can search the Homeric verbs by argument relation and case/mood using the blue box 'Arguments'. In Figure 9, the attested functions and forms of arguments taken by the verb akoúō 'hear' are shown, and each attested 'Cat. [...] inflected in a specific morphological voice. Users can further search for specific argument lemmas and filter their outputs based on the syntactic and morphological features of these lemmas, including relation, case/mood, preposition, conjunction, and position with respect to the verb.
Finally, HoDeL allows users to combine all these parameters to search for more than one argument at a time by means of the 'Add another argument' option. Multiple arguments can be either filtered per sequence or per cardinality by flagging the corresponding box.

HoDeL: issues and perspectives
In this section, I discuss a number of theoretical and practical issues of HoDeL, occasionally framing them within a more general discussion about the pros and cons of annotated corpora. Then, I detail how HoDeL functionalities can be used to answer specific research questions.

4.1 HoDeL: theoretical issues
As illustrated in Section 3.1, HoDeL has been automatically induced from a dependency treebank. Yet, HoDeL (and similar valency lexica, such as those described in McGillivray & Vatri 2015) is neither corpus-driven nor theory-free. On the one hand, the series of SQL queries underlying HoDeL has the advantage of making possible the automatic virtual retrieval of all instantiations of all valency frames of all Homeric verbs. On the other hand, queries based on annotation schemes, including those of HoDeL, can only retrieve what has been annotated according to such annotation schemes (cf., e.g., Tognini-Bonelli & Sinclair 2006: 214). Annotation schemes are never theory-free: they necessarily rely on a source theory, as annotating means "adding interpretative information into a collection of texts" (Hovy & Lavid 2010: 1). Therefore, issues may arise when (a) annotation guidelines are either too fine-grained or too coarse-grained in applying and exemplifying the source theory; or (b) there is a mismatch between the theory of annotators and that of linguists using the annotated corpus for theoretical research and/or applied purposes (McGillivray & Vatri 2015: 103).
Figure 9. Filtering arguments by relation and morphological information
HoDeL is no exception in this respect. In principle, annotation schemes should be designed according to general linguistic theories that meet general consensus within the linguistic community and can be easily ported to several different languages (Haug 2015). However, the theory of valency underlying AGDT 2.0, and PDT before it, is the theory of valency of the Functional Generative Description (cf., e.g., Panevová 1994), which is linked to the Praguian linguistic tradition and Czech strategies of argument encoding.
[...] Inner participants (e.g., actor, patient, addressee, origin) are verb-specific and thus can occur only once per verb, whereas free modifications (e.g., time, place, goal, instrument, etc.) are not verb-specific and can be added freely to the sentence. The distinction between inner participants and free modifications seems to be semantically based, while that between obligatory and optional complementations is grounded in the syntactic notion of obligatoriness. These two classifications do not overlap: there can exist optional inner participants, such as origin with motion verbs, and obligatory free modifications, such as instrument with verbs of cutting. In PDT, this valency theory is not accounted for at the analytical layer, but at the tectogrammatical one, which also contains anaphora and ellipsis resolution (Section 3.1). Overall, then, the theory of valency of the Functional Generative Description seems to be based on semantic criteria rather than on syntactic ones. The only notion mentioned in the PDT guidelines that seems to distinguish syntactic arguments (i.e., OBJ) from adjuncts (i.e., ADV) is obligatoriness.
As shown in Section 3, there is no tectogrammatical layer available for the Homeric poems treebanked at AGDT 2.0. Naturally, it follows that the valency theory is discussed in the guidelines of the analytical layer of AGDT. Those guidelines explicitly state that AGDT 2.0 inherits its valency theory from PDT, but the critical notion of obligatoriness is not dealt with at a sufficient level of granularity. Moreover, the underlying treebank architecture, the lack of the tectogrammatical layer of AGDT 2.0, and the semantic nature of the theory of valency of PDT produce a number of mismatches in the argument vs. adjunct annotation with respect to the most widely accepted theories of syntactic valency.
For HoDeL users, this means keeping in mind that the valency theory of the Functional Generative Description can result in a number of unexpected consequences, the most relevant of which are discussed here. To begin with, agents of passive verbs, such as ek Diós 'by Zeus' in (3), are annotated as OBJ, that is, as part of the verbal valency:
(3) ēdè  phílēthen          ek      Diós
    and  love.aor.3pl.pass  out_of  Z.gen
    '… and they were loved by Zeus.'
This is because, at a semantic level, agent participants are inner participants of transitive verbs. However, passive voice is usually acknowledged to be a syntactic valency-decreasing strategy, which removes agents from argument structure and makes them optional (e.g., Siewierska 2005).
Beneficiary and instrumental datives are inconsistently tagged at times as OBJ, i.e., as syntactic arguments (4), and at times as ADV, i.e., as adjuncts (5).
In (4) and (5), the same dative plural, ophthalmoîsi 'with (my) eyes', is inconsistently tagged as to its dependence on forms of the same verb, eîdon, the aorist suppletive form of horáō 'see'. Again, this inconsistency results from the fact that 'eyes' are optional inner participants in the event of seeing: from a semantic standpoint, 'eyes' are an argument dependent, whereas from a syntactic point of view, they are not. This mismatch generates errors in the annotation.9
Other instructive examples of inconsistent tagging are genitive dependents taken by akoúō 'hear': these genitive participants can either play the role of stimuli and function as syntactic arguments or play the role of sources of information and function as adjuncts (on the argument structure of akoúō 'hear', see Luraghi 2020: 85, 127-135). Irrespective of the actual syntactic status of such genitives, they are all tagged as OBJ in AGDT 2.0, as shown in (6) and (7). Thus, in this case, the annotation is consistent but does not account for a fundamental difference in the valency frame of this verb.
[...] (7), the genitive plays the role of source of information and thus is syntactically an adjunct, whereas the stimulus role is played by a null referential object (cf. Luraghi 2003; Haug 2012 and the discussion below). Once more, this mismatch is due to the fact that the source of information is an optional inner participant in the event of hearing/coming to know. Example (7) directs our focus to another feature of AGDT 2.0 that also raises issues for automatic argument structure detection.
Specifically, the analytical layer of AGDT 2.0 does not comprise empty nodes for null arguments (Section 3.1): in the dependency treebanks modelled on PDT, null arguments are integrated at the tectogrammatical layer. However, AG is a pro-drop language, which means that by default it omits topical subjects, which are indexed on verbs through personal endings (cf. also fn. 9). Moreover, AG, as well as other ancient Indo-European languages, preferably or obligatorily selects null referential objects in certain syntactic and pragmatic contexts, including conjunct participles, coordinated verbs and clauses, and yes/no questions (Luraghi 2003; Haug 2012; Keydana & Luraghi 2012; Sausa & Zanchi 2015). Both null subjects and null referential objects appear frequently in the Homeric poems and, crucially, fill slots of verbal valency. The fact that they are not included in the syntactic trees of the analytical layer of AGDT 2.0 results in an incomplete account of the valency of a number of verbs.10 Integrating these types of arguments in HoDeL would be problematic with respect to the structural consistency of AGDT 2.0: the architecture of AGDT 2.0 is modelled on PDT, and thus, in principle, the analytical layer of AGDT 2.0 does not allow for empty nodes.
9 [...]tation and does not reflect theoretical issues in the annotation scheme. However, by checking the annotation of all the Homeric passages in which a form of eîdon and a prepositionless dative plural of ophthalmós co-occur, one notices that this dative is annotated as OBJ in 4 out of 26 passages. Moreover, the status of the 'eyes' participant in events of seeing must be particularly problematic, given that even en ophthalmoîsin 'with her eyes' is annotated as OBJ in Od. 8.459.
Furthermore, only an extremely careful linguistic analysis would make it possible to distinguish among instances of referential null arguments, which are assuredly part of verbal valency, instances of non-referential null arguments, whose status is less straightforward, and the complete absence of arguments. Example (8) is syntactically represented in the treebank as shown in Figure 11:

(8) […] ēdè  phuláxō
        ptc  keep.guard.prs.3pl
    '… so that I will be able to lie in wait for him as he comes and keep guard.' (Od. 4.760)

In (8), the future form of lokháō 'lie in wait for', lokhḗsomai, takes an accusative second argument, min autòn 'him'. In contrast, it is unclear whether the coordinated verb phuláxō is used intransitively and means 'I will keep guard' or takes a null object coreferential with min autòn. This structural ambiguity is not mirrored in the way coordinative structures are represented in the treebank, as shown in Figure 11: all coordinated verbs and their dependents, whether shared by the coordinants or not, are annotated as child nodes of the coordinating element.

10 The problematic issues regarding passive agents, null objects, and others were noted by the creators of the PROIEL family treebanks (Haug & Jøhndal 2008; Haug 2012; Eckhoff & Berdičevskis 2015; Eckhoff et al. 2018a). Although based on a version of Dependency Grammar, the tagset of the PROIEL treebanks includes additional labels and relations to improve descriptiveness, such as the label AG for passive agents and specific relations for elliptic structures.
Figure 11. Coordination in AGDT 2.0

A further class of issues arises from the inherent difficulty of interpreting the syntactic status of certain event participants, such as location or origin with posture and motion verbs (cf. hízō 'sit', e.g., in Il. 9.87). This matter is further complicated by the valency theory of the Functional Generative Description, which treats origin as an optional inner participant. In (9), the origin participant, encoded by ek+gen, is tagged as OBJ as a dependent of a verb of breaking, ágnumi 'break':

(9) ek      dé   moi      aukhḕn    astragálōn       eágē
    out_of  ptc  1sg.dat  neck.acc  vertebra.gen.pl  break.aor.3sg.pass
    'My neck (lit. 'the neck to me') was broken away from the vertebrae.'

This annotation, however, is problematic for two reasons: first, it regards origin as a syntactic argument of ágnumi 'break', which it is not; second, it treats the initial local particle ek (see, e.g., Zanchi 2019: 82-86 on this terminology) as a preposition governing the genitive astragálōn, which is not necessarily the case either. The latter point is elaborated in Section 4.2, which deals with practical issues related to annotation errors and to specific features of the Homeric language.
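The manual check reported in this section for eîdon and ophthalmós (counting how often one and the same dependent receives each label) can be approximated mechanically. The following Python fragment is a minimal sketch under stated assumptions: the row layout (verb lemma, dependent lemma, case, relation label) and the toy rows are hypothetical and reproduce neither the actual HoDeL database schema nor real treebank counts.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made rows: NOT real treebank data.
# Each row: (verb lemma, dependent lemma, dependent case, relation label).
ROWS = [
    ("eidon", "ophthalmos", "dat", "ADV"),
    ("eidon", "ophthalmos", "dat", "OBJ"),
    ("eidon", "ophthalmos", "dat", "ADV"),
    ("akouo", "some_noun", "gen", "OBJ"),
    ("akouo", "some_noun", "gen", "OBJ"),
]


def label_distribution(rows):
    """Count relation labels for every (verb, dependent, case) triple."""
    dist = defaultdict(Counter)
    for verb, dep, case, relation in rows:
        dist[(verb, dep, case)][relation] += 1
    return dist


def inconsistent_triples(rows):
    """Keep only the triples that carry more than one relation label."""
    return {
        key: dict(counts)
        for key, counts in label_distribution(rows).items()
        if len(counts) > 1
    }


if __name__ == "__main__":
    for key, counts in inconsistent_triples(ROWS).items():
        print(key, counts)
```

Triples flagged in this way are only candidates for manual inspection. As the akoúō genitives show, a uniformly applied label can still conceal a difference in argument status, so the absence of a flag does not guarantee a theoretically sound annotation.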

4.2 HoDeL: practical issues
Like other manually annotated corpora, AGDT 2.0 contains some errors. These may result from annotators' insufficient training, lapses in their attention, their implicit theories of how language works (on implicit linguistic theories, see, e.g., Iannàccaro 2000: 53), and/or methodological choices underlying corpus collection (for a general overview of errors in annotated corpora, see, e.g., Dickinson 2005: 4-27 and references therein). The errors and inconsistencies of AGDT 2.0 have inevitably been inherited by HoDeL. Some examples are highlighted in what follows. A more detailed discussion may be found in Zanchi & Luraghi (2020).
For instance, there may be inconsistencies concerning lemmatization. The AG verb horáō 'see' has a paradigm consisting of three stems, specifically, the present horáō, the aorist eîdon, and the perfect ópopa. These are, however, lemmatized as two different entries, the former including the forms based on the stems of horáō and ópopa, and the latter including the forms based on the aorist stem eîdon. In contrast, the preverb + verb combination eisoráō 'look', whose paradigm features the same three stems, is lemmatized as a single entry. Given that this issue is not limited to this pair of simplex and complex verbs, resolving the inconsistency would be complex, as it would involve restructuring the underlying lemmatization system of the treebank.
Another issue is the inconsistent tagging of voice information of verbs, which is in some passages tagged according to morphological criteria and in others according to semantic criteria. The HoDeL team has worked to address this issue, which would crucially affect a valency lexicon. In particular, we reannotated voice information in our relational database induced from AGDT 2.0 according to strict morphological criteria. In addition, the lemmas of 185 forms were not annotated in the treebank: we also added this omitted morphological information, directly modifying our relational database.
Other errors may derive from annotators misinterpreting the annotation scheme. For example, the label PNOM should be reserved for predicate nominals of copular verbs. However, in (10), the verb eimí 'be' is not used as a copula, but as a location/existential verb: accordingly, the dependent ein agorêi 'in the place of assembly' should be tagged as OBJ and not as PNOM.

(10) […] athrói
         in_crowds.nom.pl
     'The folk, gathered together, were in the place of assembly.' (Il. 18.497)

A different set of issues stems from peculiarities of the Homeric language. As is widely acknowledged, in Homeric Greek, preverbs could occur in the so-called "tmesis" positions and thus be "split" from the verbs that they semantically modify by various linguistic forms, retaining much of the syntactic freedom of their original adverbial status (cf. Zanchi 2019: ch. 3 with references therein for a diachronic interpretation of this preverb positioning). One of these tmesis positions is shown in example (9) above, which is repeated below in (11).

(11) ek      dé   moi      aukhḕn    astragálōn       eágē
     out_of  ptc  1sg.dat  neck.acc  vertebra.gen.pl  break.aor.3sg.pass
     'My neck (lit. 'the neck to me') was broken away from the vertebrae.'

In (11), the local particle ek occupies the sentence-initial position. In spite of this positioning, in AGDT 2.0, ek is assigned the label AUXP and functions as the head node of the genitive plural astragálōn 'from the vertebrae'. However, the actual syntactic relation holding between ek and astragálōn may well be less rigid, so ek might just as easily function as an adverbial modifier with respect to the relation conveyed by the construction ágnumi+gen. Alternatively, ek could modify the meaning of ágnumi in such a way that this verb takes the prepositionless genitive.
This syntactic ambiguity is reflected in an inconsistent annotation: in Od. 10.559-560, an almost identical formulaic expression occurs (ek dé hoi aukhḕn astragálōn eágē, which differs from (11) only in the dative external possessor), but ek is a verbal dependent tagged as AUXZ. However, not only is AUXZ assigned to local particles in "tmesis" positions, but it is also employed for logical operators that are undoubtedly independent adverbs, such as those meaning 'not' , 'as well' , and 'also' . This analysis is at odds with the function of preverbs, which, by forming a semantic unit with the verb, can occasionally modify its argument structure. Thus, the annotation scheme of AGDT 2.0 inadequately accounts for this peculiarity of the Homeric language. Since Homeric Greek is a language with free word order (thus, "split" preverbs do not always occur in sentence-initial position) and the label AUXZ is ambiguous and frequent, there is no easy, automatic way to find all examples similar to (11). Therefore, to date, solving this problem at the annotation level has been beyond the scope of the HoDeL project.

4.3 Some examples of how to use HoDeL
Having explained how HoDeL was built (Section 3.1), its basic functionalities (Section 3.2), and the precise kinds of data it contains (Sections 4.1-4.2), we can now show some examples of how the lexicon can help researchers to operationalize specific research questions. The main advantage of using HoDeL lies in the fact that it allows users to carry out corpus-based quantitative studies on Homeric Greek without learning the complex formalisms necessary to directly query the treebanks of AG. Currently, AGDT 2.0 can be queried online from the web-repository of Universal Dependencies using a language called PML-Tree Query. However, this method has the disadvantage of not allowing specific texts to be singled out from the rest of the treebank: so, for example, Homeric texts cannot be investigated separately from later diachronic varieties of AG.11 To focus on the Homeric texts independently from the rest of the treebank, one has to download the whole treebank in the .xml format, separate the Homeric texts, convert them into another format (e.g., to the .pml format), download a query tool (to my knowledge, the Tree Editor-TrEd is one of the few supporting the .pml format), and finally query the texts using the supported formalism. The other main treebank of AG, the PROIEL, can likewise be queried via Universal Dependencies or via INESS Search (a reimplementation of Tiger Search), but it does not contain Homeric Greek. Thus, HoDeL fills a gap among the available linguistic resources in that it offers an extremely user-friendly interface to perform corpus-based research on the Homeric poems.
To begin with, HoDeL can be used to automatically retrieve all relevant examples of the construction under investigation. For example, the 'Args Order' option (Figure 8) can be employed to obtain the frequency distribution of sentences attesting to the VSO, SVO, and SOV orders in Homeric Greek. This data could shed light on a number of still-open issues regarding Homeric word order and information structure (on which, cf., e.g., Beschi 2018 with references therein).
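To make concrete what such a frequency distribution amounts to, the following Python fragment derives an order label from the linear positions of verb, subject, and object and tallies the labels. It is only a sketch and does not reproduce how the 'Args Order' filter is implemented: the function names and the toy clauses are hypothetical, and each clause is assumed to contain exactly one overt subject and one overt object (null arguments are not represented in the treebank and would fall outside such a count).

```python
from collections import Counter


def argument_order(verb_pos, subj_pos, obj_pos):
    """Return an order label such as 'SOV' from linear token positions."""
    slots = [(subj_pos, "S"), (verb_pos, "V"), (obj_pos, "O")]
    # Sorting by position puts the three labels in surface order.
    return "".join(label for _, label in sorted(slots))


def order_frequencies(clauses):
    """Tally order labels over (verb, subject, object) position triples."""
    return Counter(argument_order(v, s, o) for v, s, o in clauses)


# Hypothetical clauses: (verb position, subject position, object position).
CLAUSES = [(3, 1, 2), (2, 1, 3), (1, 2, 3), (3, 1, 2)]

if __name__ == "__main__":
    for order, count in order_frequencies(CLAUSES).most_common():
        print(order, count)
```

Raw counts of this kind still need to be interpreted against the caveats of Section 4.1, since dependents tagged as OBJ are not always syntactic objects in the strict sense.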
The functionality 'Arguments' (Figure 9) can be used to extract all coordinated subjects and objects by selecting the relevant argument relations, specifically, SBJ_CO, OBJ_CO, SBJ_AP_CO, and OBJ_AP_CO (on bridge nodes, see Section 3.1, fn. 6). If the outputs of this filter are cross-checked with those of the 'Args Order' filter, researchers can effortlessly obtain frequency information on positioning patterns of coordinated subjects and objects with respect to verbs: do coordinated elements tend to surface in the same position, be it preverbal or postverbal, or do coordinants tend to be "split" by verbs? How do these ordering patterns correlate with verbal agreement in the case of coordinated subjects? What do these ordering patterns reveal about verbal government of coordinated objects?

11 On the GitHub page of AGLDT 2.0, it is stated that the treebanks can also be queried online via Structural Search and Tündra. Currently, however, neither of the two links given seems to work (http://perseusdl.github.io/treebank_data/; last access: 2021-05-08).
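The positional question raised above for coordinated subjects and objects (same side of the verb versus "split" by it) reduces to a simple comparison of token positions. The sketch below illustrates that comparison in Python; the function name, the category names, and the sample positions are hypothetical and are not part of the HoDeL interface, which delivers this information through its filters.

```python
from collections import Counter


def conjunct_placement(verb_pos, conjunct_positions):
    """Classify coordinated arguments as preverbal, postverbal, or split."""
    if not conjunct_positions:
        raise ValueError("at least one conjunct position is required")
    if verb_pos in conjunct_positions:
        raise ValueError("a conjunct cannot share the verb's position")
    before = [p for p in conjunct_positions if p < verb_pos]
    after = [p for p in conjunct_positions if p > verb_pos]
    if before and after:
        return "split"
    return "preverbal" if before else "postverbal"


# Hypothetical sample: (verb position, positions of the coordinated conjuncts).
SAMPLE = [(3, [1, 2]), (1, [2, 4]), (3, [1, 5]), (4, [1, 2])]

if __name__ == "__main__":
    tally = Counter(conjunct_placement(v, c) for v, c in SAMPLE)
    print(dict(tally))
```

A tally over such categories could then be set against agreement or case information retrieved with the other filters, which is where the linguistic interpretation proper begins.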
Figure 12. Accusative and dative dependents taken by bállō 'throw, hit'

Beyond facilitating word order queries, HoDeL can also be useful for detecting passages containing infrequent patterns in the Homeric language, which would otherwise require a time-consuming manual reading of the poems. For example, by searching in the 'Query' box for a specific verbal lemma with a preverb and combining it with the prepositional phrase headed by the same local particle, one can easily find attested instances of preverb repetition outside the preverbal context. This information can be used to account for the different paces of the grammaticalization or lexicalization paths undergone by different AG preverbs (on which, see, e.g., Zanchi 2017): the local particles that allow for repetition are more lexicalized or grammaticalized into preverbs and prepositions.
The option 'Add another argument' can be employed to investigate ditransitive verbs that feature argument structure alternations, such as the transfer verb bállō 'throw, hit' (Figure 12). This verb could mean 'throw something (acc) toward something else / someone (dat)', as in Il. 1.245-246, or 'hit someone (acc) with something (dat)', as in Il. 7.11-12. Both passages are shown in […]. Note that in Il. 7.11-12, the instrumental dative is labelled as OBJ, in spite of its uncertain argument status in the domain of syntactic valency (cf. Section 4.1, in which the similar case of ophthalmoîsi 'with (my) eyes' is discussed). Thus, the OBJ tag may well be imprecise from a theoretical standpoint, but this analysis has the welcome advantage that it demonstrates the suitability of HoDeL for this study and similar ones. Indeed, HoDeL is richer than a strictly syntactic valency lexicon and allows investigations into the behavior of event participants whose argument status is controversial, such as those regarded as optional inner complements in the view of the Functional Generative Description.

Conclusions
I have here presented HoDeL, a new linguistic resource intended to ease and refine research on Homeric verbs and their dependents. The building of HoDeL was fully documented and framed within the larger picture of morphosyntactically annotated corpora and valency lexica of ancient and modern Indo-European languages and AG. The basic functionalities and incorporated filters of the HoDeL online interface were illustrated, accompanied by suggestions about how to interpret frequency counts. The presentation of HoDeL also provided an opportunity to re-examine a number of methodological challenges, of both a theoretical and a practical nature, that emerged while creating new resources from existing ones. HoDeL was originally intended to be a syntactic valency lexicon of Homeric Greek. However, due to a theoretical mismatch between the source theory of annotation of AGDT 2.0 and the most widely accepted theories of syntactic valency, it evolved to include a greater number of event participants than those commonly acknowledged as arguments, such as passive agents and instrumental datives. Thus, the data contained in HoDeL is richer than that of a strict syntactic valency lexicon. This paper also addressed some of the limitations of AGDT 2.0. It was noted that the constraints and the tagset of the analytical layer of AGDT 2.0 are not adequate for representing some peculiarities of the Homeric language, including null argument participants and tmesis. For the present, the authors have opted to leave the annotation as it is in this respect, as modifying these features would require massively rethinking and reannotating the treebank. It was also noted that HoDeL inherited a number of errors contained in the treebank, owing to its close connection with the Homeric texts included in AGDT 2.0. Thus far, voice information on verbs and gaps in lemma information have been corrected in HoDeL.
Finally, after explaining to potential users how HoDeL was built, its functionalities, and what data it contains, I have shown how the lexicon can be employed to easily operationalize diverse research questions concerning Homeric verbs and Homeric syntax, and how its user-friendly interface and incorporated filters and queries allow scholars with basic computational skills to perform advanced corpus-based studies on the Homeric language. In addition, I have demonstrated how HoDeL may be used to search morphological information, transliteration and aligned translations of the AG passages, which also greatly facilitate the interpretation of the output results.
In the future, we plan to continue improving the quality of the base data contained in AGDT 2.0, for example, by correcting cases of misuse of the PNOM label. In addition, the HoDeL team is working to link the lexicon with other lexical resources of AG, such as the growing Ancient Greek WordNet. As shown in Zanchi et al. (2021), the enhanced access to data and the extreme user-friendliness of HoDeL can be exploited to integrate sentence frames in the metadata associated with each verbal entry of the Ancient Greek WordNet.