The squaretail mullet Ellochelon vaigiensis (Quoy & Gaimard, 1825) a complex of cryptic species?

The teleost family Mugilidae is speciose with uniform morpho-anatomical characteristics, which render species identification difficult. The dna barcoding technique has, however, proven to be a precise and reliable approach for species delineation. To date, dna barcoding flags numerous polyphyletic species in Mugilidae that probably correspond to species complexes and that call for further taxonomical investigation. Among these species, the squaretail mullet Ellochelon vaigiensis is an interesting case study because, unlike other mullet species, it is easily identified from its unique phenotype. Recent studies of genetic diversity in this monotypic species have revealed two lineages, located either in the Indo-Pacific (Polynesia and Taiwan waters) or along Australian shores. In this study, a third lineage is described in the North of the Indian Ocean, based on nucleotide polymorphisms


Introduction
The Mugilidae family is a taxonomically diverse group of fishes that inhabits all tropical, subtropical, and temperate water bodies of the planet. At a regional scale species diversity is more limited and, based on morphometry, eight species of mullet have been reported from the Persian Gulf and Oman Sea (Firouzi et al., 2020). Among these species, the squaretail mullet Ellochelon vaigiensis is the only representative of the genus Ellochelon, which belongs to the Squalomugilini with two other monotypic genera, namely Plicomugil and Squalomugil (Xia et al., 2016). It is distinguishable from other mullets by various morphological criteria. These include a broad head, lack of adipose eyelid, truncate caudal fin, pectoral fins without axillary scale that are black in juveniles but have a yellow lower margin in adults, other fins dusky, absence of shelf-like folds inside mouth corners, presence of labial teeth, less than 28 weakly ctenoid and relatively large scales in lateral series, and distinct dark longitudinal strips on lateral scales (Ghasemzadeh, 1998;Harrison & Senou, 1999;Xia et al., 2016).
The species is found in shallow waters (0-5 meters) in lagoons, estuaries, sheltered sandy shorelines and coastal creeks, and will occasionally enter freshwater (McDowall, 1997;Whitfield et al., 2012). Its distribution range encompasses all East African shores north to the Persian Gulf, east to Marshall, Gambier and the Marquesas Islands, north to southern Japan, south to Western Australia, New South Wales (Australia), New Caledonia, Society Islands, and Rapa the Great Barrier Reef (McDowall, 1997;Harrison & Senou, 1999;Fricke et al., 2020). Over this geographically large distribution, up to ten species have been described and, later, synonymized with Ellochelon vaigiensis (Thomson, 1997;Motomura et al., 2010;González-Castro & Ghasemzadeh, 2015). Durand et al. (2012) used mitochondrial sequence polymorphisms, to demonstrate, however, that this species is actually polyphyletic, and revealed two lineages. Considering the degree of divergence (4.8%) between the two lineages, which largely exceeds the accepted level of intraspecific diversity (see Durand et al., 2017), alongside the geographic distribution of the two lineages, Durand and Borsa (2015) suggested conserving the species name for the lineage close to the type locality and provisionally naming the lineage in Australian waters as Ellochelon sp. A. In this context, it is important to increase dna barcoding efforts on squaretail mullet, analyzing more specimens from various localities over its known distribution range in tandem with studies of morpho-anatomy. Using an integrative taxonomy approach, we propose to 1. genetically characterize new specimens from the Persian Gulf and the Oman Sea where the species have only been investigated morphologically (Teimouri & Hesni, 2020;Firouzi et al., 2020); 2. determine their phylogenetic relationships with the existing two lineages, and 3. record morpho-anatomical characters used to establish the Mugilidae taxonomy for all holotypes and some representative specimens belonging to the different mitochondrial lineages.

Field sampling
Ten specimens of Ellochelon vaigiensis were collected from the Persian Gulf (Bandar Abbas; 27°07'N 56°21'E) and Northern Oman Sea (Gwadar Bay; 25°10'N; 61°29'E; fig. 1) by gillnet and beach seine ( fig. 2). The fin clips were fixed in 96% ethanol for molecular studies, and the specimens were transferred to 4% formalin for morphological analysis. Specimens were cataloged in the Aquatic Animal Collection of Tarbiat Modares University (Codes: TAC1066F & TAC1066F for the Persian Gulf specimens and TAC1224F, for the specimens from the Oman Sea. dna extraction and sequencing dna extraction was carried out by the phenol-chloroform method (Taggart et al., 1992). A Cytochrome c oxidase 1 (coi) fragment of 652 base pairs (bp) and a Cytochrome b (Cytb) fragment of 937 bp was amplified using FishF1/FishR1 (Ward et al., 2005) and GluF/ ThrR (Machordom & Doadrio, 2001) primers, respectively.
The 40 μl pcr reaction mixes included 20 μl of MyTaq pcr Mastermix (Bioline), 16 μl of ultrapure water, 0.8 μl of bsa (Euromedex), 0.6 μl of each primer (3 μM), and 2 μl of dna template. Amplifications were performed in a ptc-100 biorad Thermal cycler. The thermal regime for the coi gene fragment consisted of an initial step of 2 min at 92°C followed by 35 cycles of 45 s at 92°C, 45 s at 52°C, and 1 min at 72°C, followed in turn by 5 min at 72°C and then held at 4°C. For the Cyt-b gene fragment, the thermal regime consisted of an initial step of 2 min at 94°C followed by 35 cycles of 45 s at 94°C, 1 min at 52°C, and 1 min at 72°C, followed in turn by 7 min at 72°C and then held at 4°C. pcr products were visualized on 1-2% agarose gels and the most intense products were selected for sequencing by Macrogen (https://dna.macrogen.com). The quality of sequences (base miscoding) was controlled visually using Chromas software (www.technelysium.com.au/chromas.html).

Morphological analyses
Twenty four morphometric and eight meristic characters (table 1) were measured by caliper to an accuracy of 0.02 mm and counted using a stereomicroscope (Lagler, 1956;Alavi-Yeganeh & Bahmani, 2018), on 10 specimens collected from the Persian Gulf and Oman Sea, and on 24 specimens from the Indo-Pacific: ams I.

Phylogenetic analyses
The dataset used for the phylogenetic investigation consisted of three coi and three Cyt-b sequences from this study which were uploaded to GenBank (MT843281-3 and MT882672-4, respectively), with all sequences stored in the bold dna barcodes library (http://boldsystems.org, consultation on Nov. 2021) associated with the name "Ellochelon vaigiensis" and belonging to three Barcode Index Numbers (bin s) bold: AAC9398, bold: AAU0553, bold: ACK7668 and five available congener Cyt-b sequence from the GenBank were extracted (table 2). A fourth bin (bold: ABY5947) is associated with the name "Ellochelon vaigiensis" but was not considered in our final dataset as only one member of the 20 members is identified as Ellochelon vaigiensis, and certainly corresponds to a misidentification considering   (Kumar et al., 2016). Also, the genetic distance was estimated with the Kimura-2parameter (K2P) model by mega 7 (Kumar et al., 2016). The best substitution model for our datasets was estimated using the method implemented in jModeltest Ver. 2.1.10 (Darriba et al., 2012) and considering the Akaike information criterion (aic) score. The phylogenetic tree was constructed using maximum likelihood (raxmlGUI; Edler et al., 2021) and Bayesian likelihood (Mr. Bayes; Ronquist & Huelsenbeck, 2003) methods. The Hasegawa, Kishino and Yano with Gamma distribution (hky + G) and generalized time-reversible with Gamma distribution (gtr + G) substitution models were used for the construction of coi and Cyt-b trees, respectively. Initial trees for the heuristic search were obtained automatically by applying Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (mcl) approach and then selecting the topology with a superior log likelihood value. Codon positions included were 1st+2nd+3rd. All positions containing missing data were eliminated. A bootstrap test with 500 replications was performed for the ml (Maximum Likelihood) tree and two simultaneous runs with four mcmc chains were used for 107 generations in bl (Bayesian Likelihood) analysis. Subsampling trees and parameters were saved every 1000 generations A sequence of Plicomugil labiosus (JQ060620) was used as an outgroup due to its phylogenetic proximity with Ellochelon vaigiensis  for the coi tree. Also, for the Cyt-b tree, a sequence from Plicomugil labiosus (KF375164) was used as an outgroup. Three dna sequences-based methods of species delimitation including Automatic Barcode Gap Discovery (abgd) (https://bioinfo.mnhn.fr/abi/public  (Zhang et al., 2013) and Refined Single Linkage (resl) (Ratnasingham & Hebert, 2013), were used as species delimitation tools for coi dataset.

Results
None of the morphometric and meristic characters was diagnostic and all had overlapped ranges (table 1). Despite these overlaps in ranges, multivariate analysis (pca) revealed that pectoral fin length, first dorsal fin height, snout length, postorbital length, anal fin length, body width and a meristic character (pyloric caeca number) were the most effective morphometric characters to discriminate between specimens from Indo-Pacific and Northern Indian Ocean ( fig. 3, table 3). Alignment of coi gene fragments over 551 bp for 42 specimens identified as Ellochelon vaigiensis revealed 49 variables and 45 parsimony-informative substitutions. The phylogenetic tree highlighted three lineages in agreement with bin delineated by the resl algorithm (Ratnasingham & Hebert, 2013). Three clades supported strongly with bootstrap test (>95) and posterior probabilities values (>0.99). All specimens collected in the Persian Gulf and the Oman Sea belong to the bin bold: AAU0553 ( fig. 4). The two other bin s included specimens used in the phylogenetic investigation of Durand et al. (2012). The genetic distance between E. vaigiensis bin ranges between 5.5% to 6.1%. IntraBIN mean divergences ranged between 0.1% to 0.2% (K2P). None of the haplotypes belonging to these bin were separated by more than two mutations, with the exception of one haplotype observed in a single specimen from Indonesia (BIFZI665-17) of the bin bold: AAC9398 presents seven mutations to the second haplotype shared by all specimens of this lineage. Using abgd or ptp, this divergent haplotype is considered a Molecular Operational Taxonomic Unit (motu). This difference is the only difference between resl and the two other species delimitations models ( fig. 4). The geographic distribution of the three bin s is only partially overlapping. The bin bold: AAC9398 is present in a specimen collected in the Pacific from Taiwan to Lizard Island (Australia) and from Indonesia to French Polynesia. The bin bold: ACK7668 is present in a specimen collected in Australia, while the bin bold: AAU0553 is observed in the Indian Ocean from the Persian Gulf to Indonesia (Lombok) (fig. 4). Similarly, a phylogenetic tree based on Cyt-b sequence (937 bp) revealed three distinct lineages from southern waters of Iran (lineage I), Australia figure 3 A scatterplot for pca based on morphometric and meristic characters of two Ellochelon vaigiensis groups from the northern Indian Ocean (square) and Indo-Pacific (circle) areas.  5).

Discussion
In this study, haplotypes of the nominal species Ellochelon vaigiensis were structured into three independent evolutionary lineages, revealing a novel third lineage in the Indian Ocean. Among the lineages, two were already present in the Mugilidae phylogeny produced by Durand et al. (2012) using an extremely limited sample size; 3 individuals were collected in Australia, Indonesia, and French Polynesia.
Recently, a coi haplotype from specimens morphologically identified as Ellochelon vaigensis was reported from the Pakistan coast, figure 4 Phylogenetic diversity of Ellochelon vaigiensis based on coi gene sequence data. In (A), the phylogenetic relationships were inferred by using the bl & ml method. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. Bootstrap percentage values for ml and posterior probability for bl analysis are indicated top and under the nodes, respectively. Haplotypes grouping by abgd, ptp, and resl delimitation models are presented with different colors. In (B), a distribution map of locations for three main clade haplotypes is provided. Haplotypes in bold format are reference sequences published in Durand and Borsa (2015).
which was closely related to specimens assigned by bold to the bin bold: AAU0553 (BIFB328-13 from Indonesia) (Hasan et al., 2021) and concordant with the result of this study that identified more specimens belonging to this bin ( fig. 4). Geographic distributions of the three lineages have been established using additional samples from the Oman Sea and the Persian Gulf, plus dna barcodes available in the bold system. This has revealed a clear phylogeographic pattern, with the three lineages that rapidly diverged from a common ancestor with an apparently parapatric distribution. The suture zone among these different lineages is located in the South of the coral triangle (in Indonesia and North Australia) which suggests a Sunda Shelf (the Indo-Malayan region) origin. This part of the world is known to have a complex geologic history and harbors a hotspot of marine biodiversity (Tornabene et al., 2015). During the Pleistocene era, this area experienced repeated large-scale eustatic sealevel movements that directly impacted shorelines and coral reefs, and thus the biological connectivity of marine organisms inhabiting it (Voris, 2000;Carpenter et al., 2011;Ludt & Rocha, 2015). Biogeographic barriers created by sea-level lowering as well as the drastic reduction of habitable shelf area would have prevented gene flow and favored genetic population bottlenecks in Ellochelon vaigiensis. This process would explain the low intra-lineage genetic diversity as well as the strong lineage sorting. This phylogeographic signature is frequently observed in marine Indo-Pacific teleost species such as Dascyllus trimaculatus, Decapterus russelli, Epinephelus areolatus, Epinephelus fasciatus (for a review, Carpenter et al., 2011;Ludt & Rocha, 2015, Durand et al., 2020, which mirror the magnitude of these geological events in their genetic architecture. Curiously, in the case of Ellochelon vaigiensis this process led to three lineages, suggesting that sea-level lowering probably created barriers that led to, at least, three refugia. It remains to be investigated whether the genetic isolation was long enough, or evolutionary forces strong enough, to achieve full speciation in Ellochelon vaigiensis. Considering the level of divergence observed among these lineages, close to or higher than 6% (using the partial coi gene sequence variation), Durand and Borsa (2015) questioned the taxonomic status of this species as well as other polyphyletic   (Zemlak et al., 2009) and barcoding gap delineation has been proposed as a method for preliminary identification (Puillandre et al., 2012;Ratnasingham & Hebert, 2013).
In their review of fish dna barcodes generated during the fish-bol program, Zemlak et al. (2009) noted that 93% of intraspecific divergences were <1%. Specifically, for the Mugilidae family, Durand et al. (2017) also noted that the distribution of pairwise nucleotide distances based on coi barcodes exhibited a first peak <1% that corresponded to the nucleotidic diversity observed inside motu s recovered with the abgd method (Puillandre et al., 2012). This observation led Durand and Borsa (2015) to propose an interim taxonomic nomenclature for Mugilidae where each genetic lineage (motu) is acknowledged and considered as a candidate species. Implicitly, this interim taxonomic nomenclature based on dna barcode gaps called for further morpho-anatomical investigation, since genetic characters are useless for fish applied taxonomy that relies on morphological and meristic differences. Despite this conclusion, few taxonomic investigations of Mugilid diversity have adopted an integrative approach. A review of Mugil species inhabiting South America highlighted that, in Mugil sp. N sensu Durand and Borsa (2015), there was a tenuous series of meristic characteristics justifying the creation of the species Mugil margaritae by Menezes et al. (2015). More recently, Thieme et al. (2022) demonstrated, both genetically and morphologically, that Chelon sp. A, which was described in South Africa (Durand & Borsa, 2015), was in fact Chelon persicus as described originally in the Persian Gulf (Senou et al., 1996). Specimens collected from the northern Indian Ocean did not show any diagnostic morphometric differences from specimens from the Indo-Pacific, although multivariate analysis highlighted two groups of individuals consistent with the genetic delimitation. Similarly, Annisa et al. (2021) compared two populations of E. vaigiensis from southern Indonesia (Arafura Sea) and a population from northern Indonesia (Pacific Ocean) morphologically. They did not find any significant morphological differences but multivariate analysis suggested the possible presence of cryptic species in E. vaigiensis. That study also emphasized the need for molecular approaches to resolve the taxonomic status.
Integrative taxonomy is not, however, a guarantee of species concept reconciliation (morphology vs. genetics), as demonstrated in the present study where no morphometric or meristic differentiation was highlighted among Ellochelon vaigiensis lineages. Morphological conservatism in the Mugilidae has frequently been highlighted and explains why the nomenclature of the family is still largely debated, even at the genus level (for reviews, see Durand, 2016;Xia et al., 2016). In such a situation, it is not surprising to find cryptic species in nominal species such as Ellochelon vaigiensis. While that does not help applied taxonomy, an interim taxonomy is still necessary because reliable and relevant for conservation or evolutionary investigations must consider all biodiversity components, especially when a strong biogeographic pattern is identified. For the sake of standardization, the genus name + the bin should be used to name lineages (otu) in a species complex; the species name being conserved for the lineage observed at the type locality and if only one lineage is present. In the present situation, the name Ellochelon vaigiensis may be maintained only for specimens belonging to the bin bold: AAC9398, while the two other species may be named Ellochelon [bold: ACK7668] and Ellochelon [bold: AAU0553] for the species present in Australia and the Indian Ocean, respectively.