CRUSTACEAN VITELLOGENIN: A SYSTEMATIC AND EXPERIMENTAL ANALYSIS OF THEIR GENES, GENOMES, MRNAS AND PROTEINS; AND PERSPECTIVE TO NEXT GENERATION SEQUENCING

Crustacean vitellogenesis is a process that involves Vitellin, produced via endoproteolysis of its precursor, which is designated as Vitellogenin (Vtg). The regulation of the Vtg gene, mRNA and protein involves several environmental factors and physiological processes, including gonadal maturation and moult stages, among others. Once the Vtg gene, mRNAs and protein are obtained, it is possible to establish the relationship between the elements that participate in their regulation, which could be either species-specific or tissue-specific. This work is a systematic analysis that compares the similarities and differences of Vtg genes, mRNAs and Vtg proteins between the crustacean species reported in databases and those obtained from the transcriptomes of Callinectes arcuatus, C. toxotes, Penaeus stylirostris and P. vannamei, sequenced with MiSeq technology from Illumina. These analyses confirm that the Vtg obtained from the selected species will serve to understand the process of vitellogenesis in crustaceans, which is important for fisheries and aquaculture. [...] ovary development related protein, ovigerous-hair stripping substance, ovoinhibitor, ovarian killer protein, ecdysteroid receptor, ecdysteroid-regulated protein, ecdysone-induced protein, voltage-dependent calcium channel, FEM-1, and insulin growth factor, among others.


INTRODUCTION
Crustacea have different sexual systems, each aimed at achieving maximum reproductive efficiency in their specific situation. In oviparous species, ovary maturation comprises the synthesis of the yolk protein denoted as Vitellin (Vn), which is the most important source of energy and nutrients for embryo development (Boulangé-Lecomte et al., 2017). Vn is produced through the endoproteolysis of its precursor Vitellogenin (Vtg), and both proteins are immunologically similar (Zmora et al., 2007; Xie et al., 2009). The process of Vtg biosynthesis and its accumulation in the ovary is named "vitellogenesis", and it is essential for ovarian maturation (Thongda et al., 2015). Depending on the species, Vtg biosynthesis occurs in the ovary and/or hepatopancreas.
Before the 1990s, most studies focused on ovarian Vtg synthesis; thereafter, crustacean extra-ovarian Vtg synthesis has been well analysed (Jayasankar et al., 2002). Among species, there are differences between ovarian and extra-ovarian Vtg synthesis (Phiriyangkul et al., 2007). Several explanations for this pattern have been proposed, including the methodologies applied, molecular concentration and the source of the organisms, among others. The procedures to establish Vtg mRNA concentration in any physiological condition require a correct hybridization between the mRNA and the antisense molecule, such as oligonucleotides, complementary DNA (cDNA), or an adequate primer for DNA polymerization. Therefore, the Vtg gene, cDNA and protein sequences are fundamental for designing molecular probes to quantify the mRNA expressed from the genes that participate in crustacean vitellogenesis.
Nowadays, Next Generation Sequencing (NGS) technologies are the best option to obtain DNA or RNA sequences quickly in a specific physiological condition. They often produce in the order of thousands or hundreds of thousands of sequences, in shorter times and at significantly lower costs (Jimenez-Gutierrez et al., 2016). NGS has been performed in several species of crustaceans; some works have focused on reproduction-involved tissues (He et al., 2012; Gao et al., 2014; Shen et al., 2014), and few of them have yielded physiological implications (Tarrant et al., 2014; Peng et al., 2015; Das et al., 2018; Uengwetwanit et al., 2018; Wang et al., 2019). Since NGS technologies give new options to identify genes and mRNAs, among them the various Vtg transcripts (Tarrant et al., 2014), NGS is another option to establish Vtg mRNA expression between maturity stages and sexes, using sequences that are published in public databases such as GenBank.
From the available information, most of the Vtg mRNA characterizations from ovary or hepatopancreas were carried out under laboratory conditions. Even though cultured species do not face dramatic climatic changes, many metabolic pathways are circadian-rhythm dependent. In both wild and cultured organisms, many clock genes are strongly related to ovarian development (Tarrant et al., 2014). Therefore, there is currently a lack of integration of Vtg mRNAs from fished crustacean species, as well as a lack of integrated information among the different species of crustaceans.
In this work, Vtg mRNAs from the most important crustacean species for the Eastern Pacific fishery (i.e., Callinectes arcuatus Ordway, 1863, C. toxotes Ordway, 1863 and Penaeus stylirostris Stimpson, 1871), and from cultured Penaeus vannamei Boone, 1931, were determined with Illumina MiSeq sequencing technology. This work sought to deepen the understanding of the different molecules involved in crustacean vitellogenesis, for their use in the evaluation of reproductive periods, and of their regulation in different crustacean species of commercial importance.

MATERIAL AND METHODS
Crustacean genomes, Vtg genes, mRNA and Vtg identification in the National Center for Biotechnology Information (NCBI)
To identify the crustacean genomes in the Genome Database from NCBI, the keyword "crustacean" was used. Then, the Vtg gene sequences were identified using the keyword "Vitellogenin" in the Genome Database, whereas the mRNAs were determined from the GenBank Database using the same keywords (Benson et al., 2005). To construct the crustacean Vtg gene and Vtg protein database, the complete open reading frame (ORF) sequences were used, and partial sequences were discarded.
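The separation of complete from partial ORF sequences can be illustrated with a minimal Python sketch. The criteria used here (an ATG start codon, a terminal stop codon of the standard genetic code, a length divisible by three and no internal stop codon) are assumptions made for illustration; the text does not state which criteria were actually applied, and the function names are hypothetical.

```python
# Stop codons of the standard genetic code (assumed for this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(seq: str) -> bool:
    """Return True if seq looks like a complete ORF under the assumed criteria."""
    s = seq.upper().replace("U", "T")  # accept RNA or DNA notation
    if len(s) < 6 or len(s) % 3 != 0:
        return False
    codons = [s[i:i + 3] for i in range(0, len(s), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Reject sequences with a premature (internal) stop codon.
    return not any(codon in STOP_CODONS for codon in codons[:-1])


def keep_complete(records: dict) -> dict:
    """Keep only the records whose sequence passes is_complete_orf."""
    return {name: seq for name, seq in records.items() if is_complete_orf(seq)}
```

Applied to a dictionary that maps sequence names to nucleotide strings, `keep_complete` returns only the entries that would be retained under these assumed criteria.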
To corroborate the complete Vtg gene, mRNA and protein sequences reported in the respective databases, a Basic Local Alignment Search Tool (BLAST; Altschul et al., 1990) search was run against the NCBI database with an Expected Threshold of 100, Match/Mismatch scores of 1, −2 and 20 000 Max Target Sequences. Furthermore, a multiple alignment of the selected Vtg sequences was performed with the software Kalign 2.0 (Lopez, 2008; Lassmann et al., 2009; Chojnacki et al., 2017). The alignment parameters were similar to those proposed by the lost DNA model (Martínez-Pérez et al., 2005): output format Clustal W, gap open penalty of 9.0, gap extension penalty of 0.2, terminal gap penalty of 0.45, and bonus score of 0.0.
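Once two sequences have been aligned, a percentage of identity can be derived from the aligned strings. The following Python sketch shows one common definition, in which columns containing a gap in either sequence are ignored. This definition and the function name are assumptions for illustration; BLAST and other tools may use a different denominator, so the values need not match those reported by them.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over the ungapped aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

For example, the aligned pair `ATG-CA` and `ATGGCT` has five ungapped columns, four of which match, giving 80% identity.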
The identification of the introns and exons of the reported Vtg genes followed the reported GenBank sequence and the respective paper. Furthermore, data from the mRNA and ORF of each Vtg sequence were corroborated by alignment against the genomic sequence with the aforementioned parameters. The codons of each exon were matched to the corresponding amino acids obtained from the ORF translation with the ExPASy Translate tool (SIB, 2016).
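The translation of an ORF into its amino acid sequence can be sketched in Python as follows. This is a simplified stand-in for the web tool named above, not its implementation: it assumes the standard genetic code, reads only the first frame, stops at the first stop codon and writes "X" for any codon containing an ambiguous base.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str) -> str:
    """Translate frame 1 of a nucleotide ORF up to the first stop codon."""
    s = orf.upper().replace("U", "T")  # accept RNA or DNA notation
    protein = []
    for i in range(0, len(s) - 2, 3):
        aa = CODON_TABLE.get(s[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

With this table, `translate("ATGGCCAAATAA")` yields `MAK`: the codons ATG, GCC and AAA are read in order and TAA ends the translation.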

Vitellogenin mRNA phylogenetic tree, and Vtg domains
The phylogenetic relationships among the Vtg mRNA sequences were determined by Bayesian inference with the software MrBayes (Ronquist et al., 2012), in 2 runs with 4 Markov chain Monte Carlo chains each, with a maximum of 30 million generations and sampling every 3000 generations. The Vtg domain identification for each species followed the protein report from GenBank.
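The sampling settings above imply an upper bound on the number of trees sampled. The short Python calculation below makes this explicit. It assumes that the maximum of 30 million generations is actually reached and it does not count the starting state or any burn-in, neither of which is specified in the text; the function name is hypothetical.

```python
def samples_per_run(generations: int, sample_every: int) -> int:
    """Sampled states per run, not counting the starting state."""
    if generations < 0 or sample_every <= 0:
        raise ValueError("generations must be >= 0 and sample_every > 0")
    return generations // sample_every


per_run = samples_per_run(30_000_000, 3_000)  # settings stated in the text
total = 2 * per_run                            # two independent runs
print(per_run, total)                          # prints: 10000 20000
```

If the analysis stopped before the maximum number of generations, the number of sampled trees would be proportionally smaller.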

Animal collection
Wild animals, including Callinectes arcuatus, C. toxotes and Penaeus stylirostris, were obtained from a fishing boat in the East Pacific (23°20′N 106°30′W), while cultured P. vannamei were obtained from an aquaculture farm in Mazatlán, Sinaloa, Mexico (23°1′N 106°12′W). From the wild animals, a stock from both Vtg synthesis tissues (ovary and hepatopancreas) was used to obtain the transcriptome that represents all maturity stages, the capture season and the circadian rhythm for each species. For the cultured organisms, a stock from each Vtg synthesis tissue (ovary and hepatopancreas) was used to obtain the transcriptome that represents all maturity stages and the circadian rhythm. The details are indicated in table I.

RNA isolation and Illumina sequencing
Total RNA from each tissue stock was obtained with the following protocol: total RNA was isolated from 100-150 mg of the tissue stock with the PureLink RNA Mini Kit (Invitrogen / Thermo Fisher Scientific, Waltham, MA) following the manufacturer's instructions and resuspended in 90 μl of RNase-free water. A second round of purification was conducted as follows: a volume of Trizol reagent and 40 μl of chloroform were added, then the mixture was vortexed for 10 s and incubated for 10 min at 4°C. The phases were separated by centrifugation at 13 000 g for 45 min at 4°C to obtain the supernatant. The RNA was precipitated with 90 μl of isopropyl alcohol and 10 μl of high salt buffer (0.8 M sodium citrate and 1.2 M NaCl) and then incubated overnight at −20°C. The RNA was [...]

Ovarian and hepatopancreas transcriptomic Illumina sequencing
All RNA samples were submitted to Genoma Mayor, Universidad Mayor (Santiago de Chile, Chile). The RNA concentration of each sample was determined with the QuantiFluor® dsDNA System (Promega, Madison, WI) and the integrity with the Bioanalyzer 2100 RNA 6000 Nano Kit (Agilent Technologies, Santa Clara, CA). The library construction was done using TruSeq Stranded mRNA (Illumina, San Diego, CA). Library purity and fragment size were determined as described above for total RNA, and the library was sequenced on an Illumina MiSeq instrument according to the manufacturer's procedure.

De novo assembly and Vtg analysis
To obtain each transcriptome, the adapter sequences from each read and the low-quality reads were first eliminated with the software Trim Galore, from http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/. The normalization was done with Trinity version 2.6.6 (Haas et al., 2013) with the function: <insilico_read_normalization.pl>. The de novo assembly was carried out with the software SPAdes version 3.12.0 at http://cab.spbu.ru/files/release3.12.0/. Finally, to establish the correlation between the assembled sequences and their function, a BLAST alignment was done with the software Diamond version 0.9.22 versus the "Nucleotide collection" database from NCBI. From these results, the putative Vtg sequences of each species were corroborated with a second BLAST alignment.
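The selection of putative Vtg contigs from the alignment results can be sketched in Python. The example assumes that the hits are available as tab-separated text with the twelve default columns of BLAST tabular output, and that a set of subject identifiers known to be Vtg has been compiled beforehand. The identifiers in the usage note, the E-value cut-off of 1e-5 and the function name are hypothetical and are not taken from the text.

```python
import csv
import io

# Default column order of BLAST-style tabular output (12 columns).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_vtg_hits(tabular_text: str, vtg_subject_ids: set,
                  max_evalue: float = 1e-5) -> dict:
    """Map each contig to its best-scoring Vtg subject as (sseqid, bitscore)."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < len(COLUMNS):
            continue  # skip blank or malformed lines
        hit = dict(zip(COLUMNS, row))
        if hit["sseqid"] not in vtg_subject_ids:
            continue  # subject is not in the assumed Vtg list
        if float(hit["evalue"]) > max_evalue:
            continue  # hit is weaker than the assumed cut-off
        bitscore = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current[1]:
            best[hit["qseqid"]] = (hit["sseqid"], bitscore)
    return best
```

Called with the tabular text and a set such as `{"VTG_A", "VTG_B"}` (placeholder identifiers), the function returns one entry per contig that has at least one sufficiently strong Vtg hit; those contigs would then be the candidates for the second, confirmatory alignment.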

RESULTS
Until November 2018, 27 crustacean genome projects had been reported by the NCBI (see Supplementary Material A-I). From these, six corresponded to the class Branchiopoda, mainly comprising species of the families Triopsidae and Daphniidae. In contrast, the genome of eight of the nine families of the class Hexanauplia had been sequenced. The rest of the genomic projects reported in the NCBI corresponded to species of Malacostraca; however, the majority of these were still under construction.
From the available information, two main Vtg genes had been determined. Both genes, Vtg1 and Vtg2, from Daphnia magna Straus, 1820 have 16 introns and 17 exons, whereas in species of Decapoda, such as Metapenaeus ensis (De Haan, 1844), and Scylla paramamosain Estampador, 1950, the Vtg1 gene has 14 introns and 15 exons, and the Vtg2 gene has fewer than 12 introns and 13 exons (Supplementary Material A-II). The length of the nucleotide sequence of Vtg in the GenBank database ranges from 5000 to 6000 bp in classes such as Branchiopoda and Hexanauplia; meanwhile, in the class Malacostraca, the Vtg sequences range from 7782 to 8518 bp. Those genes code for Vtgs varying from 2534 to 2592 amino acids in length (Supplementary Material A-III).
The phylogenetic relationships determined from the nucleotide Vtg sequences of crustacean species showed that the analysed Decapoda share a common ancestor (fig. 1). There were three main clades: one for the suborder Dendrobranchiata, another for the infraorder Caridea, and the third for the infraorder Brachyura. The evaluated sequences were not separated by tissue of origin, but were grouped by the species to which they belong. Although some previously reported transcriptomes show up to 20 transcripts per species that encode Vtg (Supplementary Material A-IV), in some species different results with respect to Vtg expression had been reported depending upon the methodology applied (Supplementary Material A-V).
In international databases, the reported Vtg molecular weights range from 200 to 500 kDa, according to the species. Vtg from the classes Branchiopoda and Hexanauplia contain fewer amino acids than those from the class Malacostraca. Vtg has conserved regions among the species, especially in the amino terminus, where the Vitellogenin-N domain is located. In this sense, seven domains have been found in species of crustaceans, at least three of which are found in all crustacean species: the domains Vitellogenin-N, Von Willebrand factor type D (VWD), and the [...]
In the specific case of previously reported P. vannamei, the partial Vtg sequence was from the amino-terminal region, which is the most conserved region. In our results, the carboxyl-terminal region of the partial ovary Vtg sequence has a similarity of 69% with hepatopancreas Vtg. In contrast, for C. quadricarinatus and M. ensis, the partial sequences were from the carboxyl-terminal region, which is why the similarity of the sequences that could be checked drops to 54% and 42%, respectively (table II).
Transcriptomes from cultured P. vannamei and wild Penaeus stylirostris, Callinectes arcuatus and C. toxotes allowed the identification of the Vtg sequences and their deduced proteins (table III). The complete P. vannamei Vtg from hepatopancreas shares 69% similarity with the partial carboxyl-terminal ovary Vtg. In general, all Vtg sequences presented identities greater than 90% with those from previously reported species, including other species of the same genus. The effect of neuropeptides and physiological conditions on Vtg expression and synthesis in crustaceans is illustrated in Supplementary Material A-VII.
[...] and exons from crustacean species are like those from other invertebrate and vertebrate species, suggesting a common ancestor (Kung et al., 2004).
Nevertheless, the higher number of introns in Branchiopoda with respect to Malacostraca is directly related to the lower number of base pairs in their Vtg sequences and to the higher specialization of Vtg in Malacostraca species.
Despite the presence of at least two Vtg sequences reported for several species, in the phylogenetic tree the sequences are not separated according to the tissue of origin but according to species. This suggests possible alternative splicing and point mutations within each species (Tarrant et al., 2014; Liu et al., 2015). Some authors have suggested a divergent evolutionary process (Tarrant et al., 2014; Liu et al., 2015).
The Vtg transcript of vertebrates is encoded by a multigene family (Tarrant et al., 2014). Among the invertebrates, however, 29 transcripts that encode Vtg were reported only for Procambarus clarkii (Girard, 1852). Our results from each tissue, which agree with previous reports, suggest possible alternative mRNA splicing events (Mak et al., 2005) or an early gene duplication event followed by rapid sequence divergence (Phiriyangkul et al., 2007). These phenomena are limited neither to a particular infraorder nor by geographical distribution.
Most works have focused on differential tissue expression of Vtg in female crustaceans. For example, in the shrimp Metapenaeus ensis, Vtg1 is expressed in ovary and hepatopancreas, whereas Vtg2 is expressed exclusively in the hepatopancreas (Wong et al., 2008). However, a main point of discussion is which organ is the major site of synthesis. Predominant Vtg expression in the hepatopancreas has been reported for some crustacean species, such as the shrimps Penaeus japonicus Spence Bate, 1888 (cf. Okumura, 2007), P. merguiensis (cf. Phiriyangkul et al., 2007), P. vannamei (cf. Raviv et al., 2006) and M. ensis (cf. Tsang et al., 2003; Tiu et al., 2006a), the freshwater crayfish Procambarus clarkii, and the blue swimming crab Callinectes sapidus (cf. Shen et al., 2014; Thongda et al., 2015). Nonetheless, the results depend upon the methodologies applied.
Using genome databases and Vtg gene, mRNA and protein sequences, this work confirms Vtg in ovary/hepatopancreas transcriptomes from the wild-caught crustaceans C. arcuatus, C. toxotes and P. stylirostris, in addition to the ovary and hepatopancreas transcriptomes from cultured P. vannamei. There are some previously published P. vannamei transcriptomes obtained with Illumina MiSeq technology, but most of them do not have a reproductive focus (Zhang et al., 2016; Fan et al., 2019), and some are focused on eyestalk tissue (Wang et al., 2019) and on changes in the hepatopancreas after eyestalk ablation. Besides Vtg, some other mRNAs have been implicated in reproduction control, such as the Vtg receptor (VtgR; Shen et al., 2014), the Gonadotropin-Releasing Hormone (GnRH) signalling pathway (among them: GnRH receptor and epidermal growth factor receptor), Torso-like, and Vigilin, among others (Tarrant et al., 2014; Uengwetwanit et al., 2018).
Most of these could serve as molecular markers of maturity stages, as well as to study the interactions in the induction and repression of reproduction. However, further studies are necessary to understand the role of each one on the reproduction of crustaceans.
Despite the substantial amount of information generated from NGS, there remains a large number of unknown genes and functions, because organisms have phenotypic plasticity, i.e., the ability to express different phenotypes from the same genotype due to changes in environmental conditions. Also, the complete genome of most crustaceans is unknown, and there is a limited number of non-redundant sequences in the international databases for some species.

Vtg synthesis
Vtg is a member of a family of lipid transfer proteins (Smolenaars et al., 2007). Vtg from the classes Branchiopoda and Hexanauplia have fewer amino acids than those from the class Malacostraca. The fact that Malacostraca species present more domains implies a higher level of regulation and more possible functions for Vtg. The Vitellogenin-N domain is the core of the protein, and transfers cholesterol and triglycerides to Vtg. The Ferritin-like domain stores iron in a biologically available form. The GL/ICG motifs from the C8 and VWD domains are necessary for the oligomerization of the protein (Wu et al., 2018). In addition to these domains from crustacean Vtg, some motifs, such as R-K.XXR, KLSR, KCYR, and KFSR, are found in mammals, insects, and crustaceans (Xie et al., 2009). These have been proposed as processing motifs for the subtilisin-like protease family (Tseng et al., 2001).
Vtg is composed of two subunits in the early stages of vitellogenesis and of four subunits in the late stages of vitellogenesis, which is congruent with Western blot immunopositive signals concentrated on the lower-molecular-weight fraction of the ovarian polypeptide (Okuno et al., 2002;Zmora et al., 2007). Vtg subunits have been detected by ELISA assays in the ovary, haemolymph, and hepatopancreas; however, in some crustacean species, Vtg was not found in the hepatopancreas, despite the presence of the Vtg transcript (Auttarat et al., 2006). In some species, Vtg levels in the hepatopancreas at any stage are low or undetectable by Western blot (Auttarat et al., 2006;Phiriyangkul et al., 2007).
This suggests that the demand for Vtg from the ovary is greater than the rate of synthesis from the hepatopancreas (Auttarat et al., 2006), whereas in the ovaries the concentrations of Vtg during ovarian maturation are lower than those of Vn (Auttarat et al., 2006; Wong et al., 2008). This is likely because Vtg may be excreted from the hepatopancreas immediately after synthesis and/or due to the great number of proteases already present in this organ (Zmora et al., 2009). In some cases, the presence of small Vtg peptides depends upon the accurate preservation of the sample, because RNases and proteases remain active even below the freezing point (Auttarat et al., 2006).

Ovary development and maturation
Regardless of the presence of one or two Vtg transcripts, their expression is strictly related to the oogenic cycle (Raviv et al., 2006), and Vtg levels in the haemolymph are often indicative of ovarian development (Thongda et al., 2015). For most crustacean species, ovarian development is separated into four stages (Nguyen et al., 2018) and external conditions are intimately connected with ovarian maturation, depending mostly on the season of the year, with maximum reproduction peaks in seasons with higher temperatures (Thongda et al., 2015).
Previous reports show a species-specific Vtg expression pattern. For most crustaceans, both Vtg transcription and yolk volume increase in parallel with ovarian maturation (Kung et al., 2004). For the shrimps Penaeus merguiensis, P. indicus H. Milne Edwards, 1837 and P. vannamei, the Vtg expression levels are higher in the ovary than in the hepatopancreas at all evaluated stages (Raviv et al., 2006), whereas for the shrimp P. japonicus, the expression pattern is the opposite. The highest Vtg mRNA levels from the ovarian tissue of P. merguiensis were observed in the early vitellogenic stage, and these decrease in advanced stages, with an opposite expression pattern in the hepatopancreas (Phiriyangkul et al., 2007). However, in some reports for P. japonicus (cf. Okumura, 2007) and Scylla paramamosain (cf. Jia et al., 2013), a constant increase occurs from the primary vitellogenic stage to final maturation, with a decrease after oviposition.
In some species of decapods, such as the mud crab S. paramamosain (cf. Gong et al., 2015) and the prawn Macrobrachium rosenbergii (De Man, 1879) (cf. Okuno et al., 2002), the ovary is only known as a site for Vtg uptake and accumulation during ovarian development, synthesizing only small amounts of Vtg (Tiu et al., 2006b). In other species, such as the ridgeback prawn Sicyonia ingentis (Burkenroad, 1938), the shrimp P. japonicus and the swimming crab Callinectes sapidus (cf. Tsukimura, 2001; Okumura, 2007; Thongda et al., 2007; Zmora et al., 2007), Vtg synthesis occurs in both tissues. We have found Vtg transcripts in both tissues from adult females of the Pacific blue swimming crab C. arcuatus and also in the hepatopancreas of subadult females of P. stylirostris (without developed gonads), whereas in subadult females of P. vannamei, Vtg transcripts were absent.
It is important to highlight these physiological differences and the species-specific regulation level, because in tropical countries, fishery management is generalized for all members of the same crustacean family (e.g., in the families Penaeidae and Portunidae; NOM-039-PESC, 2003; NOM-002-SAG/PESC, 2013), despite differences in the regulation of their reproduction. In this sense, Vtg sequences for species that are not cultured commercially, such as the shrimp P. stylirostris and the crabs C. arcuatus and C. toxotes, among others, are less studied and were absent from the GenBank databases until this work.
All of the abovementioned studies suggest a perfectly coordinated process between the hepatopancreas and the ovary, where the genes Vtg, VtgR and EcR, among others, could be potential markers for evaluating ovarian maturation in each species. However, Vtg expression and Vtg synthesis patterns are not the same for all infraorders of the subphylum Crustacea, not even among members of the same genus. Evaluations must be standardized for each species and for each habitat, without neglecting the evolutionary history of each gene. An understanding of all of these physiological processes can be used to assay the crustacean reproduction process in both aquaculture and fisheries.

Vtg Gen, name assigned to the Vitellogenin gene; Gen bp, size in base pairs of genes described for Vitellogenin; Introns, number of introns reported; Exons, number of exons reported; mRNA bp, Messenger RNA base pairs; ORF bp, Open reading frame base pairs; AA pb, number of amino acids; Acces Nuc, NCBI accession code for nucleic acids; Acces AA, NCBI accession code for amino acids.