Academic Editor:wentao xu, food safety and molecular biology
Checked for plagiarism: Yes
Review by: Single-blind
Investigations of Molecular Evolutionary Mechanisms in Partially Sequenced Heat Shock Protein70 Homologue-Coding Gene of Olive Leaf Yellowing-Associated Virus Isolates from Tunisia
Reverse Transcription Polymerase Chain Reaction (RT-PCR) using new designed primers pair for Heat Shock Protein70 homologue (HSP70h) of Olive leaf yellowing-associated virus revealed 667 amplified product of 10 olive accessions collected from various olive-growing regions in Tunisia. Amplicons were cloned and sequenced. The sequences were deposited in the international databases. Pairwise sequence comparisons among 10 Tunisian isolates along with a reference sequence (AJ440010) extracted from GenBank revealed a nucleotide identity of 86.06-99.40 and an amino acid similarity of 91.89-99.55. Sequence multiple alignments were searched for evidence of recombination using three methods, ie. Differences of Sums of Squares (DSS) implemented in TOPALi v2.5 software and Single Breakpoint (SBP) along with GARD, a genetic algorithm, both incorporated in HyPhy package. All used methods pointed out the presence of putative breaking points in partially sequenced HSP70h-coding gene. Since failing to account for recombination can mislead the phylogeny inference and can elevate the false positive error rate in positive selection assessment, the use of GARD resulted in the reconstruction of different phylogenies on the left as well as on the right sides of putative recombination breaking points, and the 11 accessions were distributed into at least three clusters compared to MEGA6 software which delineated only two clades. Nonetheless, by dividing the aligned sequences at breakpoints into separate sequence sets, MEGA6 delineated a clustering pattern different from the former two. As a result, recombination reshuffled the affiliation of the different accessions to the clusters. Analysis of selection pressures exerted on HSP70h encoded protein using different models (SLAC, IFEL, FEL, REL, PARRIS, FUBAR, MEME, GA Branch, and PRIME) taking into account recombination, and implemented in HyPhy package, revealed that it underwent predominantly purifying selection as confirmed by Tajima’s D, Fu and Li’s D and F tests, and SNAP algorithm. However, a few sites were also under positive selection as assessed by various models such as FEL, IFEL, REL, MEME, and PRIME.
Viruses represent an important threat to plant production worldwide. In spite of numerous efforts for their elimination, the number of these viruses is still noticeably limited. Their eradication is being counterbalanced by emergence of new viruses which induce severe economic losses. These emerging plant viruses highlight the importance of studies of plant virus evolution. Much of adaptive potential of viruses stems from their large population sizes, and their high degrees of variability. The use of error-prone RNA polymerase during replication by RNA viruses is the primary driver of high mutation rates. Arising mutations can become fixed within virus populations, and are subject to the effects of processes such as natural selection. Selection is a directional process by which the frequencies of variants that are the fittest in a given environment will increase in the population (positive selection) whereas those of less fit variants will decrease (negative selection). At the molecular level, selection detection relies on comparisons of the relative numbers of non-synonymous and synonymous substitutions. Silent substitutions, resulting in no change in the amino acid encoded by the codon, are assumed to arise via neutral mutations and are not under selection 1. However, replaced substitutions that lead to resultant changes in amino acids can often affect tertiary structure and function, and are typically either under positive (adaptive) or negative (purifying) selection. Non-synonymous substitutions will be mildly deleterious and will typically be removed from populations under negative selection unless they directly impose some fitness advantage 1. The ratio of dN/dS(rate of non-synonymous substitutions per non-synonymous site/ rate of synonymous substitutions per synonymous site) can suggest whether a gene has had a variable rate of non-synonymous changes than would be expected from random neutrality and is potentially undergoing positive or negative selection for amino acid changes in that region. Thus, the value of dN/dS more than 1 suggests that the gene is under positive selection. A value close to 1 suggests that a gene is under neutral selection and is experienced neutral evolution. However, a value of less than 1 indicates that a gene is under the influence of negative or purifying selection. Recombination is another driving force of plant RNA virus evolution. In addition to increasing sequence variability, RNA recombination can be an efficient tool for viruses to repair viral genome, thus contributing to viral fitness 2, 3, 4, 5, 6, 7. It may also play a role in the formation of subviral RNAs that include defecting interfering (DI) RNAs associated with many plant viruses, i.e. some members of the Tombusviridae8. DI-RNAs are mainly derived from the parent (helper) virus via sequence deletion. The ease of their genetic manipulation has resulted in rapid discoveries on cis-acting RNA replication elements required for replication and recombination 4. Additionally, these DI-RNAs played a major role in post-transcriptional gene silencing (PTGS). They could trigger potent gene silencing response against the helper virus without hurting themselves from the same response 9.
Olive is one of the most widely grown fruit tree in Tunisia. It plays a major social, economical and cultural role. Olive trees cover an estimated area of 1.7 million hectares out of which 20,000 are irrigated. El Air et al. 10 stated that eight viruses were detected from Tunisian olive groves, i.e. Arabis mosaic virus (ArMV), Strawberry latent ringspot virus (SLRSV), Cherry leaf roll virus (CLRV), Cucumber mosaic virus (CMV), Olive latent virus 1, Olive latent virus 2, Olive latent ringspot virus (OLRSV), and Olive leaf yellowing-associated virus (OLYaV). The latter has been ascertained in olive trees from 18 different countries 11. In addition, according to this author, OLYaV is a member of the so-called «leaf yellowing complex» on olive. It appears to have a detrimental impact on the yield and growth rate 12. However, OLYaV has been also reported that it was found in symptomless trees in numerous countries 11. OLYaV is currently a member of the family Closteroviridae 13; but it is not allocated to any of the genera composing this family because more biological and molecular data are likely to be needed for unequivocal classification. This virus has a monopartite positive-sense single-stranded RNA. Only part of the viral genome, comprising 4,605 nucleotides from ORFs 1b (RdRp), 2(21kDa), 3(7kDa), 4(HSP70h), and the 5’ end of ORF 5 (HSP90h) has been sequenced 13, 14. These sequences have been deposited in GenBank under the accession number AJ440010. Later, genome sequencing has been extended towards the 3’ terminus of ORF 5 (HSP90h) giving rise to a segment having a size of 854 nucleotides 15. This RNA is protected by a coat protein having a molecular weight of 24 kDa. No seed and vector transmission are recorded so far 11.
In spite of widely provided efforts to characterize OLYaV at molecular level, studies about molecular evolution of this virus are nowadays still lacking. The objective of this work was to give a preliminary idea on evolutionary strategies employed by this virus to survive by searching for the occurrence of potential recombination events and evaluate selection pressure exerted on amino acids even though using a limited genomic region of OLYaV (partial sequences of HSP70h-coding gene).
Shoot samples from 37 cultivars showing yellowing symptoms on leaves were collected from two mother block stands Institut de l’Olivier of Sfax (southern Tunisia) and Gatrania (central Tunisia), and from a nursery Nour located at Karma (central Tunisia).
Newly designed specific primers by using Primer3 software 16, were used for molecular studies of HSP70h-coding gene, and having the following sequences: sense primer: 5’-ATC ATG AAC GAG CCT TCA GC-3’ ; antisense primer : 5’-CGG CAG CGA CTA TAA TAC GA-3’. These primers should be amplifying a DNA copy of 667 base pairs. The virus sense and antisense primers correspond to nucleotides (nt) positions 2618-2637, and 3265-3284, respectively, of the sequence submitted to GenBank (AJ440010) by Saponari et al. 14.
Phloem tissue, scraped from 12 shoots (two-year old) per tree collected in spring of 2013, was powdered in liquid nitrogen. Then, 100 mg of each sample was used for total RNA extraction using PureLink RNA™ Mini Kit (Life technologies, Carlsbad, CA-USA) according to the manufacturer’s protocol. RNA was finally eluted with 50 µl of RNase/DNase-free distilled water.
Purified total RNA (5µl) were mixed with 1 µl random hexamer primer (Bioron GmbH, Ludwigshafen, Germany) (0.5 µg.µl-1), 1 µl 10 mM dNTP mix (10 mM each dATP, dCTP, dGTP, dTTP at neutral pH), and 13 µl RNase/DNase-free distilled water. The reaction mixture was incubated at 65°C for 5 min and chilled on ice for 5 min. After a brief centrifugation, a second reaction mixture containing 6 µl 5X First-Strand Buffer [250 mM Tris-HCl (8.3 at room temperature), 375 mM KCl, 15 mM MgCl2], 3 µl 0.1M DTT, and 1 µl RNaseOUT™ (40 U.µ-1) (Life technologies, Carlsbad, CA-USA), was added to the former. The reaction mixture was incubated at 37°C for 2 min and chilled on ice for 5 min. A third mixture containing 1 µl M-MLV (200 U.µl-1) (Life technologies, Carlsbad, CA-USA), 3 µl 10 mM dNTP mix, 4 µl 5X First-Strand Buffer, and 12 µl RNase/DNase-free distilled water, was added to the former two. The reaction mixture was incubated successively at 25°C for 10 min, 37°C for 50 min, and 70°C for 15 min. Incubation was done in a thermal cycler (Multigene optimax, Labnet, Edison, USA).
The PCR reaction mixture using the newly designed specific primer was prepared by using 5 µl of the resulting cDNA. The OLYaV cDNA was transferred to a tube containing 5 µl 10X PCR buffer minus Mg++ 200 mM Tris, 2 µl 10 mM dNTPs, 3 µl 50 mM MgCl2, 15 pmol of each forward and reverse primers, 2 U Taq DNA polymerase (5 U.µ-1) (Life technologies, Carlsbad, CA-USA). Finally, RNase/DNase-free water was added to 45 µl. The amplification proceeded in the thermocycler (Multigene optimax, Labnet, Edison, USA) at 94°C for 5 min, and through 35 cycles of 94°C for 30 s, 53°C for 30 s, and 72°C for 45 s, with a final step at 72°C for 10 min. Amplification products were analyzed by electrophoresis of 10 ml aliquots on 1.5% agarose gel, in 1X Tris-Borate-EDTA buffer 17. Bands were visualized by ethidium bromide staining (5 mg.ml-1) and photographed using a U.V. transilluminator (ETX 20.M) at a wavelength of 312 nm and a Vilber Lourmat photo-print system (Model DP-001.FDC).
Amplicons of successfully amplified isolates were cloned into pCR2.1 vector using a TOPO TA cloning kit (Life technologies, Carlsbad, CA-USA) and used to transform DH5α cells of Escherichia coli following manufacturer’s instructions. Recombinant clones were screened for the presence of inserts of the expected size by colony PCR using M13F and M13R primers. Plasmid DNA was purified from positive recombinant clones using the Wizard minipreps DNA purification system (Promega corporation MD). Three clones for each positive isolate were sequenced in both orientations based on dideoxy chain termination method 18 using the Big Dye Terminator Ready Reaction mix provided by Life technologies (Carlsbad, CA-USA) in an automated sequencer (ABI PRISM 377). When necessary, additional clones were sequenced to resolve ambiguities in amplicon sequences. Contigs were assembled using CAP3 program 19. Sequence analysis was performed using BioEdit program.
The nucleotide sequences of PCR products along with the reference sequence AJ440010 were aligned using CLUSTALW 2.1, CLUSTALX 2.1 23 and Multalin 24 softwares with default settings. The phylogenetic relationships among OLYaV isolates were determined with the Maximum Likelihood (ML) algorithm incorporated in the MEGA version 6 program 25. Based on the evaluation of best fit substitution model executed in MEGA6, the ML tree was reconstructed under the assumption of substitution model T92 coupled to a discrete Gamma distribution (+G) with five rate categories 26. The substitution model parameters estimated were (i) base frequencies: f(A) = f(T) = f(C) = f(G) = 0.218, (ii) substitution rates: r(AT)= r(CA) = r(GT) = r(GT) = r(TA) = 0.017; r(AC) = r(TG) = r(CG) = r(GC) = 0.013; r(AG) = r(TC) = 0.191; r(CT) = r(GA) = 0.247, and (iii) transition/transversion ratios: R = 6.98. The Bayesian Information Criterion value (BIC = 4458.524) with T92+G (+G = 0.33) model was the lowest among the 24 models tested.
Occurrence of potential recombination events between nucleotide sequences was explored with SBP (Single breakpoint), GARD 27, 28 and TOPALi v.2.5 29. SBP and GARD are two algorithms for recombination detection use a statistical approach to search recombination breakpoints from multiple-sequence alignments of homologous sequences. Potential breakpoints are identified by improvement of the small-sample corrected Akaike information criterion (cAIC) 30 for phylogenetic trees constructed of individual recombinant fragments. Based on the outcome of the analysis, a level of support is assigned and expressed as a breakpoint placements score 27, 28. Breakpoints identified by GARD were then assessed for significance using the KH test 31 of the HyPhy package 32. TOPALi v.2.5 implements DSS (Differences of Sums of Squares) statistics. This method uses a sliding window and considers changes in the branching patterns of the trees estimated on the windows along the alignment, corresponding to high values of DSS.
DnaSPversion 5.10.01 33 was used to estimate Tajima’s D34 and Fu and Li’s D and F35 statistical tests to examine the hypothesis of neutrality operating on the OLYaV (partial HSP70h gene) sequences. An estimation of several population genetic parameters including nucleotide polymorphism (П estimated by the average number of nucleotide differences between two random sequences in a population), haplotype diversity (Hd, the frequency and number of haplotypes in a population), the statistic θfrom the number of segregationsites (S) 36, the average rate of synonymous and non-synonymous substitutions, ΔHd (the variance of haplotype diversity), K (average of number of pairwise nucleotide differences), was done. The distribution ofdS and dN along the coding regions was analyzed using the SNAP program (http://www.hiv.lanl.gov; 37). Based on the results obtained by the statistical tests mentioned above, examination for selection was performed using codon-based Maximum Likelihood methods i.e., the Single-Likelihood Ancestor Counting (SLAC), Fixed Effects Likelihood (FEL), Internal Fixed Effects Likelihood (IFEL), Random Effects Likelihood (REL) 38, Mixed Effects Model of Episodic Selection (MEME) 39, and Fast Unbiased Bayesian Approximation (FUBAR) models 40 and the Partitioning Approach for Robust Inference of Selection (PARRIS) 41 implemented at http://www.datamonkey.org, the web server of HyPhy package 42. To further investigate when and how selection pressure varied over the evolutionary history, GA-Branch (Genetic Algorithm-Branch) method 43, was applied. The GA-Branch program utilizes a genetic algorithm to test an extensive number of models of codon evolution based on small sample AIC score. This analysis is able to classify each branch to a specific dN/dS rate class. Furthermore, PRIME (Property Informed Model of Evolution) determined which biochemical properties could drive substitutions at a given site; e.g. if a site is positively selected, then which properties are being selected for/against. The exchangeability function is a product of property-specific contributions. PRIME is a model which involves a parameter α which represents the importance of property. Positive value (p<0.05) of α cause the property to be conserved (purifying selection) whereas a negative value (p<0.05) means that the property tends to changing (positive selection). In case of α = 0, selection is neutral with respect to that property. PRIME currently supports two predefined sets of five amino acid properties: the properties used by Conant et al. 44 (α1, chemical composition; α2, polarity; α3, volume; α4, iso-electric point; α5, hydropathy), and Atchley et al. 45 (α1, polarity index; α2, secondary structure factor; α3, volume; α4, refractivity/heat capacity; α5, charge/iso-electric point).
RT-PCR successfully amplified the targeted genome portion of 10 out of the 37 collected accessions. The size amplicon obtained (Figure 1) was as expected, i.e. 667 pb, and as shown in the revealed sequences (Figure 2). The sequences of 10 accessions were deposited in the international databases under the accession numbers: KP143750 (Meski G.TN), KP143751 (Frenjiventu G.TN), KP143752 (Chetoui G.TN), KP143753 (Ascolana G.TN), KP143754 (Chetoui N.TN), KP143755 (Sahli N.TN), KP143756 (Meski N.TN), KP143757 (Meski I.O.S.TN), KP143758 (Zarrazi I.O.S.TN), and KP143759 (Chemlali I.O.S.TN).
Figure 1. Agarose gel of HSP70h PCR products. M Marker (100 bp ladder), lane 0 negative control, lane 1 positive control, lane 2 KP243751, lane 3 KP243759, lane 4 KP243752, lane 5 KP243754. Bands correspond to amplicons having a size of 667 bp.
Figure 2. Nucleotide sequence alignment of HSP70h partial gene of 10 accessions of Olive leaf yellowing-associated virus collected in Tunisia along with the reference sequence AJ440010 performed by using MultAlin program. Dots indicate identical residue.
The sequences of the amplicons obtained from the isolate genome candidates as well as the sequence reference AJ440010 were aligned and showed a divergence of sequences ranging roughly from 86.06 to 99.40, and from 91.89 to 99.55 for the nucleotide and amino acid residues, respectively (Table 1). Nucleotide substitution patterns and rates of partial HSP70h sequences were estimated using the model T92. A discrete gamma distribution was used to model evolutionary rate differences among sites (5 categories +G, parameter = 0.3744). Thus, the general model formula was T92+G. Rates of different transitional and tranversional substitutions were determined and shown in Table 2. The estimated Transition/Transversion bias (R) was 7.14. A total of 667 positions were identified in the final dataset. Codon positions included were first + second + third + non-coding. All positions containing gaps and missing data were eliminated from the dataset (complete deletion option) before analysis with the MEGA6 program.Table 1. Nucleotide sequence identity (lower diagonal) and amino acid sequence similarity (upper diagonal) of the HSP70h partial gene of 11 isolates of OLYaV. Values in bold indicate lowest and highest percentages of divergence.
To detect potential recombination breakpoint in aligned sequences of 11 accessions, three methods were used: DSS (Differences of Sums of Squares) implementend in TOPALi v2.5, Single-Breakpoint (SBP) and the genetic algorithm GARD incorporated in HyPhy package. DSS statistics (window size: 500, step size: 10) revealed a major peak strongly supporting the presence of recombination signal in aligned sequences having the position 380 (Figure 3). Similarly, SBP algorithm revealed a breakpoint in the position 369 supported by a corrected Akaike Information Criterion (cAIC) having the value 29.9312, and a model averaged support of 100%. In contrast, GARD placed breakpoints at bp 99, 375, and 585 in aligned sequences of 667 bp segment based on cAIC goodness of fit. These breaking points corresponded to positions 2716, 2992, and 3202 in the entire HSP70h-coding gene of the reference sequence AJ440010, respectively.
On phylogeny, using MEGA6 which did not take into account the presence of recombination signals when reconstructing trees, inferred isolates split into two major groups. While the former encompassed the accessions KP143750, KP143751, KP143752, KP143753, KP143754, KP143755, KP143756, and AJ440010, the latter comprised the accessions KP143757, KP143758, and KP173759 (Figure 4a). Since several putative recombination events were detected, a single phylogeny may no longer accurately describe evolution of OLYaV isolates as shown in Figure 4a, where only two distinct majorgroups were delineated. Different phylogenies may therefore be required to describe the evolutionary relationships of the segments defined by recombination breakpoints. Thus, aligned sequences were divided at breakpoints into separate sequence sets and different trees were reconstructed using MEGA6 software. It was shown that the clustering pattern was heterogeneous. In fact, whereas the reconstructed trees of the segments 1-99 bp, and 586-667 bp segregated into five (Figure 4b), and four (Figure 4c) clusters, those of the segments 100-375 bp (Figure 4d), and 376-585 bp (Figure 4e) were identical to the tree reconstructed without taking into account recombination (Figure 4a) and thereby representing 72% of the whole partially sequenced HSP70h gene. In HyPhy, application of the test of Kishino-Hazegawa (KH) resulted in the identification of a non significant topological incongruence at p = 0.1 around breaking point 99 (Figure 5a, Figure 5b), thus suggesting a priori that other processes (e.g. substitution rate heterogeneity in the HSP70h gene) may be contributing to phylogenetic variation before and after the breakpoint (Table 3). In contrast, application of the test of Khishino-Hazegawa (KH) resulted in the iden0tification of a significant topological incongruence at p = 0.1 around breaking point 375 and at p = 0.05 around breakpoint 585 bp in aligned sequences of HSP70h gene fragment (Table 3) as clearly evidenced by GARD plots (Figure 6). GARD reconstructed trees were discordant based on the gene sequence on the right and left sides of the identified breakpoints (Figure 5b,Figure 5c,Figure 5d). Thus, GARD evidenced that the 11 isolates of OLYaV were distributed into at least three distinct groups but, clearly, group content varied according to gene sequence fragment. For example in Figure 5b, KP143752 accession constituted a distinct cluster; in contrast, this accession is no more constituting a distinct clade as shown in Figure 5c and thereby was included in group I composed by the accessions KP143751, KP143752, KP143754, KP143755, KP143756, KP143757, KP143758, KP143759, and AJ440010; thus suggesting a rearrangement operated under the influence of recombination events.
Figure 4. Radial representation of the phylogenetic relationship among 11 isolates of OLYaV based on the nucleotide sequence of the entire segment of HSP70h partial gene (a) and of divided segments at breakpoints, i.e. segments 1-99 bp
Figurre 4e. According to best fit Maximum Likelihood model, the trees were reconstructed using MEGA6 software incorporating the ML algorithm under assumption of the Models T92+G, K2, K2, K2+G, and K2, respectively. Bootstrap analysis was performed with 1,000 replicates. The numbers above the branches indicate the bootstrap confidence value. The scale bar shows the number of substitution per nucleotide.
Figure 5. GARD tree reconstruction of the HSP70h partial gene segments spanning 1-99 bp (Fig. 5a), 100-375 bp (Fig. 5b), 376-585 bp (Fig. 5c), and 586-666 bp (Fig. 5d), respectively. In each tree, three even four clusters with different topologies were delineated. Scale bar indicates the number of substitutions per nucleotide.
|Gene||cAIC||Δ cAIC||Breakpoint location||LHSp-value||RHSp-value||Significance|
To examine whether the number of segregating sites in the sequences departs from the neutral expectation, the software DnaSP version 5.10.01 was used. It allowed the calculation of Tajima’s D as well as Fu and Li’s D and F statistical tests to assess the neutrality and influence of demographic forces on the population which were as the following: Tajima’s D = -0.23234 (not significant at p > 0.1), Fu and Li’s D = -0.08937 and Fu and Li’s F = -0.14339 (not significant at p > 0.1) (Table 4). The calculation was based on the total number of mutations. The significantly negative values of Tajima’s D, and Fu and Li’s D and F statistical tests for HSP70h partial sequences discounted the neutral hypothesis suggesting the occurrence of purifying selection and demographic expansion of OLYaV population. Furthermore, the selection profiles of HSP70h partial sequences were determined by submitting the sequence alignments to SNAP program where averages of all pairwise comparisons led to the conclusion that a purifying selection (dN<dS) occurred. In fact the terms dN and dS were as following: dN = 0.0238; dS = 0.3978, dN/dS = 0.0598 (Table 4). Afterwards, in the HyPhy package, available at the datamonkey server which implements various models of evolution, investigations site-by-site of the signature of selective pressure based on dN/dS ratio by applying the SLAC, FEL, IFEL, REL and FUBAR methods which incorporate non-synonymous as well as synonymous rate variation among codon sites explicitly, were conducted. While SLAC (0.1 significance level) (Table 5a) and FUBAR (posterior ptobability p ≥ 0.9) (Table 5b) detected only negatively selected sites, i.e. 23, and 90 codons, respectively, IFEL (0.1 significance level) (Table 5c), FEL (0.1 significance level) (Table 5d), and REL (p=0.02) (Table 5e) detected at the same time sites under positive selection, i.e. 1, 1, and 4 adaptively selected sites, respectively, and sites under purifying selection, i.e., 31, 72, and 1 negatively selected sites, respectively. Since the majority of the codons were under purifying selection, MEME model found three signatures of episodic diversifying selection (at the 0.1 significance level) (Table 5f). Use of the PARRIS method resulted in detection of negative selection at p < 0.1 (Table 5g) in aligned sequences of HSP70h-partial gene as given by inferred distribution rates for the null (M1) and alternative models (M2) mentioned in Table 5g. It is noteworthy that codon 82 was found to be under positive selection by both methods MEME and IFEL. Whereas, codon 220 was detected as adaptively selected site by three methods: FEL, REL, and MEME (Table 5d,Table 5e and Table 5f). To gain further insight into the lineage specific nature of the selective pressures acting on each branch of the phylogenetic tree, analyses using a genetic algorithm, namely GA-Branch, were performed. GA-Branch selected three classes with the support of 2512 models at 95% confidence set and a cAIC having the value 4046.03 (∆cAIC = 0.418914). A total of 19 branches were enumerated (Figure 7) and segregated into three parts. Whereas the first class was constituted by 10 branches, the second class was formed by four branches, and the third class comprised five branches whose dN/dS ratios were 0.0725866, 0, and 0.240833, respectively. Thus, all 19 branches were under negative selection. In corroborating the results found above, PRIME, an additional algorithm recently incorporated in HyPhy package was used in searching for which physico-chemical property of amino acids able to drive substitution according to two models: Conant-Stadler and Atchley. Based on Atchley model, it was demonstrated that two detected codons, i.e. 4, and 82, had each of them one conserved property and one changing property. While codon 4 had a changing property [the volume (α3)] with the value -3.154 (property importance) at p = 0.032, and a conserved property (charge/iso-electric point (α5) with the value 7.343 at p = 0.040, codon 82 had also on the one hand a changing property [polarity index (α1)] with the value -3.808 (property importance) at p = 0.028, and on the other hand a conserved property (refractivity/heat capacity (α4) with the value 4.363 (property importance) at p = 0.030. In other words, these two codons were both under purifying selection as well as adaptive selection. Consequently, exchangeabilities in HSP70h-partial gene seemed to be influenced by physico-chemical properties of amino acids, but only four types out of 10 properties were able to contribute to the evolution of the sequences. It is worth noting that no results were obtained with Conant-Stadler model.Table 4. Population genetic parameters and neutrality tests calculated for the HSP70h partial gene. M = number of sequences , S = number of segregating sites, θ = the statistic θ from the number of segregation sites (S) (Watterson θ estimator) and the average of synonymous and non-synonymous substitutions, Π = nucleotide diversity (estimated by the average number of nucleotide differences between two random sequences in a population), Hd = haplotype diversity, ΔHd = the variance of haplotype diversity, K = Average of number of pairwise nucleotide differences , +Tajima’s D and Fu and Li’s D and F tests measure the departure from neutrality for all mutation in HSP70h partial gene. *Average of all pairwise comparisons.(http://www.hiv.lanl.gov/cgi-bin/SNAP/WEBSNAP/SNAP.cgi)
|Population statistics||Test of neutrality +||Synonymous and non–synonymous statistics after SNAP algorithm *|
|M||S||θ||Π||Hd||ΔHd||K||Tajima’s statistics||Fu and Li’s F statistics||Fu and Li’s D statistics||dN||d S||dN/d S|
|FUBAR Model||Codon||α||β||β- α||Posterior probability β< α|
|Positively selected site||82||0||6.06023||Infinite||15.1217||0.0519717|
|REL Model||Codon||E(dS)||E (dN)||Normalized E(dN-dS)||Posterior Probability||Bayes Factor|
|Negatively selected site||75||2.0235||0.0224856||-2.00102||0.998862||100.564|
|MEME||Codon||α||β -||Pr[β=β - ]||β +||Pr[β=β + ]||p-value||q-value|
|PARRIS||Synonymous rate||ω (dN/dS) ratio|
|Coding region||Inferred Rate distributions||Rate Class||1||2||3||Rate class||1||2||3|
|HSP70||Null Model (M1)||ds||0.59||1.65||2.28||ω||0.01||1.00||-|
|Alternative Model (M2)||ds||0.59||1.64||1.68||ω||0.01||1.00||4.84|
Figure 7. Lineage specific analysis of selective pressure in HSP70h partial gene of OLYaV. A cladogram is shown with Maximum-Likelihood Estimates of lineage-specific dN/dS during HSP70h partial gene evolution. Percentages for branch classes in the legend reflect the proportion of total tree length (measured in expected substitutions per site per unit time) evolving under the corresponding value of dN/dS.
Populations of plant viruses are genetically heterogeneous, and the frequency distribution of genetically varied entities in the population may fluctuate with time. This process is known as evolution. The variability of virus population led to the notion of quasispecies which assumes high mutation rates for RNA viruses. Since the presence of high mutation rates of RNA viruses and high accumulation levels in plant cells, RNA viruses have large and highly diverse populations. Consequently, viral populations would easily respond to changing selection pressure, and the evolution of high mutation rates would have an adaptive behaviour, allowing the virus to survive in changing environments. Since not accounting for other evolution forces such as recombination which can mislead not only phylogenetic analyses, but can increase the false positive error rate in positive selection inference. In this study, the author sought after recombination in the sequence of a portion (667 bp) of HSP70h-coding gene of OLYaV of 10 Tunisian accessions along with a reference sequence extracted from GenBank (Accession number AJ440010). In fact, three methods were used: DSS implemented in TOPALi v2.5 program, SBP, and GARD, both incorporated in HyPhy package. While the two former detected only a single breaking point, i.e. positions 380, and 369, respectively, the latter attempted to find all recombination signals in the genomic segment 667 bp. The breakpoints that were found in GARD analysis corresponded to hot spots particularly in positions 99, 375, and 585 bp. GARD, a powerful tool screened multiple-sequence alignments for recombination, determined all possible locations of breakpoints, inferred phylogenies for each putative non recombinant fragment and assessed goodness of fit by an information-based criterion such as small sample Akaike Information Criterion (AIC) 46. AICc derived from a maximum likelihood model fit to each segment 47. Thus, since putative recombination events occurred, inferred phylogeny produced by MEGA6 software does no longer reflect the real evolutionary process of the analyzed sequences (Figure 4a). Therefore, different tree topologies were necessary. Using MEGA6 software, only two separated sequence sets, i.e., segments 1-99 bp and 586-667 bp gave rise to two different tree topologies (Figure 4b, Figure 4c) compared to the segments 100-375 bp (Figure 4d), and 376-585bp (Figure 4e) which were identical to reconstructed tree without taking into account recombination (Figure 4a). According to statistical data, GARD reconstructed trees on both the left and the right sides of breaking points resulted in no significant incongruence topology at p = 0.1 around breakpoint 99. In contrast, topologies were incongruent at significance levels p = 0.1 around breakpoint 375, and p = 0.05 around breakpoint 585. Consequently, the affiliation of the isolates to different clusters was reshuffled. In addition to differences based on genome composition, viruses are expected to face widely different selection pressures depending on the taxon of the organisms that they infect. Comparison of synonymous and non-synonymous substitution rates provides an important means for studying the mechanisms of sequence evolution. In order to avoid false positive rates in selective pressures, different models (SLAC, FEL, IFEL, REL, FUBAR, PARRIS) incorporated in the HyPhy package which all take recombination into account, were used. These analyzes permitted to show that the majority of codons were under negative selection. This result was congruent with the results given by SNAP program as well as by Tajima’s D and Fu and Li’s D and F tests indicating a deviation from the null hypothesis and a demographic expansion. However, a few codons were positively selected, as pointed out by the MEME model which evidenced episodic diversifying selection at individual sites. Similarly, IFEL, FEL, and REL models detected 1, 1, and 4 positively selected sites, respectively. To characterize further the evolution of HSP70h-partial gene and detect possible differences of selective pressure between different branches among the gene phylogeny, a more refined analysis using GA-Branch model allowing for dN/dS ratio to vary between branches, was performed. The results provided support for all phylogenetic branches that were under negative selection (Figure 7). PRIME program, however, indicated that purifying and adaptive selection signatures prevailed as well.
Nowadays, 15 different viruses infecting olive with diverse taxonomic allocation are described 11. Substantial efforts and considerable attention were paid by several workers particularly from Mediterranean and Middle Eastern countries to characterize molecularly different viruses infecting olive. Unfortunately, knowledge about genetic factors driving their evolution is still scarce. For example, Cardoso et al. 48 reported that Olivemild mosaic virus is a recombinant between Olive latent virus 1 (OLV-1) and Tobacco necrosis virus strain D (TNV-D). Varanda et al. 49 attempted to study selective constraints acting on OLV-1 CP gene using REL model. However, to date, and according to author’s best knowledge, this study described here is first report on molecular evolution of OLYaV based on the analysis of partially sequenced HSP70h-coding gene of 10 accessions collected from Tunisia along with the reference sequence AJ440010 downloaded from GenBank which is, by the way, the only one available in the databank which can provide a fragment as long as 667 bp. Such segment belongs to the larger genomic fragment of OLYaV sequenced so far representing no more than 25% of the genome of related viruses having a size between 15 and 20 Kb. Moreover, except the sequences of HSP70h gene (667 bp) of OLYaV described in this study, all those deposited to date in GenBank have a size ranging from 361 bp to 611 bp. It is worth noting that expected further studies on molecular evolution of viruses infecting olive should be undertaken.
The author is grateful to Ministry of Higher Education and IRESA (Institution de la Recherche et de l’Enseignement Supérieur Agricoles) of Ministry of Agriculture in Tunisia for providing funds to carry out this work.