The NH2-Terminal Ig Domains of Insect Projectin could serve as Elastic Elements

The connecting C-filaments of insect indirect flight muscles have been proposed as one of the elements providing muscle elasticity for the asynchronous muscle physiology of derived insects. Two large modular proteins, kettin/Sallimus and projectin, make up these filaments, and for both proteins the N-terminal sequences span the extensible I-band and are proposed as the elastic segments. The C-filaments have not been studied in insects such as dragonflies, crickets, and Lepidoptera, whose muscles are largely synchronous in physiology and display different levels of muscle stiffness. In this paper we focus on the projectin protein of several insects with synchronous flight muscles, namely dragonfly, cricket, and moth. We provide evidence for the localization of projectin over the sarcomere I-Z-I region, which is consistent with the existence of C-filaments in synchronous flight muscles. Additionally, we determine the sequences for the NH2-terminal region of projectin in these insects and describe the presence of alternative splice variants. Using predictors of intrinsically disordered regions, we identify possible unfolded segments, especially around the short linker sequences found between the NH2-terminal Ig domains. We propose a possible picture of the projectin NH2-terminal region organized as different segments contributing elastic responses to stretch by either unfolding of highly disordered sequences (PEVK) or reorientation of domains by bending or twisting of disordered linkers between the Ig domains.


Introduction:
The classical model of muscle contraction, described as the sliding filament model, includes two filament systems based on the polymers of actin and myosin proteins. Additionally, the existence of a third elastic filament system is well documented both in vertebrates (titin filament; reviewed in 1) and in invertebrates. In insects, this filament, which is known as the connecting filament (C-filament), has been described in the Indirect Flight Muscles (IFMs) of several derived insect species, including Lethocerus indicus (waterbug), Apis mellifera (honeybee) and Drosophila melanogaster (reviewed in 2). The C-filaments provide a mechanical link between the Z-discs and the ends of the thick filaments and are composed of two large modular proteins, kettin/Sallimus 3-5 (abbreviated as Sls) and projectin 6-8. The C-filaments have been proposed as one of the elements generating muscle stiffness in the IFMs of derived insects, a property necessary for the stretch activation mechanism and asynchronous flight physiology (reviewed in 2,9). These C-filaments have not, however, been studied in insects such as dragonflies, crickets, and Lepidoptera, which use synchronous flight muscles and display different levels of muscle stiffness 10,11.
The molecular characterization of projectin available for several insects reveals a conserved organization with a specific pattern of repeated motifs and unique sequences 12-14. The central region of the protein is composed of Fibronectin III (FnIII) and Immunoglobulin C (Ig) domains that are organized in fourteen repeated [Fn-Fn-Ig] modules, and are both capable of multiple protein-protein interactions. The NH2-terminus is composed of two tandem arrays of eight and six Ig domains, respectively, separated by a unique sequence described as the PEVK-NTCS-1 region 13. In D. melanogaster IFMs the NH2-terminus of projectin is embedded within the Z-line, thus contributing to the anchoring of the C-filaments 15. The exact number of Ig domains within the Z-line is unknown, but the present model for C-filament organization 5 suggests that at least some of the NH2-terminal Ig domains would be outside of the Z-band and span the I-band region of the sarcomere. These domains could potentially contribute, through their unfolding or reorganization, to the extensibility of the projectin protein under stretch.
In D. melanogaster, the localization of projectin differs between muscle types: it is found over the I-Z-I bands in asynchronous IFMs, but over the A band in synchronous muscles 16. It is unclear whether this difference in localization is related to the asynchronous versus synchronous nature of the muscles or whether it is more generally a difference between flight and non-flight muscle organization. The localization of projectin in the flight muscles of insect orders with low passive stiffness and/or synchronous flight physiology has not been evaluated.
In this paper we describe the characterization of the NH2-terminal Ig domains of projectin across the span of insect evolution, including the prediction of their propensity to be unfolded. We demonstrate the presence of alternative splice variants producing shorter molecules, and we also provide evidence for the distribution of projectin within the sarcomere of a variety of insect flight muscles. These data are discussed in the context of a proposed model for projectin extensibility.

Insects and RNA Sample Preparation
Bombyx mori (silkworm) and Manduca sexta (Carolina sphinx/tobacco hornworm) were purchased as larvae or cocoons from Educational Science (TX).
Total RNA was purified from whole animals, isolated body parts (legs, heads), and dissected thorax flight muscles using Trizol (Invitrogen™) as described before 12.

Degenerate Primers and cDNA Sequence Determination
Multiple sequence alignments (MSA) of specific projectin regions available for several insects 12,13 were performed using the CLUSTALW algorithm 17,18. Degenerate primers were manually designed from the nucleotide stretches showing the highest conservation. Primers used in this study are located within the two stretches of Ig domains found in the NH2-terminal region (Table 1).
RT-PCR reactions were performed as described before 19 on the different RNA preparations using various combinations of the primers in Table 1. Annealing temperatures for both the RT and PCR reactions were tested over a range to optimize each degenerate primer set. Resulting DNA fragments were isolated after agarose gel electrophoresis and subcloned into the pGEM-T shuttle vector (Promega, Inc.). They were then sequenced (Genewiz, Inc.) and the contigs assembled manually from the overlapping clones.
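To make the notion of primer degeneracy concrete, the following minimal sketch counts and enumerates the concrete sequences encoded by the two primers listed in Table 1, using the IUPAC codes given in the table legend. The function names are hypothetical and chosen for illustration only; this is not a tool used in the study.

```python
from itertools import product

# IUPAC nucleotide codes: the four plain bases plus the ambiguity
# codes listed in the Table 1 legend.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "M": "AC",
    "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

# Primer sequences copied from Table 1 (forward strand).
nt6d6 = "GAAGCNAAACTCTTGAAAGACGG"
cored1 = "GGTGGTTCACCTATAMCNMAATAYG"

print(degeneracy(nt6d6), degeneracy(cored1))
```

A low degeneracy keeps the effective concentration of each matching oligonucleotide high, which is one reason annealing temperatures usually need to be optimized separately for each primer set.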

Bioinformatics Analysis
Sequence comparisons were carried out using the CLUSTALW algorithm 17,18 and the alignments were viewed in Jalview 20. Pairwise homology scores generated by CLUSTALW were used to calculate the average degree of conservation for each individual Ig domain. Intrinsic disorder predictions were carried out using three predictors: IUPRED 21,22, FoldIndex 23, and PONDR-FIT 24.
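As an illustration of what a charge/hydropathy predictor computes, the sketch below applies the published FoldIndex formula, score = 2.785<H> - |<R>| - 1.151, where <H> is the mean Kyte-Doolittle hydrophobicity rescaled to the 0 to 1 range and <R> is the mean net charge per residue. Positive scores suggest a folded segment and negative scores an intrinsically unfolded one. The function names, the window size, and the test sequences are assumptions made for illustration; this is not the FoldIndex server's implementation and it does not reproduce the analysis shown in Figure 4.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified charges at neutral pH (histidine treated as neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def fold_index(seq):
    """FoldIndex-style score for one segment (positive = likely folded)."""
    seq = seq.upper()
    # Rescale hydropathy from the [-4.5, 4.5] range to [0, 1].
    mean_h = sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)
    mean_r = sum(CHARGE.get(aa, 0) for aa in seq) / len(seq)
    return 2.785 * mean_h - abs(mean_r) - 1.151

def sliding_fold_index(seq, window=11):
    """Score every window of the given length along the sequence."""
    return [fold_index(seq[i:i + window])
            for i in range(len(seq) - window + 1)]

# Hypothetical toy segments, not projectin sequences.
print(fold_index("IIII") > 0, fold_index("EEEE") < 0)
```

Short linkers of four to eleven residues are smaller than typical prediction windows, so scores near domain borders reflect the linker together with the flanking residues of the adjacent domains.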

Immunofluorescence Microscopy
Dissected flight muscles or half thoraces were quick frozen in Optimal Cutting Temperature (OCT) medium at -20 °C and 10 μm sections were cut using a Thermo Scientific Microm HM550 cryostat. Alternatively, individual muscle fibers were cut from dissected flight muscles and directly immobilized on subbed glass microscope slides using hand pressure on a coverslip. Sections or squashes were fixed immediately and the slides processed for immunofluorescence microscopy as described previously 25. For double labeling, the two primary antibodies from different species were applied together, as were the two species-specific secondary antibodies. Both Cy5 and fluorescein-tagged secondary antibodies were used for identification. F-actin was stained using phalloidin-FITC or RITC (Sigma) diluted 1:100 together with the secondary antibody. Slides were mounted in mounting medium (Vectorshield, Inc.). All slides were examined by epifluorescence microscopy on an Olympus IX81 spinning disk confocal microscope. Control preparations treated with secondary antibodies alone showed no background staining in any immunofluorescence experiment. Images were captured and analyzed using Olympus Slidebook software.

Results:
Partial Sequence Identification in Moth, Dragonfly, Cricket, and Cicada
We recently completed the annotation of the full Manduca sexta (tobacco hornworm) projectin gene using sequences available from GenBank, with the analysis of its PEVK domain described elsewhere 14. Some insect projectin sequences used in this study for comparison have been previously described 12-14. The phylogenetic relationship and flight muscle physiology of the insects included in this study, as well as the status of their projectin sequences, are summarized in Figure 1. Using a collection of degenerate primers covering the NH2-terminal Ig domains, initial RT-PCR amplification products were generated for two dragonfly species (P. longipennis and L. pulchella), as well as one cicada (Tibicen sp.) and one cricket (A. domesticus). Sequences were completed using species-specific primers designed to fill in any remaining sequence gaps, together with the original degenerate primers. These analyses provide the corresponding complementary DNA (cDNA) sequences for the two stretches of Ig domains flanking the PEVK region, which we refer to as the N8Ig and N6Ig regions.

Table 1: Sequence and position of degenerate primers used in this study. Several sequences were used to derive both forward and reverse primers, e.g. Ntd2/2R with the reverse denoted as R. The provided sequence is that of the forward strand. N = any base; S = G or C; Y = C or T; R = A or G; M = A or C (IUPAC nucleotide code).

Primer      Sequence                     Position
Nt6d6/6R    GAAGCNAAACTCTTGAAAGACGG      Ig13
Cored1R     GGTGGTTCACCTATAMCNMAATAYG    Fnl

Sequence Analysis of the NIg Segments
The N8Ig and N6Ig sequences were split into their individual domains, and all the domains at a specific position, e.g. all Ig1, were aligned with each other using CLUSTALW. The pairwise homology scores generated by the multiple alignment analysis were used to generate an average homology score for each domain. As presented in Figure 2, the average degree of conservation for individual Ig domains varies, with the highest level for Ig1 and Ig2 (80%), falling to only 43% for Ig8 just before the PEVK extensible region. The six Ig domains of N6Ig and the unique FRAM linker sequence between Ig9 and Ig10 show an intermediate level of conservation, from 76 to 58% (Figure 2).
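The averaging step described above can be sketched as follows. The sketch computes percent identity for every pair of aligned sequences at one domain position and then takes the mean over all pairs. It assumes sequences that have already been aligned to equal length with "-" as the gap character, and it uses a simple identity count rather than CLUSTALW's own pairwise scoring; the function names and the toy alignment are hypothetical.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percent identical residues over columns where neither sequence is gapped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

def average_pairwise_identity(aligned):
    """Mean percent identity over all sequence pairs at one domain position."""
    scores = [percent_identity(a, b) for a, b in combinations(aligned, 2)]
    return sum(scores) / len(scores)

# Hypothetical toy alignment standing in for one Ig position across species.
toy_alignment = ["ACDE", "ACDF", "AC-E"]
print(round(average_pairwise_identity(toy_alignment), 1))
```

Running the same function on the aligned sequences for each position (Ig1, Ig2, and so on) would give one average per domain, which is the kind of per-domain summary plotted in Figure 2.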
The alignments for each of the Ig domains were also compared to the Ig consensus sequence originally defined for the Ig domain of C. elegans twitchin 27.
When only the amino acids at these consensus positions (black areas in Figure 3A) are examined, the level of conservation for residues of the consensus sequence is similar for all the Ig domains, from 17 to 21 amino acids out of 37 consensus residues. There is also a strong conservation of non-consensus amino acids at both ends of each domain except for Ig8. Most of the variability between domains can be accounted for by a decreased conservation of residues within the central portion of the domain, as exemplified by the differences between the Ig1 (80%) and Ig3 (65%) domains (Figure 3B). This central domain is variable in both sequence and length in different titin Ig domains, but tends to protrude from the central nucleus of the domain 28-33, as shown in Figure 3B.

Figure legend (fragment): For the insects labeled in red, the sequences of the projectin NH2-terminus were obtained for this study. For the insect species labeled in gray, the projectin sequence is complete and the initial annotation has already been described.
The analysis of the alignment for the second stretch of domains (the N6Ig) reveals a similar pattern of conservation of consensus and non-consensus residues at each end of the domain and a more variable central domain (data not shown). The N6Ig region also contains, between Ig9 and Ig10, a unique sequence known as FRAM, which is 46 amino acids long and shows 76% overall homology for all the insects included in this study. The conservation of the consensus residues suggests that the projectin N-terminal Ig domains could adopt the Ig fold as described for several titin Ig domains 28,29.
However, several Ig domains of the titin protein have also been extensively studied in relation to their unfolding behavior under stretch, which is capable of generating both secondary and tertiary elasticity (reviewed in 34). Because some of the projectin N8Ig domains may belong to the extensible region of the molecule, we also evaluated the likelihood of unfolding for these NH2-terminal Ig domains by predicting the propensity of the amino acid sequences to be natively disordered. We used several predictors of intrinsically disordered regions and present in Figure 4 the analysis for one predictor, FoldIndex 23, for the first eight Ig domains of representative insect species. Predictions of intrinsic disorder using IUPRED 21,22 and PONDR-FIT 24 were also performed and can be found in Supplemental Materials. Even though the detailed patterns differ between species and predictors, certain regions are consistently predicted as disordered across all analyzed insects and all predictors. These regions localize at the borders between some of the Ig domains and typically encompass the short linker sequences present between Ig domains (red arrows in Figure 4). These linkers are short, ranging from four to eleven amino acids, and are typically well conserved between insects. Only two linkers in A. mellifera and one in L. pulchella are predicted to be folded (green arrows in Figure 4). Figure 4 also offers a similar analysis for the first two Ig domains of titin, named Z1/Z2, and the six amino acid linker sequence found between them. Interestingly, the short linker between the two titin Ig domains is also predicted to be intrinsically disordered, whereas the Ig domains themselves are predicted to be highly ordered, consistent with the X-ray crystallography data 35,36.
In D. melanogaster there are two alternative exons 19 for the linker between Ig1 and Ig2, noted as linker 1a and 1b in the top panel of Figure 4. As shown in that panel, the two forms behave differently: one variant (1a) is predicted to be more disordered than the other (1b, insert in top panel). The same difference in predicted disorder is also detected by the IUPRED and PONDR-FIT predictors. In the other species only one Ig1-Ig2 linker has been found by annotation, and it corresponds in sequence to linker 1a, the more disordered one from D. melanogaster. The existence of a second linker in other insects cannot be excluded, even though in the species for which genomic data are available the intronic regions where that linker should be found were searched extensively by BLAST, as well as manually. Unless the sequence of the second linker is extremely different from the one in Drosophila, it probably does not exist in the other species.
A similar analysis of the N6Ig region also predicts an extended region to be unfolded; it encompasses the end of Ig9, the totality of the FRAM region and approximately half of Ig10 (see Figure 2 for domain arrangement, data not shown).

Alternative Splicing in the N-terminal Ig Domains
The conservation at the amino acid level at the beginning of the protein is reinforced at the gene level, where the exon-intron pattern of the first four Ig domains is identical in all insects for which genomic sequences are available, except in Lepidoptera where exon #1 is split into two exons. On the other hand, the exon-intron pattern of the next four Ig domains is divergent, with multiple events of intron loss or acquisition (Figure 5). When the rest of the projectin gene is considered, there is overall very little conservation of the exon-intron pattern, as reported before and reinforced by the highly variable number of total exons 12,13.
This exon-intron pattern theoretically allows for two alternative splicing patterns, which both conserve the open reading frame and lead to the precise removal of two (Ig3 and Ig4; referred to as Δ3-4) or three (Ig2, Ig3 and Ig4) domains. The actual use of these alternative splicing patterns was tested for all insects in RT-PCR reactions using forward primers from the Ig1 or Ig2 domains and a reverse primer from the Ig5 domain. Predicted differences in the size of the amplified products allow clear identification of the "full" product (containing the Ig3 and Ig4 domains) or the alternative splice variant Δ3-4. We also tested whether the alternative splice options were muscle-type specific by performing the RT-PCR reactions on RNA isolated from leg, head and flight muscles. Evidence of the Δ3-4 variant was obtained for all tested insects and all muscle types, except in M. sexta, where the Δ3-4 variant is absent from the flight muscles. The amplified products were sequenced, confirming that the shorter cDNA product corresponds to an in-frame removal of Ig3 and Ig4 (Δ3-4). Representative data for M. sexta and Tibicen sp. are presented in Figure 6 and the complete analysis is summarized in Table 2.
In the Manduca sexta analysis, the Δ3-4 variant is clearly amplified as a 439 bp product in lanes from head, leg and thorax RNA, but is absent from flight muscle RNA. The full product at 1,081 bp is also …
A similar analysis to test for the alternative removal of Ig2+3+4 did not reveal the existence of this option in any of the species or muscle types tested (data not shown). This does not preclude, however, that this variant is used in other muscles, such as those of larval or embryonic tissues.
The highly conserved FRAM domain was also shown to be alternatively spliced in at least some of the insects included in this study. In the dragonfly P. longipennis, an alternative variant without Ig9 and FRAM (referred to as ΔIg9-FRAM) is found in leg and head muscles but not in flight muscles (Table 2). The ΔIg9-FRAM variant was also detected in all muscle types of D. melanogaster, but could not be amplified in any muscle preparation of M. sexta or A. domesticus. We could not detect it in A. mellifera, but this might be related to the fact that in A. mellifera flight muscles the PEVK domain is extremely short 13 (~100 bp), and the amplified product (typically from Ig8 to Ig10) would be rendered even shorter by removing the exon containing the end of PEVK and the beginning of Ig9. Data for the ΔIg9-FRAM variant are summarized in Table 2. It might be particularly relevant that the only flight muscles where the ΔIg9-FRAM variant has been detected are the asynchronous flight muscles of D. melanogaster.
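The product sizes reported above for M. sexta (1,081 bp for the full product and 439 bp for the Δ3-4 variant) allow a simple arithmetic check of reading-frame preservation: a deletion keeps the frame only if its length is a multiple of three. The short sketch below performs that check; the function name is hypothetical, and a size-based check is only a necessary condition, which is why the in-frame removal was confirmed by sequencing.

```python
def removed_segment(full_bp, variant_bp):
    """Return (removed length in bp, in-frame flag, whole codons removed)."""
    removed = full_bp - variant_bp
    in_frame = removed % 3 == 0
    return removed, in_frame, removed // 3

# RT-PCR product sizes reported for M. sexta in this section.
removed_bp, in_frame, residues = removed_segment(1081, 439)
print(removed_bp, in_frame, residues)
```

The size difference corresponds to 642 bp, or 214 codons, which is in the range expected for two Ig domains together with their linkers.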

Myofibrillar Localization of Projectin
Some of the insects included in this study were also analyzed for projectin sarcomeric localization in flight muscles because in Drosophila melanogaster, projectin localization is different in synchronous muscles compared to asynchronous flight muscles. In synchronous muscles such as leg and larval body wall muscles, projectin co-localizes with myosin over the A band, whereas it is associated with the I-Z-I region and the C-filaments in asynchronous flight muscles 16. The question is whether in synchronous flight muscles, such as those of the dragonfly, cricket and moths, projectin is still associated with the I-Z-I domain as a component of C-filaments, or follows the A band pattern seen in other synchronous muscles. Several projectin antibodies for which the epitope localization is known were used in order to ascertain the myofibrillar position of the protein (see Materials and Methods for list of antibodies). The localization was established in comparison with actin detected by phalloidin staining, and with several other proteins, such as α-actinin and kettin, using specific antibodies 7,8,26.
In the dragonfly P. longipennis, the projectin N-terminus detected with antibodies P5 (Figure 7A) and NT2 (not shown) co-localizes with kettin at the I-Z-I band, whereas a C-terminus epitope (3b11) does not (Figure 7B). This distribution indicates that in dragonfly flight muscles, projectin has a similar topography to the one described in more derived insects such as D. melanogaster and A. mellifera 6-8. Figures 7C-E present images for M. sexta flight muscles with the co-localization of projectin epitopes and α-actinin, as well as two projectin epitopes (7E). In this latter case the muscles might have been slightly stretched during the preparation and there is a splitting of the signal (white arrows in 7E), probably representing the two projectin filaments from adjacent sarcomeres protruding on both sides of the Z-band. In Figures 7F and 7G the visualization of cricket projectin shows that the projectin epitopes are again associated with the I-Z-I band. The images also reveal a splitting of the signal (white arrows in F and G), which is wider and more commonly observed than in the M. sexta images. This may indicate a stretch of the muscles during preparation or possibly a longer sarcomere allowing for better separation of epitopes.
The pattern in all three insects is consistent with the presence of connecting C-filaments in insect flight muscles with synchronous physiology, and even in species which are considered basal phylogenetically (like dragonfly).

Discussion
Connecting (C)-filaments are elastic protein structures that are proposed to be involved in "stretch activation" of asynchronous IFMs in derived insects 2,5,6,37-40. Because both projectin and kettin/Sls are components of the C-filaments, they contribute to the high resting tension of the IFM fibers and must contain elastic elements 4,41,42. To perform such a function, projectin would need to be anchored at the Z-band through its NH2-terminus and attached to the thick filaments, as illustrated in Figure 8A. The projectin protein would also most likely contain segments that can change length in response to physiological stretch. In the IFMs, projectin is found in the sarcomere I-Z-I region as part of the C-filaments, but is found by immunofluorescence microscopy over the A band in synchronous muscles 16. The question arises whether the A band localization is associated with synchronous physiology and whether in synchronous flight muscles, such as those of dragonfly and cricket, projectin would be localized over the A band. We evaluated the localization of projectin in the flight muscles of several insects with synchronous flight physiology and established that in these flight muscles, projectin is localized over the I-Z-I domain and co-localizes with kettin and α-actinin. Therefore the position of projectin over the I-Z-I region and its inclusion in the C-filaments are characteristics of flight muscles irrespective of their asynchronous or synchronous nature. The overall domain organization of projectin is conserved not just in insects, but in at least one other arthropod for which sequences are available, the crayfish 43. Crayfish projectin, however, contains only seven Ig domains in the first NH2-terminal Ig stretch of the protein, whereas all insect projectin proteins characterized so far contain eight NH2-terminal Ig domains.
The increase to eight Ig domains must have occurred very early in the insect lineage or before, as is now established by our studies in dragonfly (an order considered basal in the insect phylogenetic lineage; Figure 1) and in the silverfish (Apterygote; order Zygentoma/Thysanura; R. Southgate, unpublished observation). Of the first eight Ig domains of the projectin protein, the first seven are well conserved (Figures 2 and 3), indicating that these domains might need conserved surfaces to interact with other proteins and ensure the anchoring of projectin to Z-band proteins, as well as to kettin in the C-filaments and/or actin. The eighth domain is much less conserved, and even though we cannot exclude possible interactions with other proteins, its function might also be to provide additional length to the protein and hence the C-filaments, and/or to serve as a transition/damper domain between the N8Ig region and the highly disordered, unfolding-prone PEVK region, which immediately follows. Predictions of intrinsically disordered regions indicate that several segments could be, at least partially, unfolded, in particular the linker regions between the Ig domains. Recent studies of the titin Z1-Z2 Ig domains have indicated that the orientation of these two domains with respect to each other could change under stretch and that mechanical elasticity could result from bending and twisting of the linker region 34-36. In our analysis the six amino acid linker between Z1 and Z2 is predicted as disordered. It is tempting to speculate that the disordered linkers found between the different Ig domains of NH2-terminal projectin could play a similar role and provide the so-called "tertiary structure" elasticity 34,44. The relative orientation of adjacent domains would be controlled by the bending and twisting of the respective linker, as well as by possible Ig domain interactions, which is where Ig sequence conservation could also play a role.
This would provide the possibility of molecule extension under relatively low force, as illustrated in the model proposed in Figure 8B. The extent of this elongation could also be partially modulated by the existence of the Δ3-4 alternative splice variant, which is shorter by two Ig domains and two linkers (see Figure 8B and 8C). This idea of combining intrinsic disorder and alternative splicing to increase protein plasticity has already been tested for a large number of proteins from the SwissProt database, showing that protein segments undergoing alternative splicing are more often also intrinsically disordered, allowing a range of functional diversity without conformational constraints 45. The N8Ig region is followed by a PEVK segment, which has been postulated to have the ability to reversibly unfold under low forces and whose length is also controlled by extensive alternative splicing events 12-14 (Figure 8B and 8C). The PEVK is followed by a second stretch of six Ig domains (N6Ig), with a unique segment known as FRAM between Ig9 and Ig10. The FRAM sequence is highly conserved, but is also predicted to be highly disordered together with the beginning of the following Ig domain. This FRAM-Ig10 disordered region could provide an additional extensible segment with elasticity at low forces. The presence of an alternative splice variant removing the Ig9-FRAM region could again provide flexibility for the range of extension achieved by the molecule.

Conclusion:
A picture is starting to emerge of the possible organization of the projectin NH2-terminal region, with the length of the molecule and therefore its elastic contribution controlled both by alternative splicing events creating shorter variants (Figure 8C) and by unfolding of highly disordered sequences (PEVK) or reorientation of Ig domains through bending or twisting of disordered linkers in the N8Ig and N6Ig regions following stretch (Figure 8B).

Acknowledgement:
This research was supported by the College of Charleston faculty (to AAS) and undergraduate research (to MC) grants. We want to thank Dr. J. Marden for providing the L. pulchella samples.