Academic Editor: Leonid Tarassishin, Associate, Department of Pathology, Albert Einstein College of Medicine, NY
Checked for plagiarism: Yes
Review by: Single-blind
Discovery and Quantification in Mass Spectrometry-Based Proteomics
Mass spectrometry (MS) has been successfully used to analyze biological samples, and advances in MS-based approaches have turned MS data from largely qualitative to quantitative. These MS-based quantitative approaches, whether label-free or based on chemical tags or stable isotope labeling, have their own strengths and limitations. The variability introduced by different methods prior to quantitative mass spectrometry should be considered, and the accuracy and precision of MS measurements can also vary depending on the strategy used for MS quantification. Therefore, the development of methods for accurate protein quantitation is one of the most challenging areas of proteomics. Using these quantitative approaches, one can investigate the dynamics of the proteome through differential protein expression in normal biological processes and diseases.
Dr. Chen received her Ph.D. degree from the Department of Molecular Pathology at the University of California, San Diego in 2002. She then pursued postdoctoral training in mass spectrometry-based proteomics technology at the Scripps Research Institute in Dr. John Yates’s laboratory. In 2008, she joined SUNY Stony Brook University as the scientific director of the Proteomics Center at the School of Medicine. Currently, she is the director of the Proteomics Shared Resource at the Herbert Irving Comprehensive Cancer Center and a faculty member in the Department of Pharmacology at the Columbia University Medical Center. She has been developing and conducting multidisciplinary research projects to bridge the knowledge between biology and mass spectrometry. Using cutting-edge mass spectrometry-based proteomic techniques, she has focused on elucidating mechanisms of tissue-specific breast cancer metastasis and identifying protein targets to eradicate the growth of distal breast cancer metastasis. She has also implemented new proteomics techniques to perform quantitative analysis of complex biological samples, such as tissues from transgenic disease models and human tissues, for biomarker discovery. In collaboration with other investigators, Dr. Chen has applied discovery-based proteomics analysis to interrogate normal cellular processes, such as regulatory mechanisms of iPSCs and post-translational modifications in pluripotent stem cells, as well as disease processes such as heterogeneity of the tumor microenvironment and neurodegeneration.
Mass spectrometry (MS), with its qualitative and quantitative capabilities, has proven to be a powerful tool for biological and biomedical research. Quantitative measurements derived from MS can be translated into differential expression of proteins in biological samples, and collective molecular changes can then be used to describe complex phenotypes such as disease progression. In other scenarios, quantitative MS measurements can be used to associate cooperative protein expression changes with cellular responses such as cell signaling. Quantitative proteomic methods fall into two general categories: label-based and label-free quantification. Labels such as isobaric chemical tags, which have the same total mass but yield reporter ions of different masses, can be introduced through covalent modification of peptides. During MS/MS analysis, each isobaric tag is fragmented to produce a unique reporter ion mass that is used for quantitation1. Also, stable isotope labeled amino acids can be introduced through metabolic incorporation in cells or tissues, resulting in light and heavy peptides with identical electrospray behavior that provide a direct comparison of peptide intensities2. Alternatively, label-free MS quantification uses ion signal intensities acquired by the mass spectrometer, or the number of spectra matched to peptides from a protein, as a surrogate measure of the amount of protein within the sample3, 4. This review summarizes the pros and cons of conventional MS quantitative strategies based on parameters such as robustness, precision (the spread of measured values), and accuracy (measured versus expected values). An illustration that summarizes the different MS quantification strategies is included in Figure 1. We also cover recent advances in MS instrumentation and data acquisition strategies related to MS quantification.
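The distinction between precision and accuracy can be made concrete with a small calculation. The sketch below is illustrative only: it assumes precision is reported as a coefficient of variation across replicates and accuracy as the relative error of the mean against a known spiked-in ratio. The function names and the replicate values are hypothetical and do not come from any study cited in this review.

```python
import statistics


def precision_cv(measured):
    """Coefficient of variation (%) across replicate measurements.

    A smaller CV means the replicates are more tightly clustered
    (higher precision), regardless of whether they are correct.
    """
    mean = statistics.mean(measured)
    return 100.0 * statistics.stdev(measured) / mean


def accuracy_error(measured, expected):
    """Relative error (%) of the mean measured value versus the expected value.

    A value near zero means the measurements are centered on the truth
    (higher accuracy), regardless of their spread.
    """
    mean = statistics.mean(measured)
    return 100.0 * (mean - expected) / expected


# Hypothetical replicate ratios for a protein mixed at a known 2:1 ratio.
replicate_ratios = [1.8, 1.9, 1.7, 1.8]
cv = precision_cv(replicate_ratios)
err = accuracy_error(replicate_ratios, expected=2.0)
print(f"CV = {cv:.1f}%, relative error = {err:.1f}%")
```

In this made-up example the replicates agree closely with each other but are systematically lower than the expected ratio, which is the pattern one would describe as precise but biased.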
Peptides are isolated and fragmented by tandem mass spectrometry (MS/MS) on the fly. MS/MS spectra are used to identify the peptides, and relative quantification of peptides is based on peak area integration of extracted ion chromatograms (XIC).
Quantitative values obtained from labeled quantification are less affected by instrument variability because the inclusion of a labeled universal reference allows the MS operator to maintain platform reproducibility for large studies over a long period of time. The inclusion of stable isotope-labeled cell (SILAC) or tissue (SILAM) lysate as the universal reference for quantification offers a solution to this challenge. The SILAC-based strategy has emerged as a powerful and versatile approach for proteome-wide quantitation by mass spectrometry. Proteins are subject to constant turnover in replicating cells. SILAC labels the proteome by incorporating stable isotope labeled amino acids (e.g. containing 13C and 15N) into newly synthesized proteins to generate the “heavy” proteome 5, 6. When light (natural amino acids) and heavy cells or lysates are mixed, they are distinguishable by the mass spectrometer, and protein abundances are determined from the relative MS1 signal intensities of the assigned peptides. Alternatively, the same heavy lysate can be spiked into unlabeled lysates to serve as a reference standard for relative quantification of proteins in unlabeled biological samples 7, 8. Using the same principle, heavy amino acids can be incorporated in live animals to generate tissue-specific universal references for MS quantification 9, 10, 11, 12.
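The arithmetic behind light/heavy pairs and the spike-in reference can be sketched as follows. This is a minimal illustration under stated assumptions: the heavy labels are assumed to be 13C6 15N2 lysine and 13C6 15N4 arginine (a common choice, but not specified in the text above), and the function names and intensities are hypothetical.

```python
# Isotopic mass differences in Da (13C minus 12C, 15N minus 14N).
C13_SHIFT = 1.0033548
N15_SHIFT = 0.9970349

# Assumed labels: Lys with six 13C and two 15N, Arg with six 13C and four 15N.
LYS_SHIFT = 6 * C13_SHIFT + 2 * N15_SHIFT   # about 8.014 Da
ARG_SHIFT = 6 * C13_SHIFT + 4 * N15_SHIFT   # about 10.008 Da


def heavy_mz(light_mz, charge, n_lys, n_arg):
    """Expected m/z of the heavy partner of a light peptide ion.

    The mass shift is divided by the charge because the instrument
    measures mass-to-charge rather than mass.
    """
    shift = n_lys * LYS_SHIFT + n_arg * ARG_SHIFT
    return light_mz + shift / charge


def spike_in_ratio(light_a, heavy_a, light_b, heavy_b):
    """Relative abundance of sample A versus sample B via a shared heavy standard.

    Each unlabeled (light) sample is first expressed relative to the same
    spiked-in heavy reference; the ratio of those two ratios compares A to B
    even though the samples were measured in separate runs.
    """
    return (light_a / heavy_a) / (light_b / heavy_b)


# Hypothetical doubly charged tryptic peptide with one C-terminal lysine.
print(round(heavy_mz(500.0, charge=2, n_lys=1, n_arg=0), 4))

# Hypothetical MS1 intensities for the same peptide in two runs.
print(spike_in_ratio(light_a=2e6, heavy_a=1e6, light_b=5e5, heavy_b=1e6))
```

Because both samples are normalized to the same heavy reference, run-to-run differences in instrument response largely cancel in the ratio of ratios, which is the reason a universal reference helps platform reproducibility.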
SILAC quantification is relatively easy to implement in the proteomics workflow. It provides a high-precision readout of quantification values. It does not require a special data acquisition method and has no accuracy bias of the kind observed in isobaric labeling quantification (the known ratio compression effect). However, low abundance ions in SILAC quantification suffer from poor ion statistics. Also, it is not easily multiplexed for MS quantification. Nevertheless, different SILAC strategies have been developed recently to allow multiplexed quantification. A method of higher-order multiplexing was developed by combining MS1-based SILAC metabolic labels with six MS2-based TMT isobaric labels to analyze biological samples in a hyperplexing manner (18 samples in a single run)13. This method is described further in the section on MS2-based quantification. Another method that expands the multiplexing capability of MS1-based quantification is SILAC with embedded neutron signatures (NeuCode SILAC). Using mass spectrometers with high-resolution capability, different isotope combinations (isotopologues) of lysine can be incorporated into cells metabolically to produce from 2-plex (36 mDa, 240K resolution)14 to 4-plex (12 mDa, 960K resolution)15 quantification. Current Fourier transform MS systems capable of ultra-high resolution (>1,000,000) will allow the use of NeuCode-labeled peptides separated by as little as ~6 mDa and offer 4-plex quantification. To permit high levels of MS1-based multiplexing without relying on metabolic labeling, Hebert A.S. et al. synthesized 12-plex NeuCode amine reactive tags using differential incorporation of C and N isotopes to generate isotopologues with a 6.3 mDa mass defect for global proteome quantification. Results from the proof-of-concept experiment demonstrated performance comparable to metabolic labeling and isobaric tagging while combining the benefits of these two quantitative methods16.
Chromatograms from LC-MS/MS analyses have been used for MS quantification without protein or peptide labeling since the inception of the field. The peak areas (area under the curve) and intensities of precursor ions have been used for MS quantification17. Software that automates label-free MS1 quantification, such as Skyline18, uses the area under the curve (AUC) for relative quantification of the same peptide across different samples and allows platform-independent quantification. Label-free MS1 quantitation is fast to implement and has a lower starting cost. However, the accuracy of peak-based label-free quantification relies on good chromatographic alignment (matched peptide retention times). Also, label-free MS quantification often suffers from poor technical reproducibility and is not amenable to multiplexing.
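As an illustration of AUC-based relative quantification, the sketch below integrates an extracted ion chromatogram with the trapezoidal rule and compares the same peptide in two runs. It is a simplified example and not how Skyline or any other tool is implemented: peak boundaries are assumed to be already chosen and the two runs already aligned in retention time. The function name and data points are hypothetical.

```python
def xic_peak_area(retention_times, intensities):
    """Area under an extracted ion chromatogram peak (trapezoidal rule).

    retention_times: scan times within the peak boundaries, in increasing order.
    intensities: precursor ion intensity at each scan time.
    """
    if len(retention_times) != len(intensities):
        raise ValueError("retention_times and intensities must have equal length")
    area = 0.0
    for i in range(1, len(retention_times)):
        dt = retention_times[i] - retention_times[i - 1]
        # Each pair of adjacent scans contributes one trapezoid.
        area += 0.5 * (intensities[i] + intensities[i - 1]) * dt
    return area


# Hypothetical XIC of the same peptide in two aligned runs (time in minutes).
rt = [10.0, 10.1, 10.2, 10.3, 10.4]
run_a = [0.0, 1e5, 4e5, 1e5, 0.0]
run_b = [0.0, 5e4, 2e5, 5e4, 0.0]

ratio = xic_peak_area(rt, run_a) / xic_peak_area(rt, run_b)
print(f"A/B peak area ratio = {ratio:.2f}")
```

If the retention time windows of the two runs were misaligned, the integration limits would cover different parts of the peak, which is why chromatographic alignment matters for the accuracy of this approach.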
Instead of MS1 based quantification, one can also take advantage of the information computed from reporter ions derived from chemical tags attached to peptides or MS2 spectrum observations to estimate the abundance of a protein in different samples.
As an alternative to metabolic labeling, peptides can be labeled with chemical tags that are isobaric (the same in mass) but yield reporter ions of different masses upon fragmentation for global proteome quantification. Two types of isobaric tags are commercially available: isobaric tags for relative and absolute quantification (iTRAQ)19 and tandem mass tags (TMT)20. Isobaric labeling is a valuable alternative for global proteome quantification of primary cells or human tissues, which cannot be labeled metabolically. It enables simultaneous protein identification and quantification by MS. Most importantly, it allows high levels of multiplexing (2-plex, 4-plex, 6-plex, and 10-plex).
Amine reactive tags have been developed to exploit the sparse low mass region of tandem MS2 spectra and produce reporter ions for MS quantification. Consequently, the quadrupole became the collision cell of choice because of its ability to measure m/z in the low mass region, as compared to ion traps21. A CID-HCD dual scan configuration is often implemented for isobaric MS quantification22. Spectra acquired in the ion trap by collision-induced dissociation (CID) are used for peptide identification, and spectra generated by higher-energy collision dissociation (HCD) in a quadrupole-like collision cell are used for quantification. Low-resolution ion traps are sufficient for collecting CID spectra, while quantification of reporter ions benefits from high mass accuracy and resolution23.
In spite of its multiplexing advantage, isobaric labeling MS quantification suffers from an accuracy bias known as the ratio compression effect. It has been demonstrated that the accuracy and precision of isobaric MS quantification are often distorted because contaminating neighboring isobaric ions are co-isolated and co-fragmented with the target ions24. Instrument modifications such as the dual ion trap-orbitrap25 in the LTQ-Velos Orbitrap were used to minimize the ratio compression effect. The dual ion trap-orbitrap includes two chambers26 and an HCD (quadrupole-like) cell27. This hybrid analyzer provides faster sequencing speed, which improves the third fragmentation step and generates highly specific MS3 spectra in the quadrupole-like HCD cell. The whole process of shuttling ions between several collision cells (dual ion trap and higher-energy collision dissociation) is known as multi-notch MS328. Multi-notch MS3 improves the accuracy and precision of isobaric labeling MS quantification by minimizing the low mass interference. However, MS3 acquisition suffers from reduced sensitivity due to slower acquisition speed. The newest addition to the instrumentation, the Orbitrap Tribrid Mass Spectrometer, allows parallelized ion fragmentation routines that will increase the speed as well as the sensitivity of MS3 spectra acquisition for isobaric labeling MS quantification.
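Why co-isolation pulls measured ratios toward 1:1 can be seen in a toy model. The sketch below is a deliberate simplification rather than a published correction method: it assumes the co-isolated background adds the same reporter ion signal to both channels, with that signal expressed relative to the target signal in the reference channel. The function name and numbers are hypothetical.

```python
def observed_ratio(true_ratio, interference):
    """Reporter ion ratio measured when background ions are co-isolated.

    The target peptide contributes `true_ratio` units of signal to channel A
    and 1 unit to channel B. Co-isolated peptides, assumed to be unchanged
    between samples, add `interference` units to BOTH channels.
    """
    if interference < 0:
        raise ValueError("interference must be non-negative")
    return (true_ratio + interference) / (1.0 + interference)


# A true 10-fold change measured with increasing amounts of background signal.
for background in (0.0, 0.5, 1.0, 2.0):
    print(background, round(observed_ratio(10.0, background), 2))
```

In this model a true 10:1 ratio is reported as 5.5:1 when the background signal equals the target signal in the reference channel, and the reported ratio approaches 1:1 as the background grows. Removing the interfering ions before reporter ion generation, as MS3-based acquisition aims to do, corresponds to driving `interference` toward zero.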
To expand the multiplexing capability of isobaric tagging, Dephoure N. et al. combined metabolic and isobaric labeling approaches to create a hyperplexing method13. The hyperplexing method used 3-plex MS1 SILAC quantification and 6-plex MS/MS-distinguishable TMT isobaric labels to quantify 18 samples simultaneously in a single run. The increased multiplexing capacity of MS quantification will facilitate the inclusion of biological replicates and provide the statistical power to study large networks of proteins in complex systems. However, it was also noted that the new generation of multiplexing tags (8-plex and up) resulted in a lower number of peptide identifications due to the generation of alternative label-associated ions in CID29. Therefore, caution should be taken when designing or analyzing multiplexed experiments in large-scale shotgun proteomics.
Comparative protein quantification by mass spectrometry can be performed using tandem mass spectra (spectral counts) without labeling. There is growing interest in the label-free LC-MS approach because it does not require separate sample preparation procedures or a special data acquisition setup to obtain quantitative information. Label-free MS2-based quantification takes advantage of the MS2 spectra collected for each protein to estimate the abundance of a protein in different samples3, 30. Robust analysis of label-free MS data enables more consistent comparisons of quantitative data within and across laboratories. Since the approach is not amenable to multiplexing, normalization of LC-MS/MS data is necessary to perform quantitative analysis using spectral counting. Tools for label-free MS quantification have been recently reviewed31. In this review, we briefly describe some popular approaches for MS2-based label-free MS quantification.
The normalized spectral abundance factor (NSAF) takes into account the length of the identified proteins32. Briefly, the spectral counts are divided by the protein length and then normalized to the sum of all such ratios per experiment. A refinement of NSAF, dNSAF, corrects the assignment of spectral counts among protein isoforms and improves the accuracy of spectral count quantification over three orders of magnitude33. Another method, the normalized spectral index (SIN), integrates spectral and peptide count values with the intensity of fragment ions in each spectrum34. The spectral index (SI) is the sum of fragment ion intensities for each confidently identified peptide (including all its MS/MS spectra) assigned to a protein. SI is then normalized by the sum of ion intensities of all proteins and by the length of the protein to generate SIN. Although the method was originally developed using data acquired from a low-resolution linear ion trap, SIN showed a linear response over a dynamic range of three orders of magnitude.
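The two normalizations described above can be written out in a few lines. The following is a minimal sketch based only on the verbal definitions in this section; it omits details of the published methods (for example, the handling of shared peptides in dNSAF), and the function names, protein identifiers, and values are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein.

    Spectral counts are divided by protein length, then each value is
    divided by the sum of those length-corrected values in the experiment.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def normalized_spectral_index(spectral_index, lengths):
    """SIN for each protein.

    spectral_index holds SI, the summed fragment ion intensity of all
    confidently identified peptides of a protein. SI is divided by the
    total SI over all proteins and by the protein length.
    """
    total = sum(spectral_index.values())
    return {p: spectral_index[p] / total / lengths[p] for p in spectral_index}


# Hypothetical proteins: lengths in residues, spectral counts, summed intensities.
lengths = {"P1": 500, "P2": 250, "P3": 100}
counts = {"P1": 50, "P2": 50, "P3": 10}
si = {"P1": 5e6, "P2": 2.5e6, "P3": 2.5e6}

nsaf_values = nsaf(counts, lengths)
sin_values = normalized_spectral_index(si, lengths)
print(nsaf_values)
print(sin_values)
```

In the made-up data, P1 and P2 have the same spectral count, but P2 receives twice the NSAF because it is half as long. Length correction of this kind is meant to prevent long proteins, which yield more peptides, from appearing more abundant than they are.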
In addition to utilizing the physical parameters acquired from label-free MS analysis, absolute protein expression measurement (APEX) correlates spectral counts with predictions of the peptides detectable for each protein to estimate protein abundance from the fraction of observed peptide mass spectra. Peptide prediction for APEX is derived from a machine learning classification algorithm trained on a protein training set35, 36. The APEX index performs two types of normalization: i) by the number of unique tryptic peptides expected per protein, or MS detectability, and ii) by the total number of spectra. Similarly, the protein abundance index (PAI) takes the ratio of observed to observable peptides of a protein to achieve relative quantification37. PAI can also be exponentially modified (emPAI) to add quantitative information to large-scale label-free MS analysis38. Lastly, intensity-based approaches can also be used to derive quantitative comparisons from label-free MS data. For example, intensity-based absolute quantification (iBAQ), computed by MaxQuant, calculates protein intensities as the sum of all peptide intensities of a protein divided by the number of theoretically observable peptides39. The T3PQ method assumes that, for each protein identified by a set of peptides, the average of the three most efficiently ionized peptides, and therefore the three highest MS signals, correlates directly with the input amount of the corresponding protein41. It has been observed, however, that spectral count-based quantification methods are associated with higher errors than MS1 peak intensity-based methods40.
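Three of the indices above reduce to short formulas, sketched here for illustration. The code assumes the commonly stated definitions: emPAI as 10 raised to the power of PAI, minus 1; iBAQ as summed peptide intensity divided by the number of theoretically observable peptides; and T3PQ as the mean of the three most intense peptides. It is not the implementation used by MaxQuant or any other tool, and all function names and input values are hypothetical. APEX is omitted because it depends on a trained peptide detectability classifier.

```python
def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index: 10**PAI - 1."""
    pai = n_observed / n_observable
    return 10 ** pai - 1


def ibaq(peptide_intensities, n_theoretical_peptides):
    """Summed peptide intensity divided by the number of theoretically observable peptides."""
    return sum(peptide_intensities) / n_theoretical_peptides


def top3(peptide_intensities):
    """Mean intensity of the three most intense peptides of a protein.

    This sketch chooses to reject proteins with fewer than three
    quantified peptides instead of averaging whatever is available.
    """
    if len(peptide_intensities) < 3:
        raise ValueError("need at least three peptide intensities")
    best = sorted(peptide_intensities, reverse=True)[:3]
    return sum(best) / 3


# Hypothetical protein: 5 of 10 observable peptides seen, four quantified.
intensities = [4e6, 2e6, 1e6, 5e5]
print(round(empai(5, 10), 4))
print(ibaq(intensities, n_theoretical_peptides=10))
print(round(top3(intensities), 1))
```

The count-based index (emPAI) needs only identifications, whereas iBAQ and the top-three average need MS1 intensities, so the intensity-based indices can only be applied when precursor signals are extracted.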
Overall, label-free LC-MS/MS quantification methods have gained interest and popularity because of their ease of implementation and low cost. With the development and improvement of label-free MS quantification tools, label-free MS quantification provides sufficient accuracy if experimental variables are carefully controlled. However, the poor reproducibility of label-free MS quantification makes it challenging for large-scale quantitative applications, especially large studies that require data acquisition over a long period of time. Furthermore, it is clear that search engine-induced bias exists in MS2-based label-free protein quantification strategies42. Lastly, instrument-dependent variations may not be normalized properly by simply adjusting the total number of acquired spectra among the MS runs. Alternatively, quantitative values obtained from labeled quantification are less affected by instrument variability because the inclusion of a labeled universal reference allows the MS operator to maintain platform reproducibility for large studies over a long period of time. The inclusion of stable isotope-labeled cell (SILAC) or tissue (SILAM) lysate as the universal reference for quantification offers a solution to this challenge. SILAC quantification provides a high-precision readout of quantification values. It does not require a special data acquisition method and has no accuracy bias of the kind observed in isobaric labeling quantification (the known ratio compression effect). However, low abundance ions in SILAC quantification suffer from poor ion statistics, and sample multiplexing is not as easily implemented as with chemical labeling approaches (e.g. isobaric mass tags). Table 1 summarizes the MS quantification methods discussed in this review to help readers make an informed choice for their own research.
Table 1. Comparison of MS quantification methods.
|Method of Quantification||Description||Multiplexing capability||Accuracy||Precision||Linear dynamic range||Cost of implementation||Ease of implementation|
|Metabolic labeling||13C, 15N labeled amino acids; in vitro and in vivo labeling; MS1-based quantification||Medium||+++||++||2 logs||Medium||++|
|Chemical labeling||Isobaric mass tags; post-digestion labeling; MS2-based quantification||High||++||+++||2 logs||High||+**|
|Label-free||MS1-based quantification (AUC)||No||++*||++||2-3 logs||Low||+++|
|Label-free||MS2-based quantification (Spectra counts, Ion Intensity)||No||++*||++||2-3 logs|