The authors have declared that no competing interests exist.
Defining protein-protein interactions is essential for understanding the mechanisms by which cells regulate basic functions such as metabolism, transcription, and signal transduction. Affinity purification followed by tandem mass spectrometry (AP-MS) is widely applied to discover new interactors that regulate various cellular processes. Here we optimize the purification method for AP-MS and develop a simplified, unbiased analytical tool for data analysis, Z-score plus prey occurrence and reproducibility (ZSPORE). Using this pipeline we achieve a higher efficiency of AP-MS and enhanced identification of high confidence interacting proteins (HCIP) in mammalian cells. When applied to analysis of the innate immune interactome, these methods enhanced HCIP identification. In addition, we investigated the GRB2 complex, which is associated with signal transduction and cell growth. Twenty-four known GRB2 interacting proteins were identified, plus 26 new GRB2 binding partners. Thus, these straightforward methods recapitulate known protein interactions, discover novel complexes, and allow mapping of protein interaction networks.
Analysis of protein-protein interactions has contributed numerous insights into the regulation of antiviral defense, DNA repair, autophagy, and immune signaling pathways. Discerning how proteins interact in complex and dynamic networks is key to dissecting the complexity of many genotype-to-phenotype relationships. Proteomics has emerged as a powerful tool to analyze multicomponent complexes formed under close to physiological conditions. Among the various proteomics-based methods, affinity purification followed by tandem mass spectrometry (AP-MS) has proven to be highly successful for identification of interacting proteins. Using this approach, global interactomes have been established in Escherichia coli
Various affinity tags have been employed for protein purification, but the FLAG and HA epitopes remain the most popular tags for AP-MS in mammalian cells. To optimize the AP-MS method, we compare purification strategies using the FLAG and HA tags. We also compare the efficacy of single versus tandem FLAG-HA purification on identification of high confidence interacting proteins (HCIP).
Unfiltered AP-MS data include many contaminating or non-specific binding proteins (NSBP). Computational tools are required for the processing of AP-MS data and elimination of NSBP. Programs such as CompPASS
The availability of high affinity monoclonal antibodies against the HA and FLAG epitope tags has led to their frequent utilization for affinity purification of protein complexes. These are the most commonly used epitopes for mapping the proteome. To compare the efficacy of these epitopes for affinity purification, FLAG and HA fusion proteins were purified in parallel. The bait proteins used for optimizing affinity purification included MDA5 and other well-known components of the innate immunity network
Single tag AP-MS and tandem affinity purification (TAP) are both broadly applied methods for protein purification. TAP is a two-step procedure requiring sequential purification using two different affinity tags. FLAG and HA double tags are most commonly applied for tandem purification of protein complexes. To compare the effect of tandem tag vs. single tag purification on the yield of total prey and HCIP, we compared protein complexes purified by a single purification with FLAG vs. a two-step purification with FLAG followed by HA. To compensate for the lower binding capacity of anti-HA beads, we used four times more anti-HA beads than anti-FLAG beads for immunoprecipitation. The number of HCIP associated with the kinase TBK1 was determined by the ZSPORE algorithm detailed below. MS analysis revealed that TAP-purified TBK1 complexes lacked several known interactors, including optineurin (data not shown). As with other screening methods, AP-MS is unable to detect all interactors. For example, neither single-step nor tandem purification of TBK1 pulled down A20, a known TBK1 interactor
As with many screening methods, unfiltered AP-MS data contain many non-specific binding proteins caused by binding to the antibody coated beads, epitope tag, aggregation, or carryover from prior MS runs. Several computational tools have been developed for processing AP-MS data to eliminate NSBP and identify HCIP
We aimed to create a simplified method for analysis of AP-MS data. Three main parameters (protein abundance, the frequency of observed protein in the database, and reproducibility) were combined to generate an algorithm. Total spectral counts (TSC) have gained acceptance as a practical, label-free, semi-quantitative measure of protein abundance for proteomic studies
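The Z-score component can be illustrated with a minimal sketch. It assumes the conventional definition, (observed TSC minus the mean TSC of that prey across the database) divided by the standard deviation, with a population standard deviation and with the bait's own purification counted in the database; the exact formula used for ZSPORE is not reproduced in this text, so these choices, the function name and the spectral counts are all hypothetical.

```python
from statistics import mean, pstdev

def z_score(observed_tsc, database_tsc):
    """Z-score of one prey's TSC relative to its TSCs across all purifications.

    Assumption: conventional (x - mean) / population SD; not confirmed as the
    authors' exact formula.
    """
    mu = mean(database_tsc)
    sigma = pstdev(database_tsc)
    if sigma == 0:
        return 0.0  # prey has identical TSC everywhere: no enrichment signal
    return (observed_tsc - mu) / sigma

# Hypothetical prey: TSC in four MS runs of the bait of interest.
bait_runs = [12, 20, 0, 15]
observed = max(bait_runs)  # the maximum TSC across runs is used

# Hypothetical database: the prey is absent from nine other purifications.
database_tsc = [0] * 9 + [observed]

z = z_score(observed, database_tsc)
print(f"Z-score = {z:.1f}")
```

A prey that is abundant with one bait and absent elsewhere scores high, whereas a prey seen at similar levels with most baits scores near zero.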
We first evaluated the performance of ZSPORE using our published database of Human Innate Immunity Interactome for type I Interferon (HI5)
Our AP-MS database and the simplified computational strategy represent a valuable resource for investigators using the outlined purification procedures, especially those analyzing a small number of baits. Combining datasets will enhance the resolution of HCIP. Dissemination of the database to interested members of the research community can be arranged by contacting the authors. The ZSPORE algorithm may also be applied to large datasets involving other species and cell types.
To evaluate the ability of the ZSPORE strategy to identify novel HCIP, we applied these tools to a well-characterized adaptor protein. Growth factor receptor-bound protein 2 (GRB2) is ubiquitously expressed and plays a critical role in receptor tyrosine kinase signaling pathways
Data from 4 MS runs (replicates from 2 independent purifications) were collected, and in total 166 interactors were detected (Table S3). Z-scores were calculated based on the maximum TSC of each prey among 4 independent MS runs and analyzed against our current database of 211 protein complexes (Table S2 and unpublished data). Increasing the size of the core database allows one to apply greater flexibility or stringency in selecting HCIP. Seventy-three preys showed Z-scores higher than 2 (p < 0.05). Next we investigated prey occurrence; one prey was filtered out because it occurred with >7% of the baits in our database. Analysis of reproducibility revealed that 54 preys (79%) appeared in at least 2 of the 4 MS runs. Finally, 4 proteins with only one peptide hit were removed. Taken together, we identified 50 HCIPs associated with GRB2 (
Gene Name | Z-score | Prey Occurrence | Reproducibility |
GRB2 growth factor receptor-bound protein 2 | 14 | 0.60% | 100% |
ASAP1 ArfGAP with SH3 domain, ankyrin repeat and PH domain 1 | 14 | 3.70% | 100% |
BAT2 HLA-B associated transcript 2 | 10 | 1.20% | 100% |
C2orf44 chromosome 2 open reading frame 44 | 14 | 0.60% | 100% |
CBL Cas-Br-M (murine) ecotropic retroviral transforming seq | 14 | 0.60% | 100% |
CBLB Cas-Br-M (murine) ecotropic retroviral transforming seq | 6 | 1.20% | 100% |
CPSF7 cleavage and polyadenylation specific factor 7, 59kDa | 14 | 0.60% | 100% |
DIAPH1 diaphanous homolog 1 (Drosophila) | 14 | 0.60% | 100% |
DNM1 dynamin 1 | 14 | 1.00% | 100% |
DNM2 dynamin 2 | 14 | 0.60% | 100% |
DNM3 dynamin 3 | 14 | 0.40% | 75% |
DOCK1 dedicator of cytokinesis 1 | 14 | 0.30% | 50% |
ELMO2 engulfment and cell motility 2 | 14 | 0.60% | 100% |
FAM59A family with sequence similarity 59, member A | 14 | 0.60% | 100% |
GAB1 GRB2-associated binding protein 1 | 14 | 0.60% | 100% |
GAB2 GRB2-associated binding protein 2 | 11 | 3.40% | 100% |
KHDRBS1 KH domain containing, RNA binding, signal transduction associated 1 | 14 | 0.40% | 75% |
KHDRBS3 KH domain containing, RNA binding, signal transduction associated 3 | 14 | 0.30% | 50% |
KIF4A kinesin family member 4A | 14 | 0.60% | 100% |
KIF4B kinesin family member 4B | 14 | 0.70% | 100% |
KIFC1 kinesin family member C1 | 14 | 0.60% | 100% |
MAP4K5 mitogen-activated protein kinase kinase kinase kinase 5 | 7 | 0.60% | 50% |
NISCH nischarin | 14 | 0.70% | 100% |
NKX2-5 NK2 transcription factor related, locus 5 (Drosophila) | 14 | 0.30% | 50% |
OCRL oculocerebrorenal syndrome of Lowe | 14 | 0.60% | 100% |
PIK3AP1 phosphoinositide-3-kinase adaptor protein 1 | 14 | 0.40% | 75% |
PIK3C2B phosphoinositide-3-kinase, class 2, beta polypeptide | 5 | 0.40% | 75% |
PIK3CA phosphoinositide-3-kinase, catalytic, alpha polypeptide | 13 | 0.70% | 100% |
PIK3CB phosphoinositide-3-kinase, catalytic, beta polypeptide | 14 | 1.20% | 100% |
PIK3R1 phosphoinositide-3-kinase, regulatory subunit 1 (alpha) | 8 | 4.50% | 100% |
PIK3R2 phosphoinositide-3-kinase, regulatory subunit 2 (beta) | 8 | 1.30% | 75% |
POLD1 polymerase (DNA directed), delta 1, catalytic subunit 1 | 14 | 0.60% | 100% |
PTPN11 protein tyrosine phosphatase, non-receptor type 11 | 14 | 0.70% | 100% |
PTPN23 protein tyrosine phosphatase, non-receptor type 23 | 14 | 0.60% | 100% |
PTPRA protein tyrosine phosphatase, receptor type, A | 14 | 0.70% | 100% |
RBM15 RNA binding motif protein 15 | 14 | 0.60% | 100% |
SATB1 SATB homeobox 1 | 14 | 0.60% | 100% |
SGK269 NKF3 kinase family member | 14 | 0.60% | 100% |
SH3PXD2B SH3 and PX domains 2B | 14 | 0.30% | 50% |
SHC1 SHC (Src homology 2 domain containing) transforming protein 1 | 14 | 0.40% | 75% |
SNX18 sorting nexin 18 | 14 | 0.60% | 100% |
SOS1 son of sevenless homolog 1 (Drosophila) | 14 | 0.60% | 100% |
SOS2 son of sevenless homolog 2 (Drosophila) | 14 | 0.40% | 75% |
STAMBP STAM binding protein | 14 | 0.60% | 100% |
WASL Wiskott-Aldrich syndrome-like | 10 | 4.50% | 100% |
WDR6 WD repeat domain 6 | 14 | 0.60% | 100% |
WIPF1 WAS/WASL interacting protein family, member 1 | 14 | 0.60% | 100% |
WIPF2 WAS/WASL interacting protein family, member 2 | 14 | 0.60% | 100% |
WIPF3 WAS/WASL interacting protein family, member 3 | 14 | 0.60% | 100% |
ZMYM2 zinc finger, MYM-type 2 |
Cytoscape software was used to visualize the interconnectivity of the GRB2 complex and combine the interactions into one map (
Successful identification of known interactions established the efficiency and robustness of our AP-MS pipeline. We also identified 26 new interactors associated with GRB2. For example, protein diaphanous homolog 1 (DIAPH1) is a new GRB2 partner with 11 peptide hits. DIAPH1 is involved in the MEMO1-RHOA-DIAPH1 signaling pathway, which plays an important role in ERBB2-dependent stabilization of microtubules at the cell cortex
In summary, the methods described here represent a broadly applicable pipeline from affinity purification of protein complexes to statistical analysis of AP-MS data. We validated the utility of this strategy by defining an enlarged innate immune network. To further demonstrate the advantage of these tools on a well-studied protein we analyzed the GRB2 complex. In addition to 24 known interacting proteins, we also identified 26 new binding partners. The robustness of our AP-MS pipeline supports its widespread application in the characterization of protein interaction networks for various signaling pathways in different cell types.
Many processes in a cell depend on protein-protein interactions and perturbations of these interactions can lead to pathophysiology. Comprehensive knowledge of protein interaction networks will identify novel components and yield new insights on how cells respond to different environments. Ultimately such knowledge may provide new targets for therapeutic application. In conclusion, this approach to AP-MS is an invaluable tool for identification of new protein-protein interactions and mapping protein interaction networks.
HEK293 cells were purchased from ATCC. Human GRB2 cDNA was ordered from PlasmID (Dana-Farber, Boston). Antibodies specific for FLAG and HA were purchased from Sigma Chemical Co (St. Louis, MO).
MDA5-N (1-294), TBK1, NAP1, IRF3 and SINTBAD were tagged with HA-FLAG double tag as detailed elsewhere
Stable Cell Line Selection. The constructs were transfected into HEK293 cells. Transfection of plasmids was performed using Lipofectamine 2000 (Invitrogen)
Cells from four 15 cm² plates (~4 × 10⁷ cells) were collected in 10 ml TAP buffer (50 mM Tris-HCl (pH 7.5), 10 mM MgCl₂, 100 mM NaCl, 0.5% Nonidet P40, 10% glycerol, phosphatase inhibitors and protease inhibitors). After shaking on ice for 30 min, cell lysates were centrifuged for 30 min at 15,000 rpm. Supernatants were collected and precleared with 50 µl of protein A/G resin. After shaking for 1 hr at 4°C, the resin was removed by centrifugation. Cell lysates were added to 40 µl anti-FLAG M2 resin (Sigma) and incubated on a shaker. After 12 hr, the anti-FLAG M2 resin was washed three times (15 min each) with 10 ml TAP buffer. After removing the wash buffer, the resin was transferred to a spin column (Sigma) and incubated with 80 µl of 3 mg/ml 3X FLAG peptide (Sigma) for 1 hr at 4°C on a shaker. Eluates were collected by centrifugation and stored at -80°C. Methods of tandem affinity purification are detailed elsewhere
Purified complexes were loaded onto a 4-15% NuPAGE gel (Invitrogen) and run for about 1 cm (8 min at 200 volts). Gels were stained using the SilverQuest staining kit (Invitrogen). The entire stained area was excised as one sample and rinsed twice with 50% acetonitrile. As an alternative to in-gel digestion, protein mixtures can be digested in solution without prior separation. Because buffer components, such as detergents, interfere with the mass spectrometry ionization process, protein samples need to be precipitated with trichloroacetic acid (TCA), washed and re-dissolved in a digestion buffer. The main advantages of in-solution digestion are higher recovery of peptides compared to in-gel digestion and time savings. However, some proteins, especially membrane proteins, are difficult to re-dissolve. Therefore, we prefer in-gel digestion.
Mass spectrometry. The Taplin Biological Mass Spectrometry Facility (Harvard Medical School) was used for MS analysis. As described previously
On the day of analysis the samples were reconstituted in 5 - 10 µl of HPLC solvent A (2.5% acetonitrile, 0.1% formic acid). A nano-scale reverse-phase HPLC capillary column was created by packing 5 µm C18 spherical silica beads into a fused silica capillary (100 µm inner diameter x ~12 cm length) with a flame-drawn tip. After equilibrating the column, each sample was loaded via a Famos auto sampler (LC Packings, San Francisco CA) onto the column. A gradient was formed and peptides were eluted with increasing concentrations of solvent B (97.5% acetonitrile, 0.1% formic acid).
As peptides eluted they were subjected to electrospray ionization and then entered an LTQ Velos ion-trap mass spectrometer (ThermoFisher, San Jose, CA). Peptides were detected, isolated, and fragmented to produce a tandem mass spectrum of specific fragment ions for each peptide. Dynamic exclusion was enabled such that ions were excluded from reanalysis for 30 s. Peptide sequences (and hence protein identity) were determined by matching the acquired fragmentation patterns against protein databases using the software program Sequest (ThermoFisher, San Jose, CA). The human IPI database (Ver. 3.6) was used for searching. Precursor mass tolerance was set to +/- 2.0 Da and MS/MS tolerance was set to 1.0 Da. A reversed-sequence database was used to set the false discovery rate at 1%. Filtering was performed using the Sequest primary scores, Xcorr and delta-Corr. Spectral matches were further examined manually, and multiple identified peptides (≥ 2) per protein were required for consideration as HCIP.
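How a reversed-sequence (decoy) database can be used to hold the false discovery rate at 1% may be sketched as follows. This is a generic target-decoy estimate (decoy matches divided by target matches above a score cutoff), not the facility's actual filtering procedure, which also used delta-Corr; the function name and all scores are hypothetical.

```python
def xcorr_cutoff_for_fdr(psms, max_fdr=0.01):
    """Return the lowest Xcorr cutoff whose estimated FDR stays within max_fdr.

    psms: list of (xcorr, is_decoy) peptide-spectrum matches.
    FDR is estimated as (#decoy matches) / (#target matches) at or above the cutoff.
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for xcorr, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = xcorr  # still acceptable at this depth of the ranked list
    return cutoff

# Hypothetical matches: 150 target hits and 3 decoy (reversed-sequence) hits.
psms = (
    [(4.0 - 0.01 * i, False) for i in range(100)]
    + [(2.9, True)]
    + [(2.8 - 0.01 * i, False) for i in range(50)]
    + [(2.0, True), (1.9, True)]
)

cutoff = xcorr_cutoff_for_fdr(psms)
accepted = [x for x, is_decoy in psms if not is_decoy and x >= cutoff]
print(f"Xcorr cutoff: {cutoff:.2f}, accepted target matches: {len(accepted)}")
```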
(3) Prey Occurrence. We considered a prey associated with a single bait to be an HCIP and a prey associated with all baits to be an NSBP. Generally, we set the prey occurrence cutoff at <7%, meaning that a given prey interacts with fewer than 7% of the baits in the entire database.
(4) Reproducibility. We consider at least 50% reproducibility necessary for classification as HCIP. Thus, each prey must appear in at least 2 out of 4 MS runs. To minimize the list of background contaminants observed in our dataset that were not identified by other statistical approaches, we intentionally analyzed two biological replicates. Each duplicate purified complex was analyzed twice in independent experiments.
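As an illustration of the prey occurrence criterion, the sketch below computes the fraction of baits with which a prey was detected and compares it to the 7% cutoff. The database layout (a mapping from bait to the set of preys detected with it), the function name and all bait and prey names are hypothetical.

```python
def prey_occurrence(prey, database):
    """Fraction of baits in the database with which the prey was detected."""
    baits_with_prey = sum(1 for preys in database.values() if prey in preys)
    return baits_with_prey / len(database)

# Hypothetical database of 100 baits.
database = {f"BAIT_{i}": set() for i in range(100)}
database["BAIT_0"].add("SPECIFIC_PREY")      # detected with a single bait
for i in range(30):
    database[f"BAIT_{i}"].add("STICKY_PREY")  # detected with 30 baits

MAX_OCCURRENCE = 0.07  # cutoff stated in the text: fewer than 7% of baits

specific = prey_occurrence("SPECIFIC_PREY", database)
sticky = prey_occurrence("STICKY_PREY", database)
print(f"SPECIFIC_PREY: {specific:.0%}, passes: {specific < MAX_OCCURRENCE}")
print(f"STICKY_PREY: {sticky:.0%}, passes: {sticky < MAX_OCCURRENCE}")
```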
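The reproducibility criterion can be sketched in the same way: count the MS runs in which a prey has at least one spectrum and divide by the number of runs. The per-run data layout (one mapping of prey to TSC per run), the function name and the prey names are hypothetical.

```python
def reproducibility(prey, runs):
    """Fraction of MS runs in which the prey was detected (TSC > 0)."""
    detected = sum(1 for run in runs if run.get(prey, 0) > 0)
    return detected / len(runs)

# Hypothetical TSCs from 4 MS runs (2 purifications, each analyzed twice).
runs = [
    {"PREY_A": 12, "PREY_B": 3},
    {"PREY_A": 9},
    {"PREY_A": 15, "PREY_C": 1},
    {"PREY_A": 11, "PREY_B": 2},
]

MIN_REPRODUCIBILITY = 0.5  # at least 2 of 4 runs, as stated in the text

scores = {p: reproducibility(p, runs) for p in ("PREY_A", "PREY_B", "PREY_C")}
reproducible = [p for p, s in scores.items() if s >= MIN_REPRODUCIBILITY]
print(scores)
print("Reproducible preys:", reproducible)
```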
The simple and straightforward methods of ZSPORE are easily performed using various kinds of standard office software including Excel.
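For readers who prefer a script to a spreadsheet, the combined filter might look like the sketch below. The thresholds are those stated in the text for the GRB2 analysis (Z-score > 2, prey occurrence < 7%, reproducibility ≥ 50%, at least 2 peptides). The DIAPH1 values are taken from the GRB2 table and the text; every other row, along with the class and function names, is hypothetical and included only to show each filter rejecting a prey.

```python
from dataclasses import dataclass

@dataclass
class Prey:
    name: str
    z_score: float
    occurrence: float       # fraction of baits in the database
    reproducibility: float  # fraction of MS runs
    peptides: int           # number of identified peptides

def is_hcip(prey, min_z=2.0, max_occurrence=0.07,
            min_reproducibility=0.5, min_peptides=2):
    """Apply the four filters to metrics that were computed beforehand."""
    return (
        prey.z_score > min_z
        and prey.occurrence < max_occurrence
        and prey.reproducibility >= min_reproducibility
        and prey.peptides >= min_peptides
    )

preys = [
    Prey("DIAPH1", 14, 0.006, 1.00, 11),               # values from the text
    Prey("HYPOTHETICAL_LOW_Z", 1.2, 0.010, 1.00, 6),    # fails Z-score
    Prey("HYPOTHETICAL_FREQUENT", 5, 0.300, 1.00, 8),   # fails occurrence
    Prey("HYPOTHETICAL_ONE_RUN", 9, 0.010, 0.25, 4),    # fails reproducibility
    Prey("HYPOTHETICAL_ONE_PEPTIDE", 9, 0.010, 0.75, 1),  # fails peptide count
]

hcip = [p.name for p in preys if is_hcip(p)]
print("HCIP:", hcip)
```

Keeping the thresholds as keyword arguments makes it straightforward to apply greater stringency as the core database grows.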
Public protein interaction databases include the STRING database (protein.links.v7.1.txt.gz, found at http://string.embl.de/) and the BioGRID database (http://www.thebiogrid.org/downloads.php). The protein interaction network was generated in Cytoscape (http://www.cytoscape.org).
The methods and criteria used to remove non-specific binding proteins (NSBP) and identify high confidence interacting proteins (HCIP) include
This work was supported by NIH grants AI089829 and AI099860. S.L. is a John and Virginia Kaneb fellow.