The authors have declared that no competing interests exist.
The human proteome project was conceived about 40 years ago with the purpose of summarizing all proteomic data in one place. It was launched after the human genome project to map and observe all human proteins. The goal of the related proteomic studies is to draft the entire human proteome for use in disease diagnosis with the help of bioinformatics tools. The pillars of the human proteome project provide different databases related to proteins at the transcriptional and translational levels. The Human Proteome Organization (HUPO) set up the biology/disease HPP, whose aim is to measure proteins and proteomes in the life processes related to human diseases. Projects based on different human organs and fluids, such as plasma, liver and brain, together with a diabetes-based project, are used to characterize human health and disease. The major data resources for proteins are accumulated in databases such as PeptideAtlas, GPMDB and neXtProt. The metrics of the human proteome project identify and characterize protein products such as post-translational modifications (PTMs) and splice variant isoforms from 20,300 proteins. Metrics reported for different years track the growth of the proteome, supported by the biomedical research community's high-throughput instruments and pre-analytical specimen protocols. The multidisciplinary CALIPHO group provides information about protein complexity, interactions, function and structure, building on UniProt and Swiss-Prot. Different bioinformatics tools are used for the structural and functional annotation of proteins, for disease diagnosis and for the study of mutations in proteins. Extensive study within the human proteome project has proved helpful in disease treatment at the translational and post-translational levels. In the future, the human proteome project, along with bioinformatics, will include protein profiling, biomarkers, mass spectrometry techniques and cross-analysis of different proteome projects.
A large amount of data means that many problems faced in biology are now being faced in computing too. Bioinformatics is the field that includes different techniques and software to examine and interpret biological data. Two important, wide-ranging activities that use bioinformatics are genomics and proteomics, in which bioinformatics tools help to make predictions about genes and proteins. The international HUPO Human Proteome Project (HPP) was designed five years ago to characterize and evaluate the so-called "missing proteins", which were confidently predicted but have not yet been detected at the protein level. Currently, there are 2,563 such "missing proteins". To examine their detailed distribution across chromosomes and their protein existence status, the HPP relies on CALIPHO, i.e. on neXtProt, to identify these proteins.
To overcome the limitations of the UniProt/Swiss-Prot annotation, a team known as CALIPHO was established in 2008. To address these limitations, CALIPHO developed neXtProt, a database that holds the data related to the human proteome. It is a joint development of the SIB (Swiss Institute of Bioinformatics) and GeneBio SA. It works like a model organism database, which collects data related to one species and serves as an input for research on that species. In the same way, neXtProt gathers the data on the proteins present in humans, combines these data correctly and develops the tools that users require, providing them with high-quality data and tools.
HUPO, in collaboration with different workgroups, is setting up new projects to identify and characterize diseases related to human health. For approximately 25% of proteins, the structure, function, localization and post-translational modifications have not been analyzed. Over time, new databases and software are therefore being developed to address this gap.
The Human Genome Project (HGP) was a scientific research project aimed at drafting the whole human genome and characterizing its structure and function. The idea was proposed in 1984 by Renato Dulbecco, but work on the project started in 1990 and, after 13 years, it was completed in 2003. The main funding came from the US government through the National Institutes of Health, as well as from other institutions all over the world.
After the HGP, HUPO launched a new project, the HPP. Its ambition is to map and characterize all proteins translated from sequences encoded in the human genome by using three working pillars: mass spectrometry (MS), antibody capture, and bioinformatics tools together with knowledge bases. These pillars form the foundation on which the chromosome-based HPP and the biology/disease HPP are built.
Because of the growth in proteomics data, there is a need to collect, store and manage the data so that they are accessible to scientists. Submission to repositories has to be straightforward, and the data have to be represented in a particular format. The HUPO Proteomics Standards Initiative provides such standards, covering instrumentation and the development of different tools to make the data easily available.
The 12th annual congress of HUPO, the Human Proteome Organization, was hosted in Yokohama, Japan. The HUPO conference gives researchers the opportunity to share their expertise. Ninety participants attended the 4th workshop of the Human Diabetes Proteome Project at HUPO 2013.
The pilot phase of the Plasma Proteome Project (PPP) was called “Exploring the Human Plasma Proteome”. The PPP generated a data set of 3020 proteins, each identified by two or more peptides, which is fully accessible at EBI/PRIDE and ISB/PeptideAtlas.
The liver project was the first HPP initiative devoted to an organ and tissue.
The HUPO Brain Proteome Project (BPP) is chaired and organized by Helmut E. Meyer and Joachim Klose. This worldwide project started in 2002. The HUPO BPP unites research laboratories and offers connections and interactions with newly developing areas of neurology. Its aim is to understand the role of brain proteins in nervous system diseases and aging.
In 2015, large-scale and targeted studies of the proteome of the human brain and body fluids were carried out.
The aim of the 2009 study was to obtain a better understanding of neurological diseases and aging through the discovery of prognostic and diagnostic biomarkers and the development of diagnostic techniques and medications.
The human pilot study comprised biopsy and autopsy samples of human brain tissue. Most participating laboratories use the ProteinScape bioinformatics platform as a small local database to organize and store data. This system makes it possible to collect all the data in an orderly and disciplined way.
It is a knowledge base for the identification, quantification and characterization of protein networks in a broad array of biological systems.
In 2013, the HPP committee set up metrics for the whole proteome and for the protein-coding genes of each chromosome. In 2012, the rough estimate of “missing proteins”, meaning proteins that neXtProt, PeptideAtlas (PA) and GPMDB deduce from genes but that lack protein-level evidence, was 6568 (33%).
Protein existence (PE) evidence is classified into five levels. PE1 covers proteins identified and characterized at the protein level, for example through expression studies, identification by MS, immunohistochemistry, 3-dimensional structure or amino acid sequencing. PE2 indicates evidence of expression at the transcript level. PE3 indicates evidence from similar proteins in related species. PE4 covers proteins hypothesized from gene models. PE5 contains uncertain genes that appeared to have some level of confirmation in the past.
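As an illustration of how these evidence levels can be used, the following Python sketch tallies entries by PE level and counts the "missing proteins". It assumes the commonly used HPP convention that missing proteins are the entries at PE2, PE3 or PE4. The entry identifiers and the names `PE_LEVELS`, `tally_by_level` and `count_missing` are hypothetical, and the toy data are made up rather than taken from neXtProt.

```python
from collections import Counter

# Short descriptions of the five protein existence (PE) levels.
PE_LEVELS = {
    1: "evidence at protein level",
    2: "evidence at transcript level",
    3: "inferred from similar proteins in related species",
    4: "predicted from gene models",
    5: "uncertain",
}

# Assumption: "missing proteins" are the entries at PE2, PE3 or PE4.
MISSING_LEVELS = {2, 3, 4}


def tally_by_level(entries):
    """Count how many (identifier, PE level) entries fall in each level."""
    return Counter(level for _, level in entries)


def count_missing(entries):
    """Count entries whose PE level marks them as 'missing proteins'."""
    return sum(1 for _, level in entries if level in MISSING_LEVELS)


# Toy, made-up entries (identifier, PE level); not real database records.
toy_entries = [
    ("ENTRY_A", 1),
    ("ENTRY_B", 1),
    ("ENTRY_C", 2),
    ("ENTRY_D", 3),
    ("ENTRY_E", 4),
    ("ENTRY_F", 5),
]

tally = tally_by_level(toy_entries)
for level in sorted(PE_LEVELS):
    print(f"PE{level} ({PE_LEVELS[level]}): {tally[level]}")
print("missing proteins:", count_missing(toy_entries))
```

Keeping the set of "missing" levels in one constant makes it easy to adjust the sketch if a different definition is preferred.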
These metrics provide a chromosome-by-chromosome breakdown and facilitate the work of the C-HPP. neXtProt 2014 has 16,491 PE1 protein entries, out of 19,439 protein entries across the protein existence levels.
A tissue-based map of the human proteome was released on 7 November 2014. It gives extensive annotation of cancer cell lines, druggable proteins, isoforms and metabolism, linking protein-based immunohistochemical studies with RNA sequencing results from 32 tissues.
In 2016, progress was made in the protein knowledge shared between PeptideAtlas and neXtProt and in the development of GPMDB mass spectrometry resources for human proteins. neXtProt is the primary knowledge source for the HPP and is itself sourced from UniProtKB/Swiss-Prot. This means that when data are updated in Swiss-Prot, they are faithfully propagated to neXtProt, at most 3 to 4 times. Major data have been accumulated for the PTMs of the human proteome.
The main goal of the project is to study the mechanisms of biological processes and human diseases. It does so through research and informatics tools that describe protein physiology and the protein mutations that may cause disease. There is also a correlation between the research methods available for a specific protein and the research activity on that protein. Researchers have drawn three major conclusions: first, a large number of proteins remain uninvestigated; second, neither knowledge of the human genome nor powerful proteomics techniques have by themselves been sufficient; and third, research patterns can be affected by the availability of tools.
The pathology of diabetes is an emerging issue for developing countries. The aim of the human diabetes project is to better understand the pathology and all its related complications. Scientific workshops and conferences are held throughout the year to promote and share the studies, techniques and proposals associated with the project, and to discuss research goals. Workshops were held in 2013 and 2014 for the project partners as well as for other young scientists with the same objectives, to share their novel ideas and findings.
The 5th HBPP workshop in April 2014 presented a list of the top 25 candidate biomarkers associated with diabetes for plasma-based diagnosis.
Early-stage diagnosis of cancer is essential for its control. Advanced approaches such as mammography and other tests have improved the diagnosis of cancer.
Improvements in genomics technology provide a quick screen for the changes in gene expression that convert cells into a cancerous mass. Using an ELISA system to test for a disease such as cancer requires a single confirmed marker of the disease and a high-affinity antibody that can detect the protein of interest.
A serum sample is taken from a patient, and the proteins are attached to a chip. Mass spectrometry is then used to obtain a proteomic image that can be ‘read’ using bioinformatics tools. The readout can lead to the early detection of cancer.
Information on genomic events is combined with proteomics, typically by using sequence databases derived from DNA sequencing, RNA sequencing or ribo-sequencing approaches. The advantage of these approaches is that, if peptides are detected that cover events such as splice junctions, long non-coding RNAs and small ORFs (open reading frames), the annotation can be improved.
The major data resources for proteins and their post-translational modifications are PeptideAtlas, GPMDB and neXtProt. The world of post-translational modification is huge: more than 200 chemical classes of PTM are known. PeptideAtlas achieved a major increase in the number of observed PTMs on the basis of the human phosphoproteome PeptideAtlas. Two different methodologies were used: first, samples were searched for potential phosphorylation on S, T and Y residues; second, they were processed with a TPP PTM tool, which gives the probability that a mass modification is present at each available site. The data set was produced by digestion with several proteases in the Heck and Mann laboratories, in the Netherlands and in Munich. It demonstrated greater phosphoproteome coverage than when trypsin is used alone. In total, 37,771 phosphopeptides were identified after protease digestion, covering 18,000 different phosphosites. Regulatory mechanisms were identified during the experiment, most of which involve tyrosine- and serine/threonine-based signaling. The study also shows, through quantification of mitotic and signaling factors, that phosphotyrosine is maintained at a very low level when signaling is absent in the cell.
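The first search step described above, looking for potential phosphorylation on S, T and Y residues, can be illustrated with a minimal Python sketch. It only enumerates candidate sites in a peptide sequence and is not the search engine or the TPP tooling used in the study; the function names and the example peptide are hypothetical.

```python
# Residues that can carry a phosphate group in the search described above.
PHOSPHO_RESIDUES = "STY"


def candidate_phosphosites(peptide):
    """Return (1-based position, residue) for every S, T or Y in a peptide."""
    return [
        (position, residue)
        for position, residue in enumerate(peptide.upper(), start=1)
        if residue in PHOSPHO_RESIDUES
    ]


def count_phospho_forms(peptide):
    """Number of distinct phosphorylated forms if every candidate site may
    independently be modified or not (the unmodified form is excluded)."""
    n_sites = len(candidate_phosphosites(peptide))
    return 2 ** n_sites - 1


example_peptide = "AGSPTLYK"  # made-up sequence for illustration only
sites = candidate_phosphosites(example_peptide)
print(sites)
print(count_phospho_forms(example_peptide))
```

The number of possible forms grows exponentially with the number of candidate sites, which is one reason why a per-site probability, as provided by the second methodology, is useful.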
neXtProt covers different types of modification, including O-glycosylation, sumoylation, ubiquitination, nitrosylation, methylation and, recently added, acetylation and ADP-ribosylation. In February 2016, GPMDB published a new database for mapping PTMs and protein modification sites to the genome. Because protein modifications are mapped to the specific nucleotides of the codon concerned, remapping for splice variants is not necessary.
Divergence among highly related proteins arises at the level of the cell, tissue and subcellular localization. At the DNA, RNA and protein levels, complexity arises from allelic variation, alternative splicing and post-translational modification. These events generate a huge population of different proteins that perform many functions depending on their nature, such as signaling within or between cells, gene regulation and activation of protein complexes. For protein analysis, two-dimensional gel electrophoresis and newer technologies are used; mass spectrometry in particular provides a key platform for analyzing protein complexity.
Two contrasting approaches are used: bottom-up and top-down. In the bottom-up approach, proteins are digested into peptides with trypsin or other proteases and then analyzed by liquid chromatography (LC) and tandem mass spectrometry (MS/MS). In top-down proteomics, no digestion occurs; intact proteins are identified directly by fragmentation. In the literature, one finds different terms such as protein forms, protein isoforms and protein variants, and more recently modified protein forms. None of these terms is fully satisfactory, so "isoform" is used most frequently. Functional classes of proteoforms arise through proteolytic cleavage, which generates proteoforms with different N- and C-termini. For example, 1,863 peptides revealed 1,703 proteoforms of 921 proteins.
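The digestion step of the bottom-up approach can be sketched in silico. The Python example below assumes the commonly used simplified trypsin rule (cleavage after K or R, except when the next residue is P), which is not spelled out in the text above; the function name `tryptic_digest` and the example sequence are hypothetical, and missed cleavages are ignored.

```python
def tryptic_digest(sequence):
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Assumption: trypsin cleaves on the C-terminal side of K or R,
    unless the following residue is P. Missed cleavages are not modeled.
    """
    sequence = sequence.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep any C-terminal remainder that does not end in K or R.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


example_protein = "MKTAYIAKQRPLSR"  # made-up sequence for illustration only
peptides = tryptic_digest(example_protein)
print(peptides)
```

In a real workflow the resulting peptides would be separated by LC and fragmented by tandem MS; here the sketch only shows why the choice of protease determines which peptides can be observed.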
RNA sequencing (RNA-seq) data are accumulating rapidly in clinical studies, providing an opportunity to find associations with mRNA isoform variation. SURVIV is a statistical method for the survival analysis of mRNA isoform variation in relation to patient survival time. The great strength of SURVIV lies in accounting for the measurement uncertainty of mRNA isoform ratios in RNA-seq data. SURVIV has been applied to TCGA data for invasive ductal carcinoma and five other cancer types. Alternative splicing is a source of protein complexity: 95% of human genes undergo splicing, and it plays a major role in diversity. The Cancer Genome Atlas (TCGA) consortium has generated RNA-seq data on 11,000 cancer patients. Breast invasive carcinoma (BRCA) has a large RNA-seq sample of over 1000 patients, and clinical information such as survival time, tissue subtype and cancer stage is available for these patients. This large sample size of the TCGA BRCA data makes it possible to relate genomic and transcriptomic profiles to clinical outcomes and patient survival times.
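To illustrate why the measurement uncertainty of an mRNA isoform ratio matters, the Python sketch below estimates an isoform ratio from read counts together with a simple binomial standard error. This is a deliberately simplified approximation chosen for illustration, not the statistical model of the method described above; the function name and the read counts are hypothetical.

```python
import math


def isoform_ratio(isoform_reads, other_reads):
    """Estimate an isoform ratio and its approximate standard error.

    Assumption: reads are treated as independent binomial draws, so the
    standard error is sqrt(p * (1 - p) / n). Real methods use richer models.
    """
    total = isoform_reads + other_reads
    if total == 0:
        raise ValueError("at least one read is required")
    ratio = isoform_reads / total
    std_error = math.sqrt(ratio * (1.0 - ratio) / total)
    return ratio, std_error


# Same ratio, very different read depth (made-up counts).
low_depth = isoform_ratio(3, 1)
high_depth = isoform_ratio(300, 100)
print(low_depth)
print(high_depth)
```

Both samples give the same ratio, but the low-depth estimate is far less certain, so a survival analysis that ignores read depth would treat an unreliable measurement as if it were precise.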
In 2008, after the first complete manual annotation by the UniProt/Swiss-Prot group, it was believed that the full set of human proteins had been established, but it was soon realized how little we know about human protein function and characterization (PTMs, protein/protein interactions, subcellular locations, etc.). To gather information about what these proteins do in the body, a team named CALIPHO was established.
To meet these goals, CALIPHO has developed neXtProt, a human-centric protein knowledge resource. It is also working with many different experimental techniques to reveal more about unknown proteins and their function.
About 20,300 protein-coding genes have been estimated from the analysis of the human genome. Transcriptomic analyses such as DNA microarrays and RNA sequencing have shown that these genes are expressed over a large dynamic range in the ~230 cell types that make up the human body. More than fifty percent of them produce alternatively spliced isoforms. During or after translation, many chemical changes to the protein products can occur (processing, post-translational modifications, etc.), resulting in a great diversity of proteoforms that differ with time, location, and physiological or disease conditions. About one million proteoforms coexist in a single person. This variability does not take into account the inter-individual variations due to frequent polymorphisms or rare mutations. Owing to recent advances in DNA sequencing technologies, this inter-individual variability can now be examined in detail across populations. Recent progress in proteomics technologies allows proteins and their modifications to be detected and quantified with higher accuracy. However, many proteins predicted from genomic or transcriptomic analyses are still not detected, either because they were not properly evaluated, because their expression is restricted in time and/or space, or because their biophysical and chemical properties are not compatible with standard proteomics experiments. The international HUPO Human Proteome Project (HPP) was designed five years ago to try to characterize and evaluate the so-called "missing proteins" that were confidently predicted but have still not been detected at the protein level. Currently, there are 2,563 such "missing proteins". To examine their detailed distribution across chromosomes and their protein existence status, the HPP relies on CALIPHO, i.e. on neXtProt, to identify these proteins. (
In the last 30 years, vast resources have been devoted to understanding the molecular components and processes of human cells, for both medical and fundamental research applications. The first target was the sequencing of the genome and the drafting of its transcriptome; the focus has now switched toward the study of one of the major classes of biomolecule, the proteins. Human proteins are very complex at the functional and molecular levels, and bioinformatics resources are needed that focus chiefly on capturing, integrating and keeping up to date the available knowledge about them.
For this purpose, the UniProt/Swiss-Prot groups were established and have provided an enormous amount of data about proteins. According to an estimate based on the content of the UniProtKB/Swiss-Prot knowledgebase, 25% of these proteins (i.e. around 5000) have not yet been studied experimentally.
The data were spread across multiple resources and websites, which caused a real problem; to solve it, neXtProt (
The main data sources (as in
MALDI-MS | ELISA | UniProtKB |
---|---|---|
Used to determine the level of proteolytic processing and the presence of post-translational modifications; also used for the analysis of smaller molecules such as peptides (Stults 1995). | Used for the detection of antigen in plasma by probing it with antibodies (Ahmad, Arya et al. 2014). | 1) The UniProt Archive (UniParc), which provides a stable, comprehensive, non-redundant sequence collection. 2) The UniProt Knowledgebase, which provides the central database of accurate protein sequences. 3) The UniProt NREF databases (UniRef), which provide non-redundant data collections (Apweiler, Bairoch et al. 2004). |
LC-MS/MS: used for the diagnosis of endocrine disorders and for vitamin D analysis (Ahmad, Arya et al. 2014). | | SWISS-PROT: a data bank of accurate protein sequences and interpretations, with minimal redundancy and integration with other databases (Junker, Contrino et al. 2000). |
 | | PRIDE: stores different types of data; its aim is to reflect the authors' analysis view of the experimental data (Vizcaíno, Côté et al. 2012). |
Data type | Number of entries | Source |
---|---|---|
Proteins/isoforms | 42196 | UniProtKB |
Binary interactions | 192822 | IntAct |
Post-translational modifications | 187531 | PeptideAtlas, UniProtKB, neXtProt |
Entries with a disease | 16671 | UniProtKB |
Entries with proteomic data | 17838 | PeptideAtlas |
Variants | 5324509 | COSMIC, UniProtKB, dbSNP |
Total publications | 104473 | All resources |
HUPO is an international organization that connects proteomics laboratories that use proteomics to describe the health status and the level of protein mutation in a sample.
PeptideAtlas gathers raw output from proteomics experiments and reprocesses it with a consistent informatics tool, the Trans-Proteomic Pipeline. PeptideAtlas provides peptide identifications in biological samples.
All neXtProt annotations are available as XML and PEFF files on our FTP site (ftp://ftp.nextprot.org/). Our XML format has been modified to accommodate the new phenotypic data. The changes are listed in a comment at the beginning of the new XSD file (version 2), also on the FTP site. The old XML files are no longer available because of a technical problem. Annotations can also be obtained through our API at https://api.nextprot.org and our SPARQL endpoint (
The neXtProt human protein knowledgebase combines data to provide comprehensive, up-to-date, high-quality information, organized so as to give scientists around the world a resource that makes their research easier. neXtProt is continually evolving and, in terms of content, the focus will continue to be the incorporation of new variant and proteomics data.
The human genome is very complex and encodes many different types of protein, which perform different functions in the body. These proteins may be functional or nonfunctional. To learn about protein function, structure and the disorders or mutations associated with a protein, we need tools or software to make predictions. These are called bioinformatics tools or software.
Tool | Purpose | URL |
---|---|---|
Phyre2 | Analysis of protein structure, function and mutation | |
PSIPRED | Prediction of protein secondary structure | bioinf.cs.ucl.ac.uk/psipred/ |
I-TASSER | 3D structure and protein function annotation | zhanglab.ccmb.med.umich.edu/I-TASSER/ |
Dali server | Analysis of structured protein | ekhidna.biocenter.helsinki.fi/dali_server/start |
COFACTOR | Protein function annotation | zhanglab.ccmb.med.umich.edu/COFACTOR/ |
JAFA server | Protein function annotation | |
SCRATCH | Annotation of protein structure and structural features | scratch.proteomics.ics.uci.edu |
BLAT | Diagnosis of diabetes | |
Predict AD | Diagnosis of Alzheimer’s disease | |
Jpred 3 | Secondary structure prediction | |
The PSIPRED server uses the output of PSI-BLAST to predict the secondary structure of a protein. Its accuracy in predicting protein secondary structure is 78 percent.
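A per-residue accuracy of this kind can be computed as the percentage of residues whose predicted state matches the observed one. The Python sketch below assumes a three-state encoding (H for helix, E for strand, C for coil), often called a Q3 score; whether the figure quoted above was measured in exactly this way is an assumption, and the two strings are made-up examples rather than PSIPRED output.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state.

    Both arguments are equal-length strings over the states H, E and C.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Made-up three-state strings for a 16-residue protein.
predicted_states = "CCHHHHHHCCEEEECC"
observed_states = "CCHHHHHCCCEEEEEC"
score = q3_accuracy(predicted_states, observed_states)
print(f"Q3 = {score:.1f}%")
```

Errors in this toy example occur at the ends of the helix and the strand, which is typical: boundaries between secondary structure elements are harder to predict than their cores.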
I-TASSER is an online bioinformatics tool used for 3D structure and functional annotation of proteins.
The Dali server performs comparative analysis of a newly solved protein structure against ancestral proteins in order to compare their structure, sequence and function.
About 1 nucleotide in 1000 differs between the genomes of two people. Some variations have no effect, but others are disease-causing mutations. For example, all family members who carry type 1 diabetes share the same variation in the gene that is translated into insulin. In this way the mutation is found, and treatment can then follow.
This software is used for the diagnosis of Alzheimer's disease (AD). The goal of the AD project is to identify biomarkers from different patient data for early diagnosis and to monitor disease progression in a more objective manner.
COFACTOR performs structure-based protein function annotation. As input, the COFACTOR server requires the 3D structure of the protein whose function is to be determined. The output is presented as tables showing the results for the submitted protein.
Given the large number of protein sequences and structures, sophisticated prediction tools are needed. In recent years, a vast and diverse set of software for protein function annotation has appeared. The JAFA server is one of these tools and is used for protein function annotation.
The next step is the comparison of protein profiles by DIGE (difference gel electrophoresis). It improves the detection of differential expression in 2-dimensional electrophoresis, reduces experimental variability and allows multivariate analysis.
Different drugs use some common strategies to exert their effects on proteins. A particular genetic instability is identified that causes changes in protein structure, function and expression. Some drugs are designed to control or correct these abnormalities; for example, an inhibitor of the BCR-ABL tyrosine kinase has been developed for chronic myelogenous leukemia (CML). To design a drug for a particular disease, it is important to know the bioactivity of the proteins involved in the relevant biological processes. One example is the use of neutralizing antibodies and tyrosine kinase receptor inhibitors to inhibit the angiogenesis induced by vascular endothelial growth factor in tumor cells. The proteome reflects the condition of cells exposed to a specific disease process, so there is a large number of proteomes for each cell. In hypothesis-driven projects, specific features are carefully selected that provide information about a particular medical condition. Proteomics complements genomic capabilities: as genomic sequencing projects were completed, the introduction of dedicated proteomics funding initiatives allowed proteomics-based approaches to realize their potential in the biomedical field.
At present, cross-analysis of proteome data by organ or biofluid has been confirmed on various platforms. Through collective analysis of the primary spectra with consistent criteria and bioinformatics tools, data sets can be compared easily. Cross-checking through collective analysis can improve the quality of individual analyses. The HPP is expected to collaborate on human protein quantification and detection.
After the success of the human genome project, scientists are working on the human proteome project for protein mapping and identification, using mass spectrometry, antibody capture and bioinformatics. The C-HPP and B/D-HPP work together to enhance data completeness and extensiveness, and the B/D-HPP provides data useful for the C-HPP. Hence, the human proteome project (HPP) integrates all data about human proteins that can be medically useful for treating many diseases, with the help of bioinformatics tools and software for the storage and analysis of data such as protein isoforms and the variants produced by post-translational modification and splicing. In this review article, we have surveyed different databases, such as Swiss-Prot, UniProt, PRIDE and neXtProt, that provide up-to-date, high-quality data, and software such as I-TASSER and BLAT. These tools are being developed further to be easier to use and more useful. In the near future, new tools will be developed with a focus on incorporating new variant and proteomics data.