Recent Update History
The updated dbPTM 2022 is coming soon.

Administrator

Time 10:00 am on 21st June

42 additional PTM types are integrated into the database.

Administrator

Time 2:00 pm on 15th Aug.

Welcome to dbPTM

dbPTM is an integrated resource for protein post-translational modifications (PTMs). Because PTMs are important in regulating biological processes, dbPTM was developed as a comprehensive database that integrates experimentally verified PTMs from several databases and annotates potential PTMs for all UniProtKB protein entries. dbPTM has been maintained for over ten years with the aim of providing comprehensive functional and structural analyses of PTMs. In this update, dbPTM not only integrates more experimentally validated PTMs from available databases and manual literature curation, but also provides disease associations based on non-synonymous single nucleotide polymorphisms (nsSNPs). - [Data Statistics Page]

High-throughput deep sequencing technology has led to a surge, in both volume and scope, of data on associations between SNPs and diseases. This update therefore integrates disease-associated nsSNPs from dbSNP based on genome-wide association studies (GWAS). PTM substrate sites located within a specified distance of the amino acids encoded by nsSNPs are regarded as associated with the corresponding diseases (Figure 1). In recent years, increasing evidence for crosstalk between PTMs has been reported.
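The distance-based association between PTM sites and nsSNP-encoded residues described above can be illustrated with a minimal Python sketch. The function name, the example positions and the default distance of 5 residues are assumptions made for illustration only; the text does not state the distance that dbPTM actually uses.

```python
from typing import List, Tuple


def sites_near_snps(
    ptm_sites: List[int],
    snp_positions: List[int],
    max_distance: int = 5,  # assumed value; the real cutoff is not given in the text
) -> List[Tuple[int, int]]:
    """Return (ptm_position, snp_position) pairs that lie within
    max_distance residues of each other on the same protein."""
    pairs = []
    for site in sorted(ptm_sites):
        for snp in sorted(snp_positions):
            if abs(site - snp) <= max_distance:
                pairs.append((site, snp))
    return pairs


# Hypothetical residue positions on a single protein sequence.
example_ptm_sites = [68, 383, 516]
example_snp_positions = [70, 157, 390]
print(sites_near_snps(example_ptm_sites, example_snp_positions, max_distance=5))
```

A PTM site paired with an nsSNP in this way would then inherit the disease annotation of that nsSNP; how the annotation is stored is outside the scope of this sketch.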

Although mass spectrometry (MS)-based proteomics has substantially improved our knowledge of the substrate site specificity of individual PTMs, it neglects the fact that combinatorial PTMs may act in concert to regulate protein function. Because information about the frequency and functional relevance of PTM crosstalk is relatively limited, in this update PTM sites that neighbour other PTM sites within a specified window length were subjected to motif discovery and functional enrichment analysis. This update addresses the current state of PTM crosstalk research and helps to overcome the bottleneck in how proteomics can contribute to understanding PTM codes, revealing the next level of data complexity and the limitations of proteomics in prospective PTM research.
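The selection of neighbouring PTM sites within a window, as described above, can be sketched as follows. This is only an illustration: the function name, the example records and the window length of 7 residues are assumptions, and the subsequent motif discovery and enrichment steps are not shown.

```python
from itertools import combinations
from typing import List, Tuple

Site = Tuple[int, str]  # (residue position, PTM type)


def neighbouring_ptm_pairs(sites: List[Site], window: int = 7) -> List[Tuple[Site, Site]]:
    """Return every pair of distinct site records on one protein whose
    positions differ by at most `window` residues (assumed window length)."""
    ordered = sorted(sites)  # sort by position, then by PTM type
    pairs = []
    for site_a, site_b in combinations(ordered, 2):
        if site_b[0] - site_a[0] <= window:
            pairs.append((site_a, site_b))
    return pairs


# Hypothetical PTM records for a single protein.
example_sites = [(10, "Phosphorylation"), (12, "Acetylation"), (30, "Methylation")]
print(neighbouring_ptm_pairs(example_sites, window=7))
```

Sorting first keeps each pair in positional order, so the difference `site_b[0] - site_a[0]` is never negative.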

Summary Table of PTM Sites


All of the PTM instances were categorized by their PTM types and further grouped by the modified amino acid. The number of PTM substrate sites is provided in the following summary table.

PTM Type | Ala(A) | Arg(R) | Asn(N) | Asp(D) | Cys(C) | Gly(G) | Glu(E) | Gln(Q) | His(H) | Ile(I) | Leu(L) | Lys(K) | Met(M) | Phe(F) | Pro(P) | Ser(S) | Thr(T) | Trp(W) | Tyr(Y) | Val(V)
Phosphorylation10749151571110189214241325-5271192458426490115270814
O-linked Glycosylation-----------190--2341180413235-42-
Acetylation456355-1269922731----1634713700-3192906935-7262
Pyruvate----536----------1228----
GPI-anchor108-2346817217---------4773---
O-palmitoleoylation---------------194----
ADP-ribosylation-1668205233136-2--11---168--10-
Amidation82240185103012493591121183880132141917164117115208163444
Hydroxylation-18238523-1-17145610-1566836461001317
Blocked amino end242111515111-24-3188--1
Octanoylation---------------121---
O-palmitoylation---------------95---
Oxidation----433-----4-746--8-10--
Decanoylation---------------61---
Methylation631585111174310325785148816825212982116852744-28
Sulfation----22----------41-1128-
AMPylation---------------326-63-
UMPylation---------------33-41-
N-linked Glycosylation-201163652-----1-109---218-1
Ubiquitination-1--2------456653---1----
Biotinylation-----------95--------
Butyrylation-----------351--------
Carbamidation----22---------------
Carboxyethylation-----------6--------
Carboxylation-----------1557--------
Cholesterol ester-----30--------------
Cholesterylation---1-1--------------
Citrullination-484------------------
C-linked Glycosylation-----------------360--
Crotonylation-----------659--------
Deamidation--255----219------------
Deamination-----------192--------
Decarboxylation---1------------3---
D-glucuronoylation-----1--------------
Farnesylation----441---------------
Formation of an isopeptide bond------20948------------
Formylation-----10-----194103-------
Gamma-carboxyglutamic acid------1186-------------
Geranylgeranylation----1132---------------
Glutarylation-----------1656--------
Glutathionylation----4173---------------
Glycation-----------323--------
Hydroxyceramide ester-------4------------
Iodination------------------89-
Lactoylation-----------893--------
Lactylation-----------336--------
Lipoylation-----------601--------
Malonylation-----------12992--------
Myristoylation----11715-----56--------
N-carbamoylation1-------------------
Neddylation-----------1724--------
Nitration------------------1685-
N-palmitoylation----222818-----17--------
Phosphatidylethanolamine amidation-----92--------------
Phosphoglycerylation-----------141--------
Propionylation-----------172--------
Pupylation-----------215--------
Pyrrolidone carboxylic acid------211752------------
Pyrrolylation----472---------------
S-archaeol----12---------------
S-carbamoylation----6---------------
S-Cyanation----6---------------
S-cysteinylation----15---------------
S-diacylglycerol----2195---------------
Serotonylation-------53------------
S-linked Glycosylation----11---------------
S-nitrosylation----4655---------------
S-palmitoylation----9914---------------
S-selanylation----4---------------
Stearoylation----13---------------
Succinylation----107------24252-----1--
Sulfhydration----177---------------
Sulfoxidation------------7581-------
Sumoylation-----------11620--------
Thiocarboxylation-----95--------------
2-Hydroxyisobutyrylation-----------32392--------
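The grouping behind the summary table above (PTM instances categorized by PTM type, then counted per modified amino acid) can be sketched in Python. The record format and the example records are hypothetical and do not reflect dbPTM's actual download format.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def summarise_ptm_sites(records: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count PTM sites per PTM type and per modified residue.

    Each record is a (ptm_type, one_letter_residue) pair.
    """
    table: Dict[str, Counter] = defaultdict(Counter)
    for ptm_type, residue in records:
        table[ptm_type][residue] += 1
    return table


# Hypothetical records, one per PTM substrate site.
example_records = [
    ("Phosphorylation", "S"),
    ("Phosphorylation", "S"),
    ("Phosphorylation", "Y"),
    ("Ubiquitination", "K"),
]
summary = summarise_ptm_sites(example_records)
for ptm_type, counts in summary.items():
    print(ptm_type, dict(counts))
```

Residues that never occur for a PTM type simply have no entry in its counter, which corresponds to the "-" cells in the table.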

PTM Analysis Resource Portal


dbPTM is updated as an integrated resource for PTMs, providing not only a comprehensive dataset of experimentally verified PTMs that are supported by the literature but also an integrative platform for accessing all available databases and tools that are associated with PTM analysis.

Databases
# | Database Name | Description | URL | Reference
1 | ADPriboDB 2.0 | ADP-ribosylation is a protein modification responsible for biological processes such as DNA repair, RNA regulation, cell cycle and biomolecular condensate formation. Dysregulation of ADP-ribosylation is implicated in cancer, neurodegeneration and viral infection. We developed ADPriboDB (adpribodb.leunglab.org) to facilitate studies in uncovering insights into the mechanisms and biological significance of ADP-ribosylation. ADPriboDB 2.0 serves as a one-stop repository comprising 48346 entries and 9097 ADP-ribosylated proteins, of which 6708 were newly identified since the original database release. In this updated version, we provide information regarding the sites of ADP-ribosylation in 32946 entries. The wealth of information allows us to interrogate existing databases or newly available data. For example, we found that ADP-ribosylated substrates are significantly associated with the recently identified human protein interaction networks associated with SARS-CoV-2, which encodes a conserved protein domain called macrodomain that binds and removes ADP-ribosylation. In addition, we create a new interactive tool to visualize the local context of ADP-ribosylation, such as structural and functional features as well as other post-translational modifications (e.g. phosphorylation, methylation and ubiquitination). This information provides opportunities to explore the biology of ADP-ribosylation and generate new hypotheses for experimental testing. | http://adpribodb.leunglab.org/ | 33137182
# | Database Name | Description | URL | Reference
1 | CarbonylDB | Motivation: Oxidative stress and protein damage have been associated with over 200 human ailments including cancer, stroke, neuro-degenerative diseases and aging. Protein carbonylation, a chemically diverse oxidative post-translational modification, is widely considered as the biomarker for oxidative stress and protein damage. Despite their importance and extensive studies, no database/resource on carbonylated proteins/sites exists. As such information is very useful to research in biology/medicine, we have manually curated a data-resource (CarbonylDB) of experimentally-confirmed carbonylated proteins/sites. Results: The CarbonylDB currently contains 1495 carbonylated proteins and 3781 sites from 21 species, with human, rat and yeast as the top three species. We have made further analyses of these carbonylated proteins/sites and presented their occurrence and occupancy patterns. Carbonylation site data on serum albumin, in particular, provides a fine model system to understand the dynamics of oxidative protein modifications/damage. Availability and implementation: The CarbonylDB is available as a web-resource and for download at http://digbio.missouri.edu/CarbonylDB/. | http://digbio.missouri.edu/CarbonylDB/ | 29509874
# | Database Name | Description | URL | Reference
1 | GlycoEpitope | Carbohydrate chains occupy truly significant positions in various fields of life sciences and biotechnology. Recently, the wide-ranging involvement of carbohydrate chains in life sciences has been extended to such diverse functions as cell to cell recognition and communication in neuronal tissues and immune systems, pathogen recognition, sperm-egg recognition and fertilization, regulating hormonal half-lives in the blood, directing embryonic development and differentiation, and directing distribution of various cells and proteins throughout the body. A large number of polyclonal or monoclonal antibodies have been used as very important tools for analyzing expression of various carbohydrate chains and their functions. In this database, useful information on these carbohydrate antigens, i.e. glyco-epitopes, and antibodies has been assembled as a compact encyclopedia. | http://www.glycoepitope.jp/ | -
2 | GlycomeDB | Carbohydrates are the third major class of biological macromolecules, besides proteins and DNA molecules. They are involved in numerous biological processes, among them protein folding and inter/intra cell recognition. In contrast to DNA and proteins, neither a comprehensive database for carbohydrate structures nor a universal nomenclature for computational purposes exists. After funding ceased for the Complex Carbohydrate Structure Database (CCSDB, often referred to as CarbBank) in 1997, four initiatives developed independent databases with partially overlapping foci. For each database, a proprietary encoding scheme for residues and topology of the structures was designed. As a result it is virtually impossible to get an overview of all existing structures, and to compare the contents of the different databases. We have analysed all of the existing public databases and defined a sequence format based on XML (GlycoCT) capable of storing all structural information of carbohydrate sequences. We have implemented a library of parsers for the interpretation of the different encoding schemes for carbohydrates. With this library we have translated the carbohydrate sequences of all freely available databases (CFG, KEGG, GLYCOSCIENCES.de, BCSDB and Carbbank) to GlycoCT, and created a new database (GlycomeDB) containing all structures and annotations. During the process of data integration we found multiple inconsistencies in the existing databases which were corrected in collaboration with the responsible curators. With the new database, GlycomeDB, it is possible to get an overview of all carbohydrate structures in the different databases and to crosslink common structures in the different databases. Scientists are now able to search for a particular structure in the meta database and get information about the occurrence of this structure in the five carbohydrate structure databases. | http://www.glycome-db.org/ | 21045056
3 | UnicarbKB | UniCarbKB is an initiative that aims to promote the creation of an online information storage and search platform for glycomics and glycobiology research. The knowledgebase will offer a freely accessible and information-rich resource supported by querying interfaces, annotation technologies and the adoption of common standards to integrate structural, experimental and functional data. | http://unicarbkb.org | 24234447
4 | GLYCOSCIENCES.de | The human genome seems to encode no more than 30,000 to 40,000 proteins. A major challenge is to understand how posttranslational events, such as glycosylation, affect the activities and functions of these proteins in health and disease. The importance of protein glycosylation is becoming widely realized through studies on protein folding, protein localization and trafficking, protein solubility, biological half-life as well as studies on cell-cell interactions. The progressing Glycomics projects will dramatically accelerate the understanding of the roles of carbohydrates in cell communication and lead to novel therapeutic approaches for treatment of human disease. MIT's magazine of innovation (January 21 2003) has identified Glycomics as one of the top ten technologies that will change the future. | http://www.glycosciences.de/ | 16239495
5 | GlycoSuiteDB | UniCarbKB is an initiative that aims to promote the creation of an online information storage and search platform for glycomics and glycobiology research. The knowledgebase will offer a freely accessible and information-rich resource supported by querying interfaces, annotation technologies and the adoption of common standards to integrate structural, experimental and functional data. | http://www.unicarbkb.org/ | 12520065
6 | CFG | The CFG's Glycan Structures Database offers detailed structural and chemical information for thousands of glycans, including both synthetic glycans and glycans isolated from biological sources. Each glycan structure in the database is linked to relevant entries in CFG and external databases (including primary data and information about binding proteins, where available). Links are also provided to a 3-D modeling feature, references, and other information. | http://www.functionalglycomics.org/glycomics/molecule/jsp/carbohydrate/carbMoleculeHome.jsp | 25753711
7 | ProGlycProt | ProGlycProt (Prokaryotic Glycoproteins) is a manually curated, comprehensive repository of experimentally characterized bacterial glycoproteins and archaeal glycoproteins, generated from an exhaustive literature search. This is the focused beginning of an effort to provide concise relevant information derived from rapidly expanding literature on prokaryotic glycoproteins, their glycosylating enzyme(s), glycosylation linked genes, and genomic context thereof, in a cross-referenced manner. ProGlycProt is an extensive online collection of experimentally verified glycosites and glycoproteins of the prokaryotes. For users' benefit, the database under menu ProGlycProtdb is arranged into two sections, namely ProCGP and ProUGP. ProCGP is the main section containing characterized prokaryotic glycoproteins, defined as entries with at least one experimentally known "glycosylated residue (glycosite)", whereas ProUGP is the supplementary section, presenting uncharacterized prokaryotic glycoproteins, defined as entries with experimentally identified glycosylation but unidentified glycosites. ProGlycProt has been developed with an aim to aid and advance the emerging scientific interests in understanding the mechanisms, implications, and novelties of protein glycosylation in prokaryotes that include many pathogenic as well as economically important bacterial species. The general data update policy is once every three months. Existing entries are updated in real-time. | http://www.proglycprot.org/ | 22039152
# | Database Name | Description | URL | Reference
1 | CPLA | CPLM (Compendium of Protein Lysine Modifications) is an online data resource specifically designed for protein lysine modifications (PLMs). The CPLM database was extended and adapted from our CPLA 1.0 (Compendium of Protein Lysine Acetylation) database (Liu et al., 2011), and the 2.0 release contains 203,972 modification events on 189,919 modified lysines in 45,748 proteins for 12 types of PLMs, including Nε-lysine acetylation (Yang et al., 2007; Shahbazian et al., 2007; Smith et al., 2009), ubiquitination (Gao, et al., 2013), methylation (Chen, et al., 2006), sumoylation (Ren, et al., 2009; Xue, et al., 2006), glycation (Priego-Capote, et al., 2010), butyrylation (Chen, et al., 2007; Cheng, et al., 2009; Zhang, et al., 2009), crotonylation (Tan, et al., 2011), malonylation (Xie, et al., 2012), propionylation (Chen, et al., 2007; Cheng, et al., 2009; Zhang, et al., 2009), succinylation (Xie, et al., 2012; Zhang, et al., 2011), phosphoglycerylation (Moellering, R. E. and B. F. Cravatt, 2013) and prokaryotic Pupylation (Liu, et al., 2011). | http://cpla.biocuckoo.org | 21059677
# | Database Name | Description | URL | Reference
1 | CPLM | CPLM (Compendium of Protein Lysine Modifications) is an online data resource specifically designed for protein lysine modifications (PLMs). The CPLM database was extended and adapted from our CPLA 1.0 (Compendium of Protein Lysine Acetylation) database (Liu et al., 2011), and the 2.0 release contains 203,972 modification events on 189,919 modified lysines in 45,748 proteins for 12 types of PLMs, including Nε-lysine acetylation (Yang et al., 2007; Shahbazian et al., 2007; Smith et al., 2009), ubiquitination (Gao, et al., 2013), methylation (Chen, et al., 2006), sumoylation (Ren, et al., 2009; Xue, et al., 2006), glycation (Priego-Capote, et al., 2010), butyrylation (Chen, et al., 2007; Cheng, et al., 2009; Zhang, et al., 2009), crotonylation (Tan, et al., 2011), malonylation (Xie, et al., 2012), propionylation (Chen, et al., 2007; Cheng, et al., 2009; Zhang, et al., 2009), succinylation (Xie, et al., 2012; Zhang, et al., 2011), phosphoglycerylation (Moellering, R. E. and B. F. Cravatt, 2013) and prokaryotic Pupylation (Liu, et al., 2011). | http://cplm.biocuckoo.org/ | 24214993
2 | CrosstalkDB | This database aims to collect mass spectrometry data of multiply modified histones or histone tails. You can search, analyze and download data from this database without having to log in. Quantification can be based on either spectral counting or peak intensities. We recommend isoScale and Histone Coder for spectra validation and quantification. For details of the database, see Schwammle, V.; Aspalter, C.-M.; Sidoli, S. and Jensen, O. N. Large-scale analysis of co-existing post-translational modifications on histone tails reveals global fine-structure of crosstalk. Mol Cell Proteomics, 2014, 13, 1855-1865. We encourage users to register and upload their data from mass spectrometry experiments. Registration is only formal and no private data (not even your email) will be required. After uploading your data, you will still be able to correct errors or delete selected entries. As a special feature, the statistical part includes calculation of interaction patterns between different histone modifications. With this tool, it should be possible to reveal the crosstalk between multiple histone modifications. We are sure that this software is not exempt from bugs. Please send us a message (at the Impressum / Feedback page) describing your problem(s). | http://crosstalkdb.bmb.sdu.dk/ | 24741113
3 | dbPTM | Protein modification is an extremely important post-translational regulation that adjusts the physical and chemical properties, conformation, stability and activity of a protein, thus altering protein function. Due to the high throughput of mass spectrometry-based methods in identifying site-specific post-translational modifications (PTMs), dbPTM is updated to integrate experimental PTMs obtained from public resources as well as manually curated MS/MS peptides associated with PTMs from research articles. The new version of dbPTM aims to be an informative resource for investigating the substrate specificity of PTM sites and functional association of PTMs between substrates and their interacting proteins. In order to investigate the substrate specificity for modification sites, a newly developed statistical method has been applied to identify the significant substrate motifs for each type of PTM containing sufficient experimental data. According to the data statistics in dbPTM, over 60% of PTM sites are located in the functional domains of proteins. It is known that most PTMs can create binding sites for specific protein-interaction domains that work together for cellular function. Thus, this update integrates protein-protein interaction and domain-domain interaction to determine the functional association of PTM sites located in protein-interacting domains. Additionally, the information of structural topologies on transmembrane proteins is integrated in dbPTM in order to delineate the structural correlation between the reported PTM sites and transmembrane topologies. To facilitate the investigation of PTMs on transmembrane proteins, the PTM substrate sites and the structural topology are graphically represented. Literature information related to PTMs, orthologous conservations and substrate motifs of PTMs are also provided in the resource. Lastly, this version features an improved web interface to facilitate convenient access to the resource. | http://dbptm.mbc.nctu.edu.tw/index.php | 23193290
4 | HIstome | Post-translational modification (PTM) of histones is a crucial step in epigenetic regulation of a gene. N-terminal tails of histones are the most accessible regions of these peptides as they protrude from the nucleosome and possess no specific structure. These tails are subjected to various modifications such as acetylation, methylation, phosphorylation, ubiquitination etc. by the 'writers'. PTMs are believed to function in a combinatorial pattern referred to as the 'histone code'. The major function of PTMs is to either create sites for the recruitment of specific factors or modify existing sites so as to abolish previous interactions. This alters the expression states of associated loci in multiple ways, thus enabling gene regulation. PTMs can recruit enzymes that can write, erase or read modifications, and the repertoire of such modifiers is found to be fairly large in number (~150 different enzymes in humans). Certain modifications, such as acetylation and phosphorylation, change the overall charge on basic histone proteins and thereby interfere with the histone-DNA interaction essential for nucleosome stability. In terms of molecular weight, these modifications range from light (acetylation, methylation, phosphorylation) to heavy (ubiquitination, poly ADP ribosylation). Here we include 8 different types of modifications that exist on all histone peptides. PTMs are often found to be cell cycle dependent. The role of various histone PTMs has been evaluated in many important cellular processes such as demarcating euchromatin and heterochromatin regions, transcriptional regulation of Hox gene clusters, maintenance of stemness, cell cycle control etc. Presence or absence of certain PTMs is shown to be a hallmark of different cancers. | http://www.actrec.gov.in/histome/ptm_main.php | 22140112
5 | novPTMenzy | Several attempts have been made to catalog the wealth of available information on post-translational modifications (PTMs) for easy retrieval and analysis. However, the tools and databases available mainly focus on modified sites or enzymes of well-known PTMs. Tools for newly discovered PTMs like AMPylation and Eliminylation or unusual PTMs like sulfation, hydroxylation, deamidation etc. are not yet available. novPTMenzy is a step towards cataloging information about novel and unusual PTMs and using this information for genome mining of enzymes involved in these PTMs and understanding the pathways in which they are involved. novPTMenzy provides a database. Using novPTMenzy, users can search for enzymes involved in five PTMs, namely AMPylation, Eliminylation, Sulfation, Hydroxylation and Deamidation. The search tool also links the protein to the closest experimentally characterized neighbor and closest structural neighbor. | http://www.nii.ac.in/novptmenzy.html | 25931459
6 | ProteomeScout | ProteomeScout is a database of proteins and post-translational modifications. There are two main data types in ProteomeScout. 1) Proteins: Visualize proteins or annotate your own proteins. 2) Experiments: You can load a new experiment or browse and analyze an existing experiment. | https://proteomescout.wustl.edu/ | 25414335
7 | PSP | PhosphoSitePlus (PSP) is an online systems biology resource providing comprehensive information and tools for the study of protein post-translational modifications (PTMs) including phosphorylation, ubiquitination, acetylation and methylation. Please cite the following reference for this resource: Hornbeck PV, et al (2015) PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 43:D512-20. | http://www.phosphosite.org/homeAction.do | 25514926
8 | PTMCode | PTMCode is a resource of known and predicted functional associations between protein post-translational modifications (PTMs) within and between interacting proteins. It currently contains 316,546 modified sites from 69 different PTM types which are also propagated through orthologs between 19 different eukaryotic species. A total of 1.6 million sites and 17 million functional associations across more than 100,000 proteins can currently be explored. | http://ptmcode.embl.de/ | 25361965
9 | PTMfunc | PTMfunc is a repository of functional predictions for protein post-translational modifications (PTMs). To find predictions for your protein of interest just search using a protein name or ID in the search box above. We rely mostly on ids from ENSEMBL but also have protein names for most species. For more info click on documentation. | http://ptmfunc.com/ | 22817900
10 | PTM-SD | Posttranslational modifications (PTMs) define covalent and chemical modifications of protein residues. They play important roles in modulating various biological functions. Current PTM databases contain important sequence annotations but do not provide informative 3D structural resource about these modifications. Posttranslational modification structural database (PTM-SD) provides access to structurally solved modified residues, which are experimentally annotated as PTMs. It combines different PTM information and annotation gathered from other databases, e.g. Protein DataBank for the protein structures and dbPTM and PTMCuration for fine sequence annotation. PTM-SD gives an accurate detection of PTMs in structural data. PTM-SD can be browsed by PDB id, UniProt accession number, organism and classic PTM annotation. Advanced queries can also be performed, i.e. detailed PTM annotations, amino acid type, secondary structure, SCOP class classification, PDB chain length and number of PTMs by chain. Statistics and analyses can be computed on a selected dataset of PTMs. Each PTM entry is detailed in a dedicated page with information on the protein sequence, local conformation with secondary structure and Protein Blocks. PTM-SD gives valuable information on observed PTMs in protein 3D structure, which is of great interest for studying sequence-structure-function relationships in the light of PTMs, and could provide insights for comparative modeling and PTM prediction protocols. Database URL: PTM-SD can be accessed at http://www.dsimb.inserm.fr/dsimb_tools/PTM-SD/. | http://www.dsimb.inserm.fr/dsimb_tools/PTM-SD/ | 24857970
11 | RedoxDB | SUMMARY: Redox regulation and signaling, which are involved in various cellular processes, have become one of the research focuses in the past decade. Cysteine thiol groups are particularly susceptible to post-translational modification, and their reversible oxidation plays a critical role in redox regulation and signaling. With the tremendous improvement of techniques, hundreds of redox proteins along with their redox-sensitive cysteines have been reported, and the number is still growing fast. However, until now there has been no database to accommodate the rapid accumulation of information on protein oxidative modification. Here we present RedoxDB, a manually curated database for experimentally validated redox proteins. RedoxDB (version 1.0) consists of two datasets (A and B, for proteins with or without verified modified cysteines, respectively) and includes 2157 redox proteins containing 2203 cysteine residues with oxidative modification. For each modified cysteine, the exact position, modification type and flanking sequence are provided. Additional information, including gene name, organism, sequence, literature references and links to UniProt and PDB, is also supplied. The database supports several functions including data search, blast and browsing. Bulk download of the entire dataset is also available. We expect that RedoxDB will be useful for both experimental studies and computational analyses of protein oxidative modification. AVAILABILITY: The database is freely available at: http://biocomputer.bio.cuhk.edu.hk/RedoxDB. | http://biocomputer.bio.cuhk.edu.hk/RedoxDB/ | 22833525
12 | RESID | The RESID Database of Protein Modifications is a comprehensive collection of annotations and structures for protein modifications including amino-terminal, carboxyl-terminal and peptide chain cross-link post-translational modifications. | http://pir.georgetown.edu/resid/ | 12520062
13 | SysPTM | SysPTM Version 2.0, updated June 15th, 2013. SysPTM provides a systematic and sophisticated platform for proteomic PTM research, equipped not only with a knowledge base of manually curated multi-type modification data, but also with four fully developed, in-depth data mining tools. Currently, SysPTM contains data detailing 471109 experimentally determined PTM sites on 53235 proteins, covering more than 50 modification types, curated from public resources including five databases and four webservers and more than three hundred peer-reviewed mass spectrometry papers. Protein annotations including Pfam domains, KEGG pathways, GO functional classification, and ortholog groups are integrated into the database. Five online tools have been developed and incorporated, including: PTMBlast, PTMPathway, PTMPhylog, PTMCluster and PTMGO. In SysPTM, the roles of single-type and multi-type modifications can be systematically investigated in a full biological context. SysPTM could be an important contribution to modificomics research. | http://lifecenter.sgst.cn/SysPTM/ | 24705204
14 | topPTM | topPTM is a database that integrates experimentally verified post-translational modifications (PTMs) from available databases and research articles, and annotates the PTM sites on transmembrane proteins with structural topology. The biological effects of PTMs on transmembrane proteins include phosphorylation for signal transduction and ion transport, acetylation for structure stability, attachment of fatty acids for membrane anchoring and association, as well as glycosylation for substrate targeting, cell-cell interactions, and viral infection. The experimentally verified PTMs are mainly collected from public resources including dbPTM, Phospho.ELM, PhosphoSite, OGlycBase, and UbiProt. For transmembrane proteins, the information of membrane topologies is collected from TMPad, TOPDB, PDBTM, and OPM. In order to fully investigate the PTMs on transmembrane proteins, the UniProtKB protein entries containing the annotation of membrane protein and the information of membrane topology are regarded as potential transmembrane proteins. To delineate the structural correlation and consensus motif of these reported PTM sites, the topPTM database also provides structural analyses, including the membrane accessibility of PTM substrate sites, protein secondary and tertiary structures, protein domains, and cross-species conservations of each entry. | http://topptm.cse.yzu.edu.tw/ | 24302577
15 | VPTMdb | In viruses, posttranslational modifications (PTMs) are essential for their life cycle. Recognizing viral PTMs is very important for a better understanding of the mechanism of viral infections and finding potential drug targets. However, few studies have investigated the roles of viral PTMs in virus-human interactions using comprehensive viral PTM datasets. To fill this gap, we developed the first comprehensive viral posttranslational modification database (VPTMdb) for collecting systematic information of PTMs in human viruses and infected host cells. The VPTMdb contains 1240 unique viral PTM sites with 8 modification types from 43 viruses (818 experimentally verified PTM sites manually extracted from 150 publications and 422 PTMs extracted from SwissProt) as well as 13650 infected cells' PTMs extracted from seven global proteomics experiments in six human viruses. The investigation of viral PTM sequence motifs showed that most viral PTMs share consensus motifs with human proteins in phosphorylation, and that five cellular kinase families phosphorylate more than 10 viral species. The analysis of protein disordered regions showed that more than 50% of glycosylation sites of double-strand DNA viruses are in the disordered regions, whereas single-strand RNA viruses and retroviruses prefer ordered regions. Domain-domain interaction analysis indicated potential roles that viral PTMs play in infections. The findings should make an important contribution to the field of virus-human interaction. Moreover, we created a novel sequence-based classifier named VPTMpre to help users predict viral protein phosphorylation sites. VPTMdb online web server (http://vptmdb.com:8787/VPTMdb/) was implemented for users to download viral PTM data and predict phosphorylation sites of interest. | http://vptmdb.com:8787/VPTMdb/ | 33094321
16PRISMOIDPost-translational modifications (PTMs) play very important roles in various cell signaling pathways and biological process. Due to PTMs' extremely important roles, many major PTMs have been studied, while the functional and mechanical characterization of major PTMs is well documented in several databases. However, most currently available databases mainly focus on protein sequences, while the real 3D structures of PTMs have been largely ignored. Therefore, studies of PTMs 3D structural signatures have been severely limited by the deficiency of the data. Here, we develop PRISMOID, a novel publicly available and free 3D structure database for a wide range of PTMs. PRISMOID represents an up-to-date and interactive online knowledge base with specific focus on 3D structural contexts of PTMs sites and mutations that occur on PTMs and in the close proximity of PTM sites with functional impact. The first version of PRISMOID encompasses 17 145 non-redundant modification sites on 3919 related protein 3D structure entries pertaining to 37 different types of PTMs. Our entry web page is organized in a comprehensive manner, including detailed PTM annotation on the 3D structure and biological information in terms of mutations affecting PTMs, secondary structure features and per-residue solvent accessibility features of PTM sites, domain context, predicted natively disordered regions and sequence alignments. In addition, high-definition JavaScript packages are employed to enhance information visualization in PRISMOID. PRISMOID equips a variety of interactive and customizable search options and data browsing functions; these capabilities allow users to access data via keyword, ID and advanced options combination search in an efficient and user-friendly way. A download page is also provided to enable users to download the SQL file, computational structural features and PTM sites' data. 
We anticipate PRISMOID will swiftly become an invaluable online resource, assisting both biologists and bioinformaticians to conduct experiments and develop applications supporting discovery efforts in the sequence-structural-functional relationship of PTMs and providing important insight into mutations and PTM sites interaction mechanisms. The PRISMOID database is freely accessible at http://prismoid.erc.monash.edu/. The database and web interface are implemented in MySQL, JSP, JavaScript and HTML with all major browsers supported.http://prismoid.erc.monash.edu/31161204
17 iProteinDB Post-translational modification (PTM) serves as a regulatory mechanism for protein function, influencing protein stability, interactions, activity and localization, and is critical in many signaling pathways. The best characterized PTM is phosphorylation, whereby a phosphate is added to an acceptor residue, most commonly serine, threonine and tyrosine in metazoans. As proteins are often phosphorylated at multiple sites, identifying those sites that are important for function is a challenging problem. Considering that any given phosphorylation site might be non-functional, prioritizing evolutionarily conserved phosphosites provides a general strategy to identify the putative functional sites. To facilitate the identification of conserved phosphosites, we generated a large-scale phosphoproteomics dataset from Drosophila embryos collected from six closely-related species. We built iProteinDB (https://www.flyrnai.org/tools/iproteindb/), a resource integrating these data with other high-throughput PTM datasets, including vertebrates, and manually curated information for Drosophila. At iProteinDB, scientists can view the PTM landscape for any Drosophila protein and identify predicted functional phosphosites based on a comparative analysis of data from closely-related Drosophila species. Further, iProteinDB enables comparison of PTM data from Drosophila to that of orthologous proteins from other model organisms, including human, mouse, rat, Xenopus tropicalis, Danio rerio, and Caenorhabditis elegans. https://www.flyrnai.org/tools/iproteindb/ 30397019
18 AWESOME Protein post-translational modifications (PTMs), including phosphorylation, ubiquitination, methylation, acetylation and glycosylation, are very important biological processes. PTM changes in some critical genes, which may be induced by base-pair substitution, are shown to affect the risk of diseases. Recently, large-scale exome-wide association studies found that missense single nucleotide polymorphisms (SNPs) play an important role in the susceptibility for complex diseases or traits. One of the functional mechanisms of missense SNPs is that they may affect PTMs, leading to protein dysfunction and disruption of downstream signaling pathways. Here, we constructed a database named AWESOME (A Website Exhibits SNP On Modification Event, http://www.awesome-hust.com), which is an interactive web-based analysis tool that systematically evaluates the role of SNPs on nearly all kinds of PTMs based on 20 available tools. We also provided a well-designed scoring system to compare the performance of different PTM prediction tools and help users to get a better interpretation of results. Users can search for SNPs, genes or positions of interest and filter by specific modifications or prediction methods to get a comprehensive view of the PTM changes induced by SNPs. In summary, our database provides a convenient way to detect PTM-related SNPs, which may potentially be pathogenic factors or therapeutic targets. http://www.awesome-hust.com 30215764
19YAAMProteins are dynamic molecules that regulate a myriad of cellular functions; these functions may be regulated by protein post-translational modifications (PTMs) that mediate the activity, localization and interaction partners of proteins. Thus, understanding the meaning of a single PTM or the combination of several of them is essential to unravel the mechanisms of protein regulation. Yeast Amino Acid Modification (YAAM) (http://yaam.ifc.unam.mx) is a comprehensive database that contains information from 121 921 residues of proteins, which are post-translationally modified in the yeast model Saccharomyces cerevisiae. All the PTMs contained in YAAM have been confirmed experimentally. YAAM database maps PTM residues in a 3D canvas for 680 proteins with a known 3D structure. The structure can be visualized and manipulated using the most common web browsers without the need for any additional plugin. The aim of our database is to retrieve and organize data about the location of modified amino acids providing information in a concise but comprehensive and user-friendly way, enabling users to find relevant information on PTMs. Given that PTMs influence almost all aspects of the biology of both healthy and diseased cells, identifying and understanding PTMs is critical in the study of molecular and cell biology. YAAM allows users to perform multiple searches, up to three modifications at the same residue, giving the possibility to explore possible regulatory mechanism for some proteins. Using YAAM search engine, we found three different PTMs of lysine residues involved in protein translation. This suggests an important regulatory mechanism for protein translation that needs to be further studied.http://yaam.ifc.unam.mx/29688347
20ActiveDriverDBDeciphering the functional impact of genetic variation is required to understand phenotypic diversity and the molecular mechanisms of inherited disease and cancer. While millions of genetic variants are now mapped in genome sequencing projects, distinguishing functional variants remains a major challenge. Protein-coding variation can be interpreted using post-translational modification (PTM) sites that are core components of cellular signaling networks controlling molecular processes and pathways. ActiveDriverDB is an interactive proteo-genomics database that uses more than 260,000 experimentally detected PTM sites to predict the functional impact of genetic variation in disease, cancer and the human population. Using machine learning tools, we prioritize proteins and pathways with enriched PTM-specific amino acid substitutions that potentially rewire signaling networks via induced or disrupted short linear motifs of kinase binding. We then map these effects to site-specific protein interaction networks and drug targets. In the 2021 update, we increased the PTM datasets by nearly 50%, included glycosylation, sumoylation and succinylation as new types of PTMs, and updated the workflows to interpret inherited disease mutations. We added a recent phosphoproteomics dataset reflecting the cellular response to SARS-CoV-2 to predict the impact of human genetic variation on COVID-19 infection and disease course. Overall, we estimate that 16-21% of known amino acid substitutions affect PTM sites among pathogenic disease mutations, somatic mutations in cancer genomes and germline variants in the human population. These data underline the potential of interpreting genetic variation through the lens of PTMs and signaling networks. The open-source database is freely available at www.ActiveDriverDB.org.www.ActiveDriverDB.org33834021
21 PTMsnp High-throughput sequencing technologies have identified millions of genetic mutations in multiple human diseases. However, the interpretation of the pathogenesis of these mutations and the discovery of driver genes that dominate disease progression is still a major challenge. Combining functional features such as protein post-translational modification (PTM) with genetic mutations is an effective way to predict such alterations. Here, we present PTMsnp, a web server that implements a Bayesian hierarchical model to identify driver genetic mutations targeting PTM sites. PTMsnp accepts genetic mutations in a standard variant call format or tabular format as input and outputs several interactive charts of PTM-related mutations that potentially affect PTMs. Additional functional annotations are performed to evaluate the impact of PTM-related mutations on protein structure and function, as well as to classify variants relevant to Mendelian disease. A total of 411,574 modification sites from 33 different types of PTMs and 1,776,848 somatic mutations from TCGA across 33 different cancer types are integrated into the web server, enabling identification of candidate cancer driver genes based on PTM. Applications of PTMsnp to the cancer cohorts and a GWAS dataset of type 2 diabetes identified a set of potential drivers together with several known disease-related genes, indicating its reliability in distinguishing disease-related mutations and providing potential molecular targets for new therapeutic strategies. PTMsnp is freely available at: http://ptmsnp.renlab.org. http://ptmsnp.renlab.org 33240890
22FAT-PTMPost-translational modifications (PTMs) are critical regulators of protein function, and nearly 200 different types of PTM have been identified. Advances in high-resolution mass spectrometry have led to the identification of an unprecedented number of PTM sites in numerous organisms, potentially facilitating a more complete understanding of how PTMs regulate cellular behavior. While databases have been created to house the resulting data, most of these resources focus on individual types of PTM, do not consider quantitative PTM analyses or do not provide tools for the visualization and analysis of PTM data. Here, we describe the Functional Analysis Tools for Post-Translational Modifications (FAT-PTM) database (https://bioinformatics.cse.unr.edu/fat-ptm/), which currently supports eight different types of PTM and over 49 000 PTM sites identified in large-scale proteomic surveys of the model organism Arabidopsis thaliana. The FAT-PTM database currently supports tools to visualize protein-centric PTM networks, quantitative phosphorylation site data from over 10 different quantitative phosphoproteomic studies, PTM information displayed in protein-centric metabolic pathways and groups of proteins that are co-modified by multiple PTMs. Overall, the FAT-PTM database provides users with a robust platform to share and visualize experimentally supported PTM data, develop hypotheses related to target proteins or identify emergent patterns in PTM data for signaling and metabolic pathways.https://bioinformatics.cse.unr.edu/fat-ptm/31034103
23neXtProtThe neXtProt knowledgebase (https://www.nextprot.org) is an integrative resource providing both data on human protein and the tools to explore these. In order to provide comprehensive and up-to-date data, we evaluate and add new data sets. We describe the incorporation of three new data sets that provide expression, function, protein-protein binary interaction, post-translational modifications (PTM) and variant information. New SPARQL query examples illustrating uses of the new data were added. neXtProt has continued to develop tools for proteomics. We have improved the peptide uniqueness checker and have implemented a new protein digestion tool. Together, these tools make it possible to determine which proteases can be used to identify trypsin-resistant proteins by mass spectrometry. In terms of usability, we have finished revamping our web interface and completely rewritten our API. Our SPARQL endpoint now supports federated queries. All the neXtProt data are available via our user interface, API, SPARQL endpoint and FTP site, including the new PEFF 1.0 format files. Finally, the data on our FTP site is now CC BY 4.0 to promote its reuse.https://www.nextprot.org31724716
24iPTMnetProtein post-translational modification (PTM) is an essential cellular regulatory mechanism, and disruptions in PTM have been implicated in disease. PTMs are an active area of study in many fields, leading to a wealth of PTM information in the scientific literature. There is a need for user-friendly bioinformatics resources that capture PTM information from the literature and support analyses of PTMs and their functional consequences. This chapter describes the use of iPTMnet ( http://proteininformationresource.org/iPTMnet/ ), a resource that integrates PTM information from text mining, curated databases, and ontologies and provides visualization tools for exploring PTM networks, PTM crosstalk, and PTM conservation across species. We present several PTM-related queries and demonstrate how they can be addressed using iPTMnet.http://proteininformationresource.org/iPTMnet/28150246
#Database NameDescriptionURLReference
1PubMethEpigenetics, and more specifically DNA methylation is a fast evolving research area. In almost every cancer type, each month new publications confirm the differentiated regulation of specific genes due to methylation and mention the discovery of novel methylation markers. Therefore, it would be extremely useful to have an annotated, reviewed, sorted and summarized overview of all available data. PubMeth is a cancer methylation database that includes genes that are reported to be methylated in various cancer types. A query can be based either on genes (to check in which cancer types the genes are reported as being methylated) or on cancer types (which genes are reported to be methylated in the cancer (sub) types of interest). The database is freely accessible at http://www.pubmeth.org. PubMeth is based on text-mining of Medline/PubMed abstracts, combined with manual reading and annotation of preselected abstracts. The text-mining approach results in increased speed and selectivity (as for instance many different aliases of a gene are searched at once), while the manual screening significantly raises the specificity and quality of the database. The summarized overview of the results is very useful in case more genes or cancer types are searched at the same time.http://www.pubmeth.org17932060
#Database NameDescriptionURLReference
1MYRbaseMyristoylation is a common lipid modification of proteins in Eukaryotes and their Viruses as well as some Bacteria and essential for the function of several important proteins (such as G proteins, SRC and related kinases, ADP ribosylation factors, HIV gag, HIV nef,...). The saturated 14-carbon fatty acid (Myristate) is attached most often co-translationally by the enzyme NMT (MyristoylCoA:Protein N-Myristoyltransferase) to N-terminal glycines or glycines that become N-terminal after proteolytic cleavage. Based on sequence variability of known substrate proteins, physical property profiles and structural models of NMT-substrate interactions (J Mol Biol. 2002 Apr 5;317[4]:523-40), we developed a powerful prediction tool for glycine myristoylation (J Mol Biol. 2002 Apr 5;317[4]:541-57) that is available as webserver (http://mendel.imp.univie.ac.at/myristate/) and whose sensitivity allows large-scale database runs. To facilitate selection of targets for experimental verification of our predictions, we evaluate the evolutionary conservation of the predicted myristoylation motif within close homologues (EvOluation). If a sequence is predicted to be myristoylated and the same applies to its homologues (preferably in a series of different organisms), we not only add another dimension of credibility to our prediction but derive that the lipid anchor might play an essential role for that protein's function. Such an analysis has been applied in a large-scale approach to the proteins included in the SwissProt and Genbank databases. The corresponding predicted entries and their homologues were annotated and summarized in tabular form accessible from MYRbase.http://mendel.imp.ac.at/myristate/myrbase/15003124
#Database NameDescriptionURLReference
1 GlycoFish http://betenbaugh.jhu.edu/GlycoFish/ 21591763
2 GlycoFly http://betenbaugh.jhu.edu/GlycoFly/ 21480662
3 GlycoProtDB GlycoProtDB is a glycoprotein database providing information on Asn (N)-glycosylated proteins and their glycosylation site(s), constructed with a bottom-up strategy from actual glycopeptide sequences identified by LC/MS-based glycoproteomic technologies. Current contents are glycoproteins identified from the model organisms C. elegans and mouse (C57BL/6, male). The database is searchable by gene ID, gene name, and description (protein name). Each glycoprotein data sheet is based on a single amino acid sequence from the Wormpep database for C. elegans and the NCBI RefSeq database for mouse. The sheet presents the experimentally detected N-glycosylation site(s), displayed for each method used to capture the glycopeptide subset, e.g., the lectins concanavalin A and wheat germ agglutinin (WGA), or HILIC (hydrophilic interaction chromatography), as well as potential N-glycosylation sites (NX[STC], X≠P). Protein sequences that share glycopeptide sequence(s) are linked to each other. http://jcggdb.jp/rcmg/gpdb/index.action 22823882
4 UniPep Unipep is a project to provide access to proteomics data from the Serum Biomarker group at the Swiss Federal Institute of Technology (ETH) in Zurich, Switzerland, and the Institute for Systems Biology (ISB) in Seattle, Washington, USA. In the initial phase, we provide a searchable interface to a library of putative glycopeptides, i.e. those containing a consensus NxS/T motif. The database maps peptides observed in a series of LC/MSMS experiments to a library of theoretical glycopeptides. The theoretical peptides are derived from an 'electronic' tryptic digestion of the IPI protein database (version 2.28). The observed peptides are obtained from glycocapture experiments in which whole cell lysates are covalently bound to beads which preferentially bind sugar moieties. The bound proteins are tryptically cleaved and the beads washed to remove non-glycosylated peptides, then the glycopeptides are eluted by enzymatic deglycosylation. The next phase of the project will be to bring online a similar repository of proteotypic peptides seen in a variety of LC/MSMS experiments. These will also be compared to a library of theoretical peptides which have been scored for their proteotypic potential (i.e. the likelihood of detection in such an experiment). http://www.unipep.org/ 16901351
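The N-glycosylation sequon quoted in the GlycoProtDB and UniPep entries above (N-X-S/T, optionally N-X-C, with X not proline) and the 'electronic' tryptic digestion mentioned for UniPep can be illustrated with a short Python sketch. This is not code from either resource: the function names, the example sequence, and the simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages) are assumptions made for illustration only.

```python
import re

# Sequon patterns. The lookahead keeps matches zero-width after the Asn,
# so overlapping sequons such as "NNTS" are both reported.
SEQUON_STC = re.compile(r"N(?=[^P][STC])")  # extended form, NX[STC]
SEQUON_ST = re.compile(r"N(?=[^P][ST])")    # classic form, NxS/T


def find_sequons(seq, allow_cys=True):
    """Return 1-based positions of Asn residues that start a sequon."""
    pattern = SEQUON_STC if allow_cys else SEQUON_ST
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


def tryptic_peptides(seq):
    """Return (1-based start, peptide) pairs from a simplified digest.

    Assumed rule: cleave after K or R unless followed by P.
    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    peptides, start = [], 1
    for piece in re.split(r"(?<=[KR])(?!P)", seq.upper()):
        if piece:  # a trailing K/R leaves one empty piece at the end
            peptides.append((start, piece))
            start += len(piece)
    return peptides


def putative_glycopeptides(seq, allow_cys=True):
    """Keep peptides that carry the Asn of at least one sequon.

    Sequons are located on the full sequence, so a motif that spans a
    cleavage site (e.g. N-K-T) is still attributed to the Asn peptide.
    """
    sites = find_sequons(seq, allow_cys)
    return [
        (start, pep)
        for start, pep in tryptic_peptides(seq)
        if any(start <= s < start + len(pep) for s in sites)
    ]


if __name__ == "__main__":
    example = "MKNGTAKNPSRLNKTCNVSK"  # made-up sequence, not a real protein
    print(find_sequons(example))
    print(putative_glycopeptides(example))
```

Locating sequons on the intact sequence before digestion is a deliberate choice in this sketch: scanning each peptide separately would miss motifs whose second or third residue falls on the far side of a cleavage site.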
#Database NameDescriptionURLReference
1 dbOGAP Protein O-GlcNAcylation is an O-linked glycosylation involving attachment of beta-N-acetylglucosamine (GlcNAc) to Ser/Thr residues catalyzed by O-GlcNAc transferase (OGT) without further extension of GlcNAc, whose removal is catalyzed by O-GlcNAcase (OGA). Unlike N-linked and mucin-type O-linked glycosylation, O-GlcNAcylation occurs primarily in nucleocytoplasmic proteins, and is often dynamic and reciprocal to phosphorylation at the same or adjacent Ser/Thr residues (often mutually inhibitory). Compared to phosphorylation, the amount of research on O-GlcNAcylation has been disproportionately small. Growing evidence now suggests that O-GlcNAcylation is common and has broad roles in physiology and disease, especially through its interplay with phosphorylation, e.g., regulation of insulin signaling and roles in diabetes and neurodegenerative diseases. To facilitate research on O-GlcNAcylated proteins, we developed a database of O-GlcNAcylated proteins and sites (dbOGAP) based on experimental data curated from literature as well as from collaborating labs. The database also provides additional sequence annotations and functional information integrated from databases such as UniProt, and pathway and disease databases. Review statistics for the current version of dbOGAP (v1.0). For more, please see the USHUPO 2010 abstract and presentation, and the paper: Jinlian Wang, Manabu Torii, Hongfang Liu, Gerald W Hart and Zhang-Zhi Hu. dbOGAP - An Integrated Bioinformatics Resource for Protein O-GlcNAcylation. BMC Bioinformatics 2011, 12:91. http://cbsb.lombardi.georgetown.edu/hulab/OGAP.html 21466708
2 O-GlcNAcAtlas O-linked β-N-acetylglucosamine (O-GlcNAc) is a post-translational modification (i.e., O-GlcNAcylation) on serine/threonine residues of proteins. As a unique intracellular monosaccharide modification, protein O-GlcNAcylation plays important roles in almost all biochemical processes examined. Aberrant O-GlcNAcylation underlies the etiologies of a number of chronic diseases. With the tremendous improvement of techniques, thousands of proteins along with their O-GlcNAc sites have been reported. However, until now there are few databases dedicated to accommodating the rapid accumulation of such information. Thus, O-GlcNAcAtlas is created to integrate all experimentally identified O-GlcNAc sites and proteins. O-GlcNAcAtlas consists of two datasets (Dataset-I and Dataset-II, for unambiguously identified sites and ambiguously identified sites, respectively), representing a total number of 4571 O-GlcNAc modified proteins from all species studied from 1984 to Dec. 31, 2019. For each protein, comprehensive information (including species, sample type, gene symbol, modified peptides and/or modification sites, site mapping methods, and literature references) is provided. To solve the heterogeneity among the data collected from different sources, the sequence identity of these reported O-GlcNAc peptides is mapped to the UniProtKB protein entries. To our knowledge, O-GlcNAcAtlas is a highly comprehensive and rigorously curated database encapsulating all O-GlcNAc sites and proteins identified in the past 35 years. We expect that O-GlcNAcAtlas will be a useful resource to facilitate O-GlcNAc studies and computational analyses of protein O-GlcNAcylation. The public version of the web interface to the O-GlcNAcAtlas can be found at http://oglcnac.org/. http://oglcnac.org/ 33442735
#Database NameDescriptionURLReference
1 O-GlycBase The Center for Biological Sequence Analysis at the Technical University of Denmark was formed in 1993, and conducts basic research in the field of bioinformatics and systems biology. The group of 90+ scientists, working in ten specialist research groups, has a highly multi-disciplinary profile (molecular biologists, biochemists, medical doctors, physicists and computer scientists) with a ratio of 2:1 of bio-to-nonbio backgrounds. CBS represents one of the large bioinformatics groups in academia in Europe. Bioinformatics is the term used to refer to the combination of methods in biology, computation, and information management, which are necessary to advance research relating to all aspects of living systems - from individual molecules, cells, and organs to entire organisms. Today, research in molecular biology, biotechnology and pharmacology depends on information technology all the way from experiment to the publication of the results. Comprehensive public databases of DNA and protein sequences, macromolecular structure, gene and protein expression levels, pathway organization and cell signalling have been established to optimise scientific exploitation of the explosion of data within biology. Unlike many other groups in the field of biomolecular informatics, the Center for Biological Sequence Analysis directs its research primarily towards topics related to the elucidation of the functional aspects of complex biological mechanisms. Among contemporary bioinformatics concerns are reliable computational interpretation of a wide range of experimental data, and the detailed understanding of the molecular apparatus behind cellular mechanisms of sequence information. By exploiting available experimental data and evidence in the design of algorithms, sequence correlations and other features of biological significance can be inferred. In addition to the computational research the center also has experimental efforts in gene expression analysis using DNA chips and data generation in relation to the physical and structural properties of DNA. In the last decade, the Center for Biological Sequence Analysis has produced a large number of computational methods, which are offered to others via WWW servers. Based on bioinformatics efforts started in the late 1980s, the activity was established formally as a center in 1993 by a grant from the Danish National Research Foundation. http://www.cbs.dtu.dk/databases/OGLYCBASE/ 9847232
2OGPNumerous studies on cancer, biopharmaceuticals, and clinical trials have necessitated comprehensive and precise analysis of protein O-glycosylation. However, the lack of updated and convenient databases deters the storage of and reference to emerging O-glycoprotein data. To resolve this issue, an O-glycoprotein repository named OGP was established in this work. It was constructed with a collection of O-glycoprotein data from different sources. OGP contains 9354 O-glycosylation sites and 11,633 site-specific O-glycans mapping to 2133 O-glycoproteins, and it is the largest O-glycoprotein repository thus far. Based on the recorded O-glycosylation sites, an O-glycosylation site prediction tool was developed. Moreover, an OGP-based website is already available (http://www.oglyp.org/). The website comprises four specially designed and user-friendly modules: statistical analysis, database search, site prediction, and data submission. The first version of OGP repository and the website allow users to obtain various O-glycoprotein-related information, such as protein accession numbers, O-glycosylation sites, glycopeptide sequences, site-specific glycan structures, experimental methods, and potential O-glycosylation sites. O-glycosylation data mining can be performed efficiently on this website, which will greatly facilitate related studies. In addition, the database is accessible from OGP website (http://www.oglyp.org/download.php).http://www.oglyp.org/download.php33581334
#Database NameDescriptionURLReference
1dbPPTAs one of the most important and ubiquitous post-translational modifications (PTMs), protein phosphorylation regulates a broad spectrum of biological processes not only in humans but also in plants. The identification of site-specific phosphorylated substrates is fundamental for understanding the regulatory molecular mechanisms of protein phosphorylation in controlling plant growth and development. Besides experimental approaches, prediction of potential candidates with computational methods has also attracted great attention for its convenience and fast-speed. In this review, we present a comprehensive but brief summarization of computational resources for protein phosphorylation in plants, including databases and predictors. We apologized that the computational studies without any web links of databases or tools will not be included in this compendium, since it's not easy for experimentalists to use studies directly. We are grateful for user feedback. Please inform Han Cheng, Wankun Deng, Dr. Zexian Liu, or Dr. Yu Xue to add, remove or update one or multiple web links below.http://dbppt.biocuckoo.org/25534750
2 | dbPSP | As one of the most important and ubiquitous post-translational modifications (PTMs), protein phosphorylation regulates a broad spectrum of biological processes not only in humans but also in plants. The identification of site-specific phosphorylated substrates is fundamental for understanding the regulatory molecular mechanisms of protein phosphorylation in controlling plant growth and development. Besides experimental approaches, prediction of potential candidates with computational methods has also attracted great attention for its convenience and speed. In this review, we present a comprehensive but brief summary of computational resources for protein phosphorylation in plants, including databases and predictors. We apologize that computational studies without any web links to databases or tools will not be included in this compendium, since it is not easy for experimentalists to use such studies directly. We are grateful for user feedback. Please inform Han Cheng, Wankun Deng, Dr. Zexian Liu, or Dr. Yu Xue to add, remove or update one or multiple web links below. | http://dbpsp.biocuckoo.org/ | 25841437
3 | HPRD | COMMERCIAL ENTITIES MAY NOT USE THIS SITE WITHOUT PRIOR LICENSING AUTHORIZATION. PLEASE SEND AN E-MAIL FOR FURTHER INFORMATION ABOUT LICENSING. The Human Protein Reference Database represents a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome. All the information in HPRD has been manually extracted from the literature by expert biologists who read, interpret and analyze the published data. HPRD has been created using an object-oriented database in Zope, an open source web application server, that provides versatility in query functions and allows data to be displayed dynamically. | http://www.hprd.org/ | 18988627
4 | LymPHOS | Current proteomic technology is capable of producing huge amounts of analytical information, which is often difficult to manage in a comprehensive form. Curation, further annotation and public communication of proteomic data require the development of standard data formats and efficient, multimedia database structures. We have implemented a workflow for the annotation of a phosphopeptide database (LymPHOS) that includes tools for MS data filtering and phosphosite assignation, mass spectrum visualization, experimental description and accurate phosphorylation site assignation. Experimental annotations were fitted to current minimum information about a proteomics experiment guidelines. A new guideline for phosphoprotein sample preparation is also proposed. Currently, the database describes 342 phosphorylation sites mapping to more than 200 gene sequences, and it can be accessed through the net (http://www.lymphos.org). | http://www.lymphos.org/ | 19639593
5 | MAPRes | The new version of MAPRes extends the old version to mine association rules on the basis of the biophysical and biochemical properties of amino acids. Several studies have analysed the primary sequence of amino acids, but analyses based on physicochemical properties of the amino acids, such as polarity and charge, have not yet been considered. The new version of MAPRes also enables users to analyze non-modified sites. | http://www.imsb.edu.pk/Database.htm | 25258092
6 | P3DB | P(3)DB (http://www.p3db.org/) provides a resource of protein phosphorylation data from multiple plants. The database was initially constructed with a dataset from oilseed rape, including 14,670 nonredundant phosphorylation sites from 6382 substrate proteins, representing the largest collection of plant phosphorylation data to date. Additional protein phosphorylation data are being deposited into this database from large-scale studies of Arabidopsis thaliana and soybean. Phosphorylation data from current literature are also being integrated into the P(3)DB. With a web-based user interface, the database is browsable, downloadable and searchable by protein accession number, description and sequence. A BLAST utility was integrated and a phosphopeptide BLAST browser was implemented to allow users to query the database for phosphopeptides similar to protein sequences of their interest. With the large-scale phosphorylation data and associated web-based tools, P(3)DB will be a valuable resource for both plant and nonplant biologists in the field of protein phosphorylation. | http://www.p3db.org/ | 18931372
7 | PepCyber:P~PEP | PepArray pro is a proteomics tool that provides a PepArray Layout file containing information about peptides, peptide IDs, and the array location of the peptides to be synthesized on chip. The Layout file is required for the synthesis of an addressable peptide microarray. Peptide microarrays (PepArrays) provide a powerful proteomics technology platform for a broad range of applications in studying protein-protein, protein-nucleic acid, and many other intermolecular interactions as signatures of cellular signaling pathways and regulatory network activities. Such studies can be applied not only to basic research but also to clinical biomedical tool development such as biomarker detection, diagnostic reagent discovery, drug development, and many more. PepArray pro supports the generation of peptide sequences containing standard or non-standard amino acids from reading user-input sequences, importing from web resources, or modifying existing peptides. Currently, phosphopeptides from the corresponding databases Phospho.ELM and PepCyber P~PEP are also supported. The designed Layout file can incorporate reference and/or control peptides for quality of synthesis and assay, generate peptide modifications, replicate the generated peptides, and provide design statistics. The designed PepArrays can be stored and archived. PepArray pro makes available a set of catalog PepArray Layout files. | http://www.pepcyber.org/PPEP/copyright.php | 18160410
8 | PHOSIDA | This database accompanies 'PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites', Florian Gnad, Shubin Ren, Juergen Cox, Jesper V Olsen, Boris Macek, Mario Oroshi, Matthias Mann (2007); Genome Biology. An update of the database is described in 'PHOSIDA 2011: the posttranslational modification database', Florian Gnad, Jeremy Gunawardena, Matthias Mann (2011); Nucleic Acids Research. Phosida allows the retrieval of phosphorylation, acetylation, and N-glycosylation data of any protein of interest. It lists posttranslational modification sites associated with particular projects and proteomes or, alternatively, displays posttranslational modifications found for any protein or protein group of interest. In addition, structural and evolutionary information on each modified protein and posttranslational modification site is integrated. Importantly, Phosida links extensive peptide information to the sites, such as several peptides implicating the same site and temporal profiles of each site in response to stimulus (e.g., EGF stimulation). | http://www.phosida.com/ | 21081558
9 | PhosPhAt | Phosphorylation site database: The Arabidopsis Protein Phosphorylation Site Database (PhosPhAt 3.0) contains information on Arabidopsis phosphorylation sites which were identified by mass spectrometry in large scale experiments by different research groups. Specific information about the peptide properties, their annotated biological function as well as the experimental and analytical context is given. For a majority of peptides, the actual annotated mass spectrum is displayed in an interactive manner. Phosphorylation site predictor: The PhosPhAt service has a built-in plant-specific phosphorylation site predictor trained on the experimental dataset for serine, threonine and tyrosine phosphorylation (pSer, pThr, pTyr). Protein sequences or Arabidopsis AGI gene identifiers can be submitted to the predictor. | http://phosphat.uni-hohenheim.de/ | 23172287
10 | Phospho.ELM | Phospho.ELM is a database of experimentally verified phosphorylation sites in eukaryotic proteins. The current release (Version 9.0, September 2010) of Phospho.ELM contains 8,718 substrate proteins from different species covering more than 42,500 instances. Instances are fully linked to literature references. List of references to the HTP data sets. | http://phospho.elm.eu.org/ | 21062810
11 | Phospho3D | Phospho3D is a database of three-dimensional structures of phosphorylation sites which stores information retrieved from the Phospho.ELM database and which is enriched with structural information and annotations at the residue level. The database also collects the results of a large-scale structural comparison procedure providing clues for the identification of new putative phosphorylation sites. Phospho3D 2.0 also includes P3Dscan, which allows you to compare your own protein structure against the set of 3D phosphorylation sites collected in the database. | http://www.phospho3d.org/ | 20965970
12 | PhosphoGRID | PhosphoGRID is an online database of experimentally verified in vivo protein phosphorylation sites in the model eukaryotic organism Saccharomyces cerevisiae. The database includes results from high throughput (HTP) MS proteomics studies in addition to phosphosites identified in low throughput (LTP) studies of individual proteins or protein complexes. The identity of specific protein kinases and phosphatases shown to regulate appearance of phosphorylations is recorded, where available, as are the function(s) of the phosphorylation, and conditions under which the modification was demonstrated to occur. The PhosphoGRID curators would appreciate comments on omissions and errors, as well as notifications of newly published or submitted data. | http://www.phosphogrid.org/ | 23674503
13 | PhosphoNET | PhosphoNET is an open-access, online resource developed by Kinexus Bioinformatics Corporation to foster the study of cell signalling systems to advance biomedical research in academia and industry. PhosphoNET is the world's largest repository of known and predicted information on human phosphorylation sites, their evolutionary conservation and the identities of protein kinases that may target these sites. PhosphoNET presently holds data on over 950,000 known and putative phosphorylation sites (P-sites) in over 23,000 human proteins that have been collected from the scientific literature and other reputable websites. Over 19% of these phospho-sites have been experimentally validated. The rest have been predicted with a novel P-Site Predictor algorithm developed at Kinexus with academic partners at the University of British Columbia and Simon Fraser University. With the PhosphoNET Evolution module, this website also provides information about cognate proteins in over 20 other species that may share these human phospho-sites. This helps to define the most functionally important phospho-sites as these are expected to be highly conserved in nature. With the Kinase Predictor module, listings are provided for the top 50 human protein kinases that are likely to phosphorylate each of these phospho-sites using another proprietary kinase substrate prediction algorithm developed at Kinexus. Our kinase substrate predictions are based on deduced consensus phosphorylation site amino acid frequency scoring matrices that we have determined for each of ~500 different human protein kinases. The specificity matrices are generated directly from the primary amino acid sequences of the catalytic domains of these kinases, and when available, have proven to correlate strongly with substrate prediction matrices based on alignment of known substrates of these kinases. The higher the score, the better the prospect that a kinase will phosphorylate a given site. Over 30 million kinase-substrate phospho-site pairs are quantified in PhosphoNET. Kinexus Bioinformatics Corporation has the capability to test most of these putative interactions in vitro for our clients. | http://www.phosphonet.ca/ | 22165948
14 | PhosphoPOINT | MOTIVATION: To fully understand how a protein kinase regulates biological processes, it is imperative to first identify its substrate(s) and interacting protein(s). However, of the 518 known human serine/threonine/tyrosine kinases, 35% have known substrates, while 14% have identified substrate recognition motifs. In contrast, 85% of the kinases have protein-protein interaction (PPI) datasets, raising the possibility that we might reveal potential kinase-substrate pairs from these PPIs. RESULTS: PhosphoPOINT, a comprehensive human kinase interactome and phospho-protein database, is a collection of 4195 phospho-proteins with a total of 15 738 phosphorylation sites. PhosphoPOINT annotates the interactions among kinases, with their down-stream substrates and with interacting (phospho)-proteins to modulate the kinase-substrate pairs. PhosphoPOINT implements various gene expression profiles and Gene Ontology cellular component information to evaluate each kinase and their interacting (phospho)-proteins/substrates. Integration of cSNPs that cause amino acid changes with the phosphoprotein dataset reveals that 64 phosphorylation sites result in disease phenotypes when changed; the linked phenotypes include schizophrenia and hypertension. PhosphoPOINT also provides a search function for all phospho-peptides using about 300 known kinase/phosphatase substrate/binding motifs. Altogether, PhosphoPOINT provides robust annotation for kinases, their downstream substrates and their interacting (phospho)-proteins, and this should accelerate the functional characterization of kinome-mediated signaling. AVAILABILITY: PhosphoPOINT can be freely accessed at http://kinase.bioinformatics.tw/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. | http://kinase.bioinformatics.tw/ | 18689816
15 | PhosphoPep | PhosphoPep version 2.0 is a project to support systems biology signaling research by providing interactive interrogation of MS-derived phosphorylation data from 4 different organisms. Currently there is data from the fly (Drosophila melanogaster), human (Homo sapiens), worm (Caenorhabditis elegans), and yeast (Saccharomyces cerevisiae). The experimental data was collected and analyzed by the Aebersold group at the Swiss Federal Institute of Technology (ETH) in collaboration with the Functional Genomics Center (FGCZ) in Zurich, Switzerland, and the Institute for Systems Biology (ISB) in Seattle, Washington, USA. The tabs below show details about the data collected from each organism, and link to this information in the database. PhosphoPep offers different software tools which allow users to browse through single proteins, through pathways, and importantly to integrate the data with information from external sources, like protein-protein interaction data. Finally, all data can be readily exported, e.g. for a targeted proteomics approach, and the generated data can again be validated using PhosphoPep, enabling systems biology signaling research. | http://www.phosphopep.org/ | 21082442
16 | PhosSNP | As we are entering the age of "Personal Genomics" or "Personalized Medicine", it has been expected that the knowledge of human genetic polymorphisms and variations could provide a foundation for understanding differences in susceptibility to diseases and designing individualized therapeutic treatments (Cargill, et al., 1999; Collins, et al., 1998). Recent progress of the International HapMap Project and similar projects (International HapMap Consortium, 2005; Frazer, et al., 2007) has provided a wealth of information detailing tens of millions of human genetic variations between individuals, including copy number variations (CNVs) (Redon, et al., 2006) and single nucleotide polymorphisms (SNPs) (Hinds, et al., 2005). It was estimated that ~90% of human genetic variations are due to SNPs (Collins, et al., 1998). In particular, by changing amino acids in proteins, non-synonymous SNPs (nsSNPs) in the gene coding regions could account for nearly half of the known genetic variations linked to human inherited diseases (Stenson, et al., 2003). In this regard, numerous efforts have been made to elucidate how nsSNPs generate deleterious effects on the stability and function of proteins. Obviously, an nsSNP might change the physicochemical property of a wild-type amino acid to affect the protein stability and dynamics, or disrupt the interacting interface and so prevent the protein from forming a complex with its partners (Kono, et al., 2008; Stitziel, et al., 2004; Uzun, et al., 2007; Yue and Moult, 2006). Alternatively, nsSNPs could also influence post-translational modifications (PTMs) of proteins (e.g., phosphorylation), by changing the residue types of the target sites or key flanking amino acids (Erxleben, et al., 2006; Gentile, et al., 2008; Ryu, et al., 2009; Savas and Ozcelik, 2005; Yang, et al., 2008). Previously, the Armstrong group first coined the term phosphorylopathy to describe human genetic variation that results in aberrant regulation of protein phosphorylation (Erxleben, et al., 2006; Gentile, et al., 2008). | http://phossnp.biocuckoo.org/ | 19995808
17 | SubPhos | Protein phosphorylation is the most common post-translational modification (PTM) regulating major cellular processes such as cell division, growth, and differentiation through highly dynamic and complex signaling pathways. However, the dynamic interplay of protein phosphorylation is not occurring randomly within the cell but is rather finely orchestrated by specific kinases and phosphatases that are unevenly distributed across subcellular compartments. This spatial separation not only regulates protein phosphorylation but can also control the activity of other enzymes and the transfer of other post-translational modifications. | http://bioinfo.ncu.edu.cn/SubPhos.aspx | 25236462
18 | dbPSP 2.0 | In prokaryotes, protein phosphorylation plays a critical role in regulating a broad spectrum of biological processes and occurs mainly on various amino acids, including serine (S), threonine (T), tyrosine (Y), arginine (R), aspartic acid (D), histidine (H) and cysteine (C) residues of protein substrates. Through literature curation and public database integration, here we reported an updated database of phosphorylation sites (p-sites) in prokaryotes (dbPSP 2.0) that contains 19,296 experimentally identified p-sites in 8,586 proteins from 200 prokaryotic organisms, which belong to 12 phyla of two kingdoms, bacteria and archaea. To carefully annotate these phosphoproteins and p-sites, we integrated the knowledge from 88 publicly available resources that covers 9 aspects, namely, taxonomy annotation, genome annotation, function annotation, transcriptional regulation, sequence and structure information, family and domain annotation, interaction, orthologous information and biological pathway. In contrast to version 1.0 (~30 MB), dbPSP 2.0 contains ~9 GB of data, with a 300-fold increased volume. We anticipate that dbPSP 2.0 can serve as a useful data resource for further investigating phosphorylation events in prokaryotes. dbPSP 2.0 is free for all users to access at: http://dbpsp.biocuckoo.cn. | http://dbpsp.biocuckoo.cn | 32472030
19 | EPSD | As an important post-translational modification (PTM), protein phosphorylation is involved in the regulation of almost all biological processes in eukaryotes. Due to the rapid progress in mass spectrometry-based phosphoproteomics, a large number of phosphorylation sites (p-sites) have been characterized but remain to be curated. Here, we briefly summarized the current progress in the development of data resources for the collection, curation, integration and annotation of p-sites in eukaryotic proteins. Also, we designed the eukaryotic phosphorylation site database (EPSD), which contained 1 616 804 experimentally identified p-sites in 209 326 phosphoproteins from 68 eukaryotic species. In EPSD, we not only collected 1 451 629 newly identified p-sites from high-throughput (HTP) phosphoproteomic studies, but also integrated known p-sites from 13 additional databases. Moreover, we carefully annotated the phosphoproteins and p-sites of eight model organisms by integrating the knowledge from 100 additional resources that covered 15 aspects, including phosphorylation regulator, genetic variation and mutation, functional annotation, structural annotation, physicochemical property, functional domain, disease-associated information, protein-protein interaction, drug-target relation, orthologous information, biological pathway, transcriptional regulator, mRNA expression, protein expression/proteomics and subcellular localization. We anticipate that the EPSD can serve as a useful resource for further analysis of eukaryotic phosphorylation. With a data volume of 14.1 GB, EPSD is free for all users at http://epsd.biocuckoo.cn/. | http://epsd.biocuckoo.cn/ | 32008039
20 | qPhos | Temporal and spatial protein phosphorylation dynamically orchestrates a broad spectrum of biological processes and plays various physiological and pathological roles in diseases and cancers. Recent advancements in high-throughput proteomics techniques have greatly promoted the profiling and quantification of the phosphoproteome. However, although several comprehensive databases have collected the phosphorylated proteins and sites, a resource for phosphorylation quantification still remains to be constructed. In this study, we developed the qPhos (http://qphos.cancerbio.info) database to integrate and host the data on phosphorylation dynamics. A total of 3 537 533 quantification events for 199 071 non-redundant phosphorylation sites on 18 402 proteins under 484 conditions were collected through exhaustive curation of published literature. The experimental details, including sample materials, conditions and methods, were recorded. Various annotations, such as protein sequence and structure properties, potential upstream kinases and their inhibitors, were systematically integrated and carefully organized to present details about the quantified phosphorylation sites. Various browse and search functions were implemented for the user-defined filtering of samples, conditions and proteins. Furthermore, the qKinAct service was developed to dissect the kinase activity profile from user-submitted quantitative phosphoproteome data through annotating the kinase activity-related phosphorylation sites. Taken together, the qPhos database provides a comprehensive resource for protein phosphorylation dynamics to facilitate related investigations. | http://qphos.cancerbio.info | 30380102
21 | dbPAF | Protein phosphorylation is one of the most important post-translational modifications (PTMs) and regulates a broad spectrum of biological processes. Recent progress in phosphoproteomic identifications has generated a flood of phosphorylation sites, while the integration of these sites is an urgent need. In this work, we developed a curated database, dbPAF, containing known phosphorylation sites in H. sapiens, M. musculus, R. norvegicus, D. melanogaster, C. elegans, S. pombe and S. cerevisiae. From the scientific literature and public databases, we collected and integrated a total of 54,148 phosphoproteins with 483,001 phosphorylation sites. Multiple options were provided for accessing the data, while original references and other annotations were also present for each phosphoprotein. Based on the new data set, we computationally detected significantly over-represented sequence motifs around phosphorylation sites, predicted potential kinases that are responsible for the modification of collected phospho-sites, and evolutionarily analyzed phosphorylation conservation states across different species. Besides being largely consistent with previous reports, our results also proposed new features of phospho-regulation. Taken together, our database can be useful for further analyses of protein phosphorylation in human and other model organisms. The dbPAF database was implemented in PHP + MySQL and is freely available at http://dbpaf.biocuckoo.org. | http://dbpaf.biocuckoo.org | 27010073
22 | Scop3P | Protein phosphorylation is a key post-translational modification in many biological processes and is associated with human diseases such as cancer and metabolic disorders. The accurate identification, annotation, and functional analysis of phosphosites are therefore crucial to understand their various roles. Phosphosites are mainly analyzed through phosphoproteomics, which has led to increasing amounts of publicly available phosphoproteomics data. Several resources have been built around the resulting phosphosite information, but these are usually restricted to the protein sequence and basic site metadata. What is often missing from these resources, however, is context, including protein structure mapping, experimental provenance information, and biophysical predictions. We therefore developed Scop3P: a comprehensive database of human phosphosites within their full context. Scop3P integrates sequences (UniProtKB/Swiss-Prot), structures (PDB), and uniformly reprocessed phosphoproteomics data (PRIDE) to annotate all known human phosphosites. Furthermore, these sites are put into biophysical context by annotating each phosphoprotein with per-residue structural propensity, solvent accessibility, disordered probability, and early folding information. Scop3P, available at https://iomics.ugent.be/scop3p, presents a unique resource for visualization and analysis of phosphosites and for understanding of phosphosite structure-function relationships. | https://iomics.ugent.be/scop3p | 32508104
# | Database Name | Description | URL | Reference
1 | PRENbase | PRENbase is an annotated database of known and predicted prenylated proteins. Homologous proteins are merged into clusters. This search interface is designed to allow sophisticated queries for the experimental status of the modification (known/predicted...), exclusive or shared types of modifying enzymes (FT, GGT1, GGT2) as well as for evolutionary conservation by constraining the taxonomic distribution within these clusters or for single sequences. | http://mendel.imp.ac.at/PrePS/PRENbase/ | 17411337
# | Database Name | Description | URL | Reference
1 | iCysMod | As important post-translational modifications, protein cysteine modifications (PCMs) occurring at the cysteine thiol group play critical roles in the regulation of various biological processes in eukaryotes. Due to the rapid advancement of high-throughput proteomics technologies, a large number of PCM events have been identified but remain to be curated. Thus, an integrated resource of eukaryotic PCMs will be useful for the research community. In this work, we developed an integrative database for protein cysteine modifications in eukaryotes (iCysMod), which curated and hosted 108 030 PCM events for 85 747 experimentally identified sites on 31 483 proteins from 48 eukaryotes for 8 types of PCMs, including oxidation, S-nitrosylation (-SNO), S-glutathionylation (-SSG), disulfide formation (-SSR), S-sulfhydration (-SSH), S-sulfenylation (-SOH), S-sulfinylation (-SO2H) and S-palmitoylation (-S-palm). Then, browse and search options were provided for accessing the dataset, while various detailed information about the PCM events was well organized for visualization. With the human dataset in iCysMod, the sequence features around the cysteine modification sites for each PCM type were analyzed, and the results indicated that various types of PCMs presented distinct sequence recognition preferences. Moreover, different PCMs can crosstalk with each other to synergistically orchestrate specific biological processes, and 37 841 PCM events involved in 119 types of PCM co-occurrences at the same cysteine residues were finally obtained. Taken together, we anticipate that the iCysMod database will provide a useful resource for eukaryotic PCMs to facilitate related research, while the online service is freely available at http://icysmod.omicsbio.info. | http://icysmod.omicsbio.info | 33406221
# | Database Name | Description | URL | Reference
1 | PupDB | Background: Prokaryotic ubiquitin-like protein (Pup), the first identified post-translational protein modifier in prokaryotes, is an important signal for the selective degradation of proteins. Recently, large-scale proteomics technology has been applied to identify a large number of pupylated proteins. The development of a database for managing pupylated proteins and pupylation sites is important for further analyses. Description: A database named PupDB is constructed by collecting experimentally identified pupylated proteins and pupylation sites from published studies and integrating the information of pupylated proteins with corresponding structures and functional annotations. PupDB is a web-based database with tools for browsing and searching pupylated proteins and interactive displays of protein structures and pupylation sites. Conclusions: The structured and searchable database PupDB is expected to provide a useful resource for further analyzing the substrate specificity, identifying pupylated proteins in other organisms and developing computational tools for predicting pupylation sites. PupDB is freely available at http://cwtung.kmu.edu.tw/pupdb | http://cwtung.kmu.edu.tw/pupdb | 22424087
# | Database Name | Description | URL | Reference
1 | dbGSH | dbGSH is a database that integrates the experimentally verified cysteine S-glutathionylation (GSH) sites from multiple species. S-glutathionylation (GSH), the reversible protein post-translational modification (PTM) that generates a mixed-disulfide bond between glutathione and a cysteine residue, critically regulates protein activity, stability, and redox regulation. Due to its importance in regulating oxidative/nitrosative stress and balance in cellular response, a number of methods have rapidly evolved to increase the dataset of experimentally determined glutathionylation sites. However, there is currently no database dedicated to the integration of all experimentally verified S-glutathionylation sites with their characteristics, structure or functional information. Thus, the dbGSH database is created to integrate all available datasets and to provide their structural analysis. Up to December 10th 2013, the dbGSH has manually accumulated more than 2200 experimentally verified S-glutathionylated peptides from research articles using a text mining approach. To solve the heterogeneity among the data collected from different sources, the reported S-glutathionylated peptides are mapped to the UniProtKB protein entries by sequence identity. To delineate the structural correlation and consensus motif of these GSH sites, the dbGSH database also provides structural and functional analyses, including the motifs of substrate sites, solvent accessibility, protein secondary and tertiary structures, protein domains, and gene ontology. | http://csb.cse.yzu.edu.tw/dbGSH/ | 24790154
# | Database Name | Description | URL | Reference
1 | dbSNO | Protein S-nitrosylation (SNO) is a reversible post-translational modification (PTM) and involves the covalent attachment of nitric oxide (NO) to the thiol group of cysteine (Cys) residues. Given the increasing number of proteins reported to be regulated by this modification, S-nitrosylation is considered to act, in a manner analogous to phosphorylation, as a pleiotropic regulator that elicits dual effects to regulate diverse pathophysiological processes by altering protein function, stability, and conformation change in various cancers and human disorders. Due to its importance in regulating protein functions and cell signaling, dbSNO (http://dbSNO.mbc.nctu.edu.tw) is extended as an informative resource for exploring the structural environment of SNO substrate sites and regulatory networks of S-nitrosylated proteins. An increasing interest in the structural environment of PTM substrate sites motivated us to map all manually curated SNO peptides (4165 SNO sites within 2277 proteins) to PDB protein entries by sequence identity, which provides the information of spatial amino acid composition, solvent-accessible surface area, spatially neighboring amino acids, and side chain orientation for 298 substrate cysteine residues. Additionally, the annotations of protein molecular functions, biological processes, functional domains and human diseases are integrated to explore the functional and disease associations for the S-nitrosoproteome. In this update, users are allowed to search a group of proteins/genes of interest and the system reconstructs the S-nitrosylation regulatory network based on the information of metabolic pathways and protein-protein interactions. Most importantly, an endogenous yet pathophysiological S-nitrosoproteomic dataset from colorectal cancer patients was adopted to demonstrate that dbSNO could discover potential SNO proteins involved in the regulation of NO signaling for cancer pathways. | http://140.138.144.145/~dbSNO/index.php | 25399423
# | Database Name | Description | URL | Reference
1 | UbiProt | The UbiProt Database project aims to summarize a significant volume of data concerning various protein substrates of ubiquitylation. Each database entry describing a particular ubiquitylated protein comprises information about protein properties and sources; ubiquitylation features, including details of the respective conjugation cascade; literature references; and links to related databases. All data included were experimentally obtained by research groups from around the world and can be verified using the respective references. | http://ubiprot.org.ru/ | 17442109
2UbiNet 2.0Ubiquitination is an important post-translational modification, which controls protein turnover by labeling malfunctional and redundant proteins for proteasomal degradation, and also serves intriguing non-proteolytic regulatory functions. E3 ubiquitin ligases, whose substrate specificity determines the recognition of target proteins of ubiquitination, play crucial roles in ubiquitin-proteasome system. UbiNet 2.0 is an updated version of the database UbiNet. It contains 3332 experimentally verified E3-substrate interactions (ESIs) in 54 organisms and rich annotations useful for investigating the regulation of ubiquitination and the substrate specificity of E3 ligases. Based on the accumulated ESIs data, the recognition motifs in substrates for each E3 were also identified and a functional enrichment analysis was conducted on the collected substrates. To facilitate the research on ESIs with different categories of E3 ligases, UbiNet 2.0 performed strictly evidence-based classification of the E3 ligases in the database based on their mechanisms of ubiquitin transfer and substrate specificity. The platform also provides users with an interactive tool that can visualize the ubiquitination network of a group of self-defined proteins, displaying ESIs and protein-protein interactions in a graphical manner. The tool can facilitate the exploration of inner regulatory relationships mediated by ubiquitination among proteins of interest. In summary, UbiNet 2.0 is a user-friendly web-based platform that provides comprehensive as well as updated information about experimentally validated ESIs and a visualized tool for the construction of ubiquitination regulatory networks available at http://awi.cuhk.edu.cn/~ubinet/index.php.http://awi.cuhk.edu.cn/~ubinet/index.php33693667
Tools
# | Tool Name | Description | URL | Reference
1 | DeepKhib | As a novel type of post-translational modification, lysine 2-hydroxyisobutyrylation (Khib) plays an important role in gene transcription and signal transduction. To understand its regulatory mechanism, the essential step is the recognition of Khib sites. Thousands of Khib sites have been experimentally verified across five different species. However, only a couple of traditional machine-learning algorithms have been developed to predict Khib sites, and only for a limited number of species; a general prediction algorithm is lacking. We constructed a deep-learning algorithm based on a convolutional neural network with the one-hot encoding approach, dubbed CNN_OH. It compares favorably with traditional machine-learning models and other deep-learning models across different species, in terms of cross-validation and independent tests. The area under the ROC curve (AUC) values for CNN_OH ranged from 0.82 to 0.87 for different organisms, which is superior to the currently available Khib predictors. Moreover, we developed a general model based on the integrated data from multiple species, and it showed great universality and effectiveness, with AUC values in the range of 0.79-0.87. Accordingly, we constructed the online prediction tool DeepKhib for easily identifying Khib sites, which includes both species-specific and general models. DeepKhib is available at http://www.bioinfogo.org/DeepKhib. | http://www.bioinfogo.org/DeepKhib | 33015075
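The one-hot encoding mentioned in the DeepKhib description can be illustrated with a minimal sketch. This is not the DeepKhib implementation: the residue ordering, the `-` padding symbol, and the function name `one_hot_encode` are assumptions made here for illustration only.

```python
# Hypothetical illustration of one-hot encoding for a peptide window.
# The alphabet order and the padding symbol are assumptions, not taken
# from the DeepKhib paper or its code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(peptide):
    """Return a list of 20-element rows, one per residue.

    Each row holds a single 1 at the index of the residue; padding or
    unknown symbols (for example '-') are encoded as an all-zero row.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = []
    for residue in peptide:
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    encoded = one_hot_encode("-AK")
    print(len(encoded), "rows x", len(encoded[0]), "columns")
```

The resulting matrix (window length x 20) is the kind of input a convolutional network can consume directly, without hand-selected features.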
# | Tool Name | Description | URL | Reference
1DeepAcetBackground: Lysine acetylation in protein is one of the most important post-translational modifications (PTMs). It plays an important role in essential biological processes and is related to various diseases. To obtain a comprehensive understanding of regulatory mechanism of lysine acetylation, the key is to identify lysine acetylation sites. Previously, several shallow machine learning algorithms had been applied to predict lysine modification sites in proteins. However, shallow machine learning has some disadvantages. For instance, it is not as effective as deep learning for processing big data. Results: In this work, a novel predictor named DeepAcet was developed to predict acetylation sites. Six encoding schemes were adopted, including a one-hot, BLOSUM62 matrix, a composition of K-space amino acid pairs, information gain, physicochemical properties, and a position specific scoring matrix to represent the modified residues. A multilayer perceptron (MLP) was utilized to construct a model to predict lysine acetylation sites in proteins with many different features. We also integrated all features and implemented the feature selection method to select a feature set that contained 2199 features. As a result, the best prediction achieved 84.95% accuracy, 83.45% specificity, 86.44% sensitivity, 0.8540 AUC, and 0.6993 MCC in a 10-fold cross-validation. For an independent test set, the prediction achieved 84.87% accuracy, 83.46% specificity, 86.28% sensitivity, 0.8407 AUC, and 0.6977 MCC. Conclusion: The predictive performance of our DeepAcet is better than that of other existing methods. DeepAcet can be freely downloaded from https://github.com/Sunmile/DeepAcet .https://github.com/Sunmile/DeepAcet30674277
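The performance figures quoted throughout these tables (accuracy, sensitivity, specificity, MCC) are all derived from a confusion matrix. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not taken from any of the listed papers.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from raw counts.

    tp/fn: modified sites predicted as modified / unmodified.
    tn/fp: unmodified sites predicted as unmodified / modified.
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
        # Matthews correlation coefficient
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in classification_metrics(40, 45, 5, 10).items():
        print(f"{name}: {value:.4f}")
```

MCC is reported alongside accuracy because it stays informative when positive and negative sites are imbalanced, which is common in PTM site datasets.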
# | Tool Name | Description | URL | Reference
1 | DeepRMethylSite | Methylation, which is one of the most prominent post-translational modifications on proteins, regulates many important cellular functions. Though several model-based methylation site predictors have been reported, all existing methods employ machine learning strategies, such as support vector machines and random forest, to predict sites of methylation based on a set of "hand-selected" features. As a consequence, the subsequent models may be biased toward one set of features. Moreover, due to the large number of features, model development can often be computationally expensive. In this paper, we propose an alternative approach based on deep learning to predict arginine methylation sites. Our model, which we termed DeepRMethylSite, is computationally less expensive than traditional feature-based methods while eliminating potential biases that can arise through feature selection. Based on independent testing on our dataset, DeepRMethylSite achieved efficiency scores of 68%, 82% and 0.51 with respect to sensitivity (SN), specificity (SP) and Matthews correlation coefficient (MCC), respectively. Importantly, in side-by-side comparisons with other state-of-the-art methylation site predictors, our method performs on par or better in all scoring metrics tested. | https://github.com/dukkakc/DeepRMethylSite | 32555810
# | Tool Name | Description | URL | Reference
1 | MDD-carb | Background: Carbonylation, which takes place through oxidation by reactive oxygen species (ROS) on specific residues, is an irreversible oxidative modification of proteins. It has been reported that carbonylation is related to a number of metabolic or aging diseases including diabetes, chronic lung disease, Parkinson's disease, and Alzheimer's disease. Due to the lack of computational methods dedicated to exploring motif signatures of protein carbonylation sites, we were motivated to exploit an iterative statistical method to characterize and identify carbonylated sites with motif signatures. Results: By manually curating experimental data from research articles, we obtained 332, 144, 135, and 140 verified substrate sites for K (lysine), R (arginine), T (threonine), and P (proline) residues, respectively, from 241 carbonylated proteins. In order to examine the informative attributes for discriminating between carbonylated and non-carbonylated sites, multifarious features including composition of twenty amino acids (AAC), composition of amino acid pairs (AAPC), position-specific scoring matrix (PSSM), and positional weighted matrix (PWM) were investigated in this study. Additionally, in an attempt to explore the motif signatures of carbonylation sites, an iterative statistical method was adopted to detect statistically significant dependencies of amino acid compositions between specific positions around substrate sites. A profile hidden Markov model (HMM) was then utilized to train a predictive model from each motif signature. Moreover, a support vector machine (SVM) was used to construct an integrative model by combining the bit scores obtained from the profile HMMs. The combinatorial model provided enhanced performance, with balanced sensitivity and specificity in cross-validation and independent testing. Conclusion: This study provides a new scheme for exploring potential motif signatures at substrate sites of protein carbonylation. The usefulness of the revealed motifs in the identification of carbonylated sites is demonstrated by their effective performance in cross-validation and independent testing. Finally, these substrate motifs were adopted to build an available online resource (MDD-Carb, http://csb.cse.yzu.edu.tw/MDDCarb/) and are also anticipated to facilitate the study of large-scale carbonylated proteomes. | http://csb.cse.yzu.edu.tw/MDDCarb/ | 29322938
2iCar-PseCpCarbonylation is a posttranslational modification (PTM or PTLM), where a carbonyl group is added to lysine (K), proline (P), arginine (R), and threonine (T) residue of a protein molecule. Carbonylation plays an important role in orchestrating various biological processes but it is also associated with many diseases such as diabetes, chronic lung disease, Parkinson's disease, Alzheimer's disease, chronic renal failure, and sepsis. Therefore, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence containing many residues of K, P, R, or T, which ones can be carbonylated, and which ones cannot? To address this problem, we have developed a predictor called iCar-PseCp by incorporating the sequence-coupled information into the general pseudo amino acid composition, and balancing out skewed training dataset by Monte Carlo sampling to expand positive subset. Rigorous target cross-validations on a same set of carbonylation-known proteins indicated that the new predictor remarkably outperformed its existing counterparts. For the convenience of most experimental scientists, a user-friendly web-server for iCar-PseCp has been established at http://www.jci-bioinfo.cn/iCar-PseCp, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can also be used to analyze many other problems in computational proteomics.http://www.jci-bioinfo.cn/iCar-PseCp27153555
3iCarPSMotivation: Protein carbonylation is one of the most important oxidative stress-induced post-translational modifications, which is generally characterized as stability, irreversibility and relative early formation. It plays a significant role in orchestrating various biological processes and has been already demonstrated to be related to many diseases. However, the experimental technologies for carbonylation sites identification are not only costly and time consuming, but also unable of processing a large number of proteins at a time. Thus, rapidly and effectively identifying carbonylation sites by computational methods will provide key clues for the analysis of occurrence and development of diseases. Results: In this study, we developed a predictor called iCarPS to identify carbonylation sites based on sequence information. A novel feature encoding scheme called residues conical coordinates combined with their physicochemical properties was proposed to formulate carbonylated protein and non-carbonylated protein samples. To remove potential redundant features and improve the prediction performance, a feature selection technique was used. The accuracy and robustness of iCarPS were proved by experiments on training and independent datasets. Comparison with other published methods demonstrated that the proposed method is powerful and could provide powerful performance for carbonylation sites identification. Availability and implementation: Based on the proposed model, a user-friendly webserver and a software package were constructed, which can be freely accessed at http://lin-group.cn/server/iCarPS.http://lin-group.cn/server/iCarPS32766811
# | Tool Name | Description | URL | Reference
1CKSAAP_CrotSiteAs one of the most important and common histones post-translational modifications, crotonylation plays a key role in regulating various biological processes. The accurate identification of crotonylation sites is crucial to elucidate the underlying molecular mechanisms of crotonylation. In this study, a novel bioinformatics tool named CKSAAP_CrotSite is developed to predict crotonylation sites. The highlight of CKSAAP_CrotSite is to adopt the composition of k-spaced amino acid pairs as input encoding, and the support vector machine is employed as the classifier. As illustrated by jackknife test, CKSAAP_CrotSite achieves a promising performance with a Sensitivity of 92.45%, a Specificity of 99.17%, an Accuracy of 98.11% and a Matthew's correlation coefficient of 0.9283, which is much better than those of the existing prediction methods. Feature analysis shows that some amino acid pairs such as 'KxG', 'KG' and 'PxP' may play an important role in the prediction of crotonylation sites. The results of analysis and prediction could offer useful information for elucidating the molecular mechanisms of crotonylation and related experimental validations. A user-friendly web-server for CKSAAP_CrotSite is available at 123.206.31.171/CKSAAP_CrotSite/.123.206.31.171/CKSAAP_CrotSite/28886434
2 | iCrotoK-PseAAC | Among the different post-translational modifications (PTMs), one of the most important is lysine crotonylation in proteins. Its relevance to various diseases and essential biological processes should not be underestimated. Identifying crotonylation sites is a key step toward understanding the mechanisms behind this biological process. In previously reported studies, researchers have used different techniques, such as position weighted matrix (PWM), support vector machine (SVM), k nearest neighbors (KNN), and many others. However, the maximum prediction accuracy achieved was not very high. To address this, we propose an improved predictor for lysine crotonylation sites named iCrotoK-PseAAC, in which we have incorporated various position- and composition-relative features along with statistical moments into PseAAC. The results of self-consistency testing were 100% accurate, while 10-fold cross-validation gave 99.0% accuracy. Based on the validation and comparison of the model, it is concluded that iCrotoK-PseAAC is more accurate than the previously proposed models. | | 31751380
3 | | Lysine crotonylation (Kcr) is a type of protein post-translational modification (PTM) that plays important roles in a variety of cellular regulation and processes. Several methods have been proposed for the identification of crotonylation. However, most of these methods predict efficiently only on either histone or non-histone proteins. Therefore, this work aims to give a more balanced performance across different species; here, plant (non-histone) and mammalian (histone) proteins are involved. SVM (support vector machine) and RF (random forest) were employed in this study. According to the results of cross-validation, the RF classifier based on the EGAAC attribute achieved the best predictive performance, performing competitively with existing methods while being more robust when dealing with imbalanced datasets. Moreover, an independent test was carried out, which compared the performance of this study and existing methods based on the same features or the same classifier. The SVM and RF classifiers achieved their best performance with 92% sensitivity, 88% specificity, 90% accuracy, and an MCC of 0.80 on the relatively small mammalian dataset, and 77% sensitivity, 83% specificity, 70% accuracy, and an MCC of 0.54 on the large-scale plant dataset. Moreover, cross-species independent testing was also carried out in this study, which demonstrated species diversity between plants and mammals. | | 33235255
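The composition of k-spaced amino acid pairs (CKSAAP) used by CKSAAP_CrotSite can be sketched as follows. This is a minimal illustration under assumptions, not the tool's code: the function name `cksaap`, the `k_max` parameter, and the normalization by the number of pairs in the window are choices made here. Feature keys follow the notation in the description above, so `KG` is an adjacent pair and `KxG` is a pair separated by one arbitrary residue.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(peptide, k_max=1):
    """Return the frequency of each amino acid pair separated by k residues.

    For every gap k in 0..k_max there are 400 features, keyed as the
    first residue, k copies of 'x', and the second residue (e.g. 'KxG').
    Frequencies are normalized by the number of pair positions in the
    peptide for that gap (an assumed, commonly used convention).
    """
    features = {}
    for k in range(k_max + 1):
        gap = "x" * k
        for a, b in product(AMINO_ACIDS, repeat=2):
            features[a + gap + b] = 0.0
        n_pairs = len(peptide) - k - 1
        if n_pairs <= 0:
            continue
        for i in range(n_pairs):
            a, b = peptide[i], peptide[i + k + 1]
            # Skip padding or non-standard residues.
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                features[a + gap + b] += 1.0 / n_pairs
    return features


if __name__ == "__main__":
    feats = cksaap("KAG", k_max=1)
    print({key: value for key, value in feats.items() if value > 0})
```

The resulting fixed-length vector (400 features per gap value) can be passed to a classifier such as a support vector machine.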
# | Tool Name | Description | URL | Reference
1iPreny-PseAACBackground: Occurring at the cysteine residue in the C-terminal of a protein, prenylation is a special kind of post-translational modification (PTM), which may play a key role for statin in altering immune function. Therefore, knowledge of the prenylation sites in proteins is important for drug development as well as for in-depth understanding the biological process concerned. Objective: Given a query protein whose C-terminal contains some cysteine residues, which one can be of prenylation or none of them can be prenylated? Methods: To address this problem, we have developed a new predictor, called "iPreny-PseAAC", by incorporating two tiers of sequence pair coupling effects into the general form of PseAAC (pseudo amino acid composition). Results: It has been observed by four different cross-validation approaches that all the important indexes in reflecting its prediction quality are quite high and fully consistent to each other. Conclusion: It is anticipated that the iPreny-PseAAC predictor holds very high potential to become a useful high throughput tool in identifying protein C-terminal cysteine prenylation sites and the other relevant areas. To maximize the convenience for most experimental biologists, the webserver for the new predictor has been established at http://app.aporc.org/iPreny-PseAAC/, by which users can easily get their desired results without needing to go through the mathematical details involved in this paper.http://app.aporc.org/iPreny-PseAAC/28425870
# | Tool Name | Description | URL | Reference
1 | MDDGlutar | Background: Glutarylation, the addition of a glutaryl group (five carbons) to a lysine residue of a protein molecule, is an important post-translational modification and plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified glutarylated peptides increases, it becomes imperative to investigate substrate motifs to enhance the study of protein glutarylation. We carried out a bioinformatics investigation of glutarylation sites based on amino acid composition using a public database containing information on 430 non-homologous glutarylation sites. Results: The TwoSampleLogo analysis indicates that positively charged and polar amino acids surrounding glutarylated sites may be associated with the specificity in substrate site of protein glutarylation. Additionally, the chi-squared test was utilized to explore the intrinsic interdependence between two positions around glutarylation sites. Further, maximal dependence decomposition (MDD), which consists of partitioning a large-scale dataset into subgroups with statistically significant amino acid conservation, was used to capture motif signatures of glutarylation sites. We considered single features, such as amino acid composition (AAC), amino acid pair composition (AAPC), and composition of k-spaced amino acid pairs (CKSAAP), as well as the effectiveness of incorporating MDD-identified substrate motifs into an integrated prediction model. Evaluation by five-fold cross-validation showed that AAC was most effective in discriminating between glutarylation and non-glutarylation sites, according to support vector machine (SVM). Conclusions: The SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.677, a specificity of 0.619, an accuracy of 0.638, and a Matthews Correlation Coefficient (MCC) value of 0.28. Using an independent testing dataset (46 glutarylated and 92 non-glutarylated sites) obtained from the literature, we demonstrated that the integrated SVM model could improve the predictive performance effectively, yielding a balanced sensitivity and specificity of 0.652 and 0.739, respectively. This integrated SVM model has been implemented as a web-based system (MDDGlutar), which is now freely available at http://csb.cse.yzu.edu.tw/MDDGlutar/. | http://csb.cse.yzu.edu.tw/MDDGlutar/ | 30717647
2 | iGlu-Lys | As a recently discovered post-translational modification, lysine glutarylation has been identified in both prokaryotic and eukaryotic cells. Glutarylated proteins are involved in various cellular functions, such as translation and metabolism, and exhibit diverse subcellular localizations. Lysine glutarylation sites were first identified experimentally in 2014, along with the deglutarylase sirtuin 5 (SIRT5). Computational prediction of lysine glutarylation could complement the experimental technique. In this work, the lysine glutarylation predictor iGlu-Lys has been developed based on a machine learning scheme. From four constructions, we selected the best feature scheme, which takes the amino acid pair order and special-position information into account. The support vector machine algorithm has been adopted, and its performance has been measured for different peptide window lengths. In the 10-fold cross-validation with window length 19, the AUC and MCC were 0.8944 and 0.5098, respectively. The ROC curves in 6-, 8-, and 10-fold cross-validations were very close, which illustrates the robustness of our predictor. The results of iGlu-Lys were better than those of the existing method GlutPred. A free webserver for iGlu-Lys is accessible at http://app.aporc.org/iGlu-Lys/. | http://app.aporc.org/iGlu-Lys/ | 29994125
3 | RF-GlutarySite | Glutarylation, which is a newly identified post-translational modification that occurs on lysine residues, has recently emerged as an important regulator of several metabolic and mitochondrial processes. However, the specific sites of modification on individual proteins, as well as the extent of glutarylation throughout the proteome, remain largely uncharacterized. Though informative, proteomic approaches based on mass spectrometry can be expensive, technically challenging and time-consuming. Therefore, the ability to predict glutarylation sites from protein primary sequences can complement proteomics analyses and help researchers study the characteristics and functional consequences of glutarylation. To this end, we used Random Forest (RF) machine learning strategies to identify the physicochemical and sequence-based features that correlated most substantially with glutarylation. We then used these features to develop a novel method to predict glutarylation sites from primary amino acid sequences using RF. Based on 10-fold cross-validation, the resulting algorithm, termed 'RF-GlutarySite', achieved efficiency scores of 75%, 81%, 68% and 0.50 with respect to accuracy (ACC), sensitivity (SN), specificity (SP) and Matthews correlation coefficient (MCC), respectively. Likewise, using an independent test set, RF-GlutarySite exhibited ACC, SN, SP and MCC scores of 72%, 73%, 70% and 0.43, respectively. Results using both 10-fold cross-validation and an independent test set were on par with or better than those achieved by existing glutarylation site predictors. Notably, RF-GlutarySite achieved the highest SN score among available glutarylation site prediction tools. Consequently, our method has the potential to uncover new glutarylation sites and to facilitate the discovery of relationships between glutarylation and well-known lysine modifications, such as acetylation, methylation and SUMOylation, as well as a number of recently identified lysine modifications, such as malonylation and succinylation. | | 31025681
4 | PUL-GLU | Background: As a new type of protein acylation modification, lysine glutarylation has been found to play a crucial role in metabolic processes and mitochondrial functions. To further explore the biological mechanisms and functions of glutarylation, it is important to predict potential glutarylation sites. In the existing glutarylation site predictors, experimentally verified glutarylation sites are treated as positive samples and non-verified lysine sites as negative samples to train predictors. However, the non-verified lysine sites may contain some glutarylation sites that have not yet been experimentally identified. Methods: In this study, experimentally verified glutarylation sites are treated as positive samples, whereas the remaining non-verified lysine sites are treated as unlabeled samples. A bioinformatics tool named PUL-GLU was developed to identify glutarylation sites using a positive-unlabeled learning algorithm. Results: Experimental results show that PUL-GLU significantly outperforms the current glutarylation site predictors. Therefore, PUL-GLU can be a powerful tool for accurate identification of protein glutarylation sites. Conclusion: A user-friendly web-server for PUL-GLU is available at http://bioinform.cn/pul_glu/. | http://bioinform.cn/pul_glu/ | 33071614
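Site predictors such as those listed above typically operate on fixed-length peptide windows centered on each candidate lysine (iGlu-Lys, for example, reports results for a window length of 19). The sketch below shows one way such windows could be extracted. The function name `lysine_windows` and the `-` padding symbol are assumptions made for illustration; none of the listed tools is known to use this exact code.

```python
def lysine_windows(sequence, window=19, pad="-"):
    """Extract a fixed-length window around every lysine (K).

    Returns a list of (position, peptide) tuples, where position is the
    1-based index of the lysine in the sequence. Windows that extend
    beyond either terminus are filled with the padding symbol.
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    half = window // 2
    padded = pad * half + sequence + pad * half
    result = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # padded[i + half] is the lysine, so the slice is centered on it.
            result.append((i + 1, padded[i:i + window]))
    return result


if __name__ == "__main__":
    for position, peptide in lysine_windows("MKTAYIAKQR", window=7):
        print(position, peptide)
```

In a positive-unlabeled setting such as the one PUL-GLU describes, windows around experimentally verified sites would form the positive set and all remaining windows the unlabeled set.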
# | Tool Name | Description | URL | Reference
1 | GlyStruct | Background: Glycation is one of the post-translational modifications (PTMs) in which sugar molecules and residues in protein sequences are covalently bonded. It has become one of the clinically important PTMs in recent times, attributed to many chronic and age-related complications. Because glycation is a non-enzymatic reaction, its prediction is a great challenge due to the lack of significant bias in the sequence motifs. Results: We developed a classifier, GlyStruct, based on a support vector machine, to predict glycated and non-glycated lysine residues using structural properties of amino acid residues. The features used were secondary structure, accessible surface area and the local backbone torsion angles. For this work, a benchmark dataset was extracted containing 235 glycated and 303 non-glycated lysine residues. GlyStruct demonstrated an improvement of approximately 10% in comparison to the benchmark method Gly-PseAAC. The performance of GlyStruct on the metrics sensitivity, specificity, accuracy and Matthews correlation coefficient was 0.7013, 0.7989, 0.7562, and 0.5065, respectively, for 10-fold cross-validation. Conclusion: Glycation has emerged as one of the clinically important PTMs of proteins in recent times. Therefore, the development of computational tools becomes necessary to predict glycation, which could help medical professionals administer drugs and manage patients more effectively. The proposed predictor classifies glycated and non-glycated lysine residues with promising results consistently across various cross-validation schemes and outperforms other state-of-the-art methods. | | 30717650
2 | PredGly | Motivation: Protein glycation is a familiar post-translational modification (PTM) that proceeds as a two-step non-enzymatic reaction. Glycation not only impairs the function of proteins but also changes their characteristics, so it is related to many human diseases. Systematic detection of glycation sites remains difficult because glycated residues lack clear sequence patterns. Computational approaches, which can filter putative sites prior to experimental verification, can greatly increase the efficiency of experimental work. However, the previous lysine glycation prediction method used a small training dataset; hence, the model does not generalize well. Results: By searching a new database, we collected a large dataset in Homo sapiens. PredGly, a novel software tool developed by combining multiple features, can predict lysine glycation sites for H. sapiens. In addition, XGBoost was adopted to optimize feature vectors and to improve the model performance. In a comparison of various classifiers, the support vector machine achieved optimal performance. On a new independent test set, PredGly outperformed other glycation tools. This suggests that PredGly can provide more instructive guidance for further experimental research on lysine glycation. Availability and implementation: https://github.com/yujialinncu/PredGly. | https://github.com/yujialinncu/PredGly | 30590442
# | Tool Name | Description | URL | Reference
1EnsembleGlyBACKGROUND: Glycosylation is one of the most complex post-translational modifications (PTMs) of proteins in eukaryotic cells. Glycosylation plays an important role in biological processes ranging from protein folding and subcellular localization, to ligand recognition and cell-cell interactions. Experimental identification of glycosylation sites is expensive and laborious. Hence, there is significant interest in the development of computational methods for reliable prediction of glycosylation sites from amino acid sequences. RESULTS: We explore machine learning methods for training classifiers to predict the amino acid residues that are likely to be glycosylated using information derived from the target amino acid residue and its sequence neighbors. We compare the performance of Support Vector Machine classifiers and ensembles of Support Vector Machine classifiers trained on a dataset of experimentally determined N-linked, O-linked, and C-linked glycosylation sites extracted from O-GlycBase version 6.00, a database of 242 proteins from several different species. The results of our experiments show that the ensembles of Support Vector Machine classifiers outperform single Support Vector Machine classifiers on the problem of predicting glycosylation sites in terms of a range of standard measures for comparing the performance of classifiers. The resulting methods have been implemented in EnsembleGly, a web server for glycosylation site prediction. CONCLUSION: Ensembles of Support Vector Machine classifiers offer an accurate and reliable approach to automated identification of putative glycosylation sites in glycoprotein sequences.http://turing.cs.iastate.edu/EnsembleGly/17996106
2 | GlycoMine | Glycosylation is a ubiquitous type of protein post-translational modification (PTM) in eukaryotic cells, which plays vital roles in various biological processes such as cellular communication, ligand recognition, and subcellular recognition. It is estimated that >50% of the entire human proteome is glycosylated. We present a novel bioinformatics tool called GlycoMine, which is a comprehensive tool for the systematic in silico identification of C-, N- and O-linked glycosylation sites in the human proteome. GlycoMine was developed using the random forest algorithm and evaluated based on a well-prepared up-to-date benchmark dataset that encompasses all three types of glycosylation sites, which was curated from multiple public resources. | http://www.structbioinfor.org/Lab/GlycoMine/ | 25568279
3 | GlycoPP | GlycoPP is a web server for the prediction of N- and O-linked glycosylation sites (glycosites) in prokaryotic protein sequences. | http://www.imtech.res.in/raghava/glycopp/ | 22808107
4 | GPP | Ab Initio Calculations of the Electronic Excited States of Molecules, Electronic Structure and Circular Dichroism of Proteins, Protein Folding and Evolution, Bioinformatics, Computer-Aided Drug Design, Drug Resistance. Please follow the links to publications on the respective topic. | http://comp.chem.nottingham.ac.uk/glyco/ | 19038042
5 | GS-align | Glycans play critical roles in many biological processes, and their structural diversity is key for specific protein-glycan recognition. GS-align is a novel computational method for glycan structure alignment and similarity measurement. GS-align generates possible alignments between two glycan structures through iterative maximum clique search and fragment superposition, and the optimal alignment is determined by the maximum structural similarity score, GS-score, whose significance is size-independent. | http://www.glycanstructure.org/gsalign | 25857669
6 | GlycoEP | Glycosylation is one of the most abundant and important post-translational modifications of proteins. Glycosylated proteins (glycoproteins) are involved in various cellular biological functions like protein folding, cell-cell interactions, cell recognition and host-pathogen interactions. A large number of eukaryotic glycoproteins also have therapeutic and potential technology applications. Therefore, characterization and analysis of glycosites (glycosylated residues) in these proteins is of great interest to biologists. In order to cater to these needs, a number of in silico tools have been developed over the years; however, a need for even better prediction tools remains. Therefore, in this study we have developed a new webserver, GlycoEP, for more accurate prediction of N-linked, O-linked and C-linked glycosites in eukaryotic glycoproteins using two larger datasets, namely, standard and advanced datasets. In the standard datasets no two glycosylated proteins are more similar than 40%; the advanced datasets are highly non-redundant, where no two glycosites' patterns (as defined in methods) have more than 60% similarity. Further, based on our results with several algorithms developed using different machine-learning techniques, we found Support Vector Machine (SVM) to be the optimum tool to develop glycosite prediction models. Accordingly, using our more stringent and non-redundant advanced datasets, the SVM based models developed in this study achieved a prediction accuracy of 84.26%, 86.87% and 91.43% with corresponding MCC of 0.54, 0.20 and 0.78, for N-, O- and C-linked glycosites, respectively. The best performing models trained on advanced datasets were then implemented as a user-friendly web server GlycoEP (http://www.imtech.res.in/raghava/glycoep/). Additionally, this server provides prediction models developed on standard datasets and allows users to scan sequons in input protein sequences. | http://www.imtech.res.in/raghava/glycoep | 23840574
7 | iGlycoS-PseAAC | Glycosylation of proteins in eukaryotic cells is an important and complicated post-translational modification due to its pivotal role and association with crucial physiological functions within most of the proteins. Identification of glycosylation sites in a polypeptide chain is not an easy task due to multiple impediments. Analytical identification of these sites is expensive and laborious. There is a dire need to develop a reliable computational method for precise determination of such sites which can help researchers to save time and effort. Herein, we propose a novel predictor, namely iGlycoS-PseAAC, by integrating Chou's Pseudo Amino Acid Composition (PseAAC) and relative/absolute position-based features. The self-consistency results show that the accuracy revealed by the model using the benchmark dataset for prediction of O-linked glycosylation having serine sites is 98.8%. The overall accuracy of the predictor achieved through 10-fold cross validation by combining the positive and negative results is 97.2%. The overall accuracy achieved through the jackknife test is 96.195% by aggregating all the prediction results. Thus the proposed predictor can help in predicting the O-linked glycosylated serine sites in an efficient and accurate way. The overall results show that the accuracy of iGlycoS-PseAAC is higher than that of the existing tools. | | 31985438
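The GlycoEP entry above mentions scanning sequons in input protein sequences. As a minimal sketch of what such a scan can look like (this is not code from any of the listed servers), the standard N-linked sequon N-X-S/T, where X is any residue except proline, can be located with a regular expression. The function name `find_sequons` and the 1-based position convention are illustrative assumptions.

```python
import re

# N-X-[S/T] with X != P. The lookahead keeps the match one residue wide,
# so overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to exercise the function.
    print(find_sequons("MNGSANPTNNTS"))
```

A sequon match only marks a candidate site; the predictors listed above go further and score each candidate with a trained classifier.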
# | Tool Name | Description | URL | Reference
1 | big-Pi plant | Posttranslational glycosylphosphatidylinositol (GPI) lipid anchoring is common not only for animal and fungal but also for plant proteins. The attachment of the GPI moiety to the carboxyl-terminus after proteolytic cleavage of a C-terminal propeptide is performed by the transamidase complex. Its four known subunits also have obvious full-length orthologs in the Arabidopsis and rice (Oryza sativa) genomes; thus, the mechanism of substrate protein processing appears similar for all eukaryotes. A learning set of plant proteins (substrates for the transamidase complex) has been collected both from the literature and plant sequence databases. We find that the plant GPI lipid anchor motif differs in minor aspects from the animal signal (e.g. the plant hydrophobic tail region can contain a higher fraction of aromatic residues). We have developed the "big-Pi plant" program for prediction of compatibility of query protein C-termini with the plant GPI lipid anchor motif requirements. Validation tests show that the sensitivity for transamidase targets is approximately 94%, and the rate of false positive prediction is about 0.1%. Thus, the big-Pi predictor can be applied as unsupervised genome annotation and target selection tool. The program is also suited for the design of modified protein constructs to test their GPI lipid anchoring capacity. The big-Pi plant predictor Web server and lists of potential plant precursor proteins in Swiss-Prot, SPTrEMBL, Arabidopsis, and rice proteomes are available at http://mendel.imp.univie.ac.at/gpi/plants/gpi_plants.html. Arabidopsis and rice protein hits have been functionally classified. Several GPI lipid-anchored arabinogalactan-related proteins have been identified in rice. | http://mendel.imp.ac.at/gpi/plant_server.html | 14681532
2 | FragAnchor | A glycosylphosphatidylinositol (GPI) anchor is a common but complex C-terminal post-translational modification of extracellular proteins in eukaryotes. Here we investigate the problem of correctly annotating GPI-anchored proteins for the growing number of sequences in public databases. We developed a computational system, called FragAnchor, based on the tandem use of a neural network (NN) and a hidden Markov model (HMM). Firstly, NN selects potential GPI-anchored proteins in a dataset, then HMM parses these potential GPI signals and refines the prediction by qualitative scoring. FragAnchor correctly predicted 91% of all the GPI-anchored proteins annotated in the Swiss-Prot database. In a large-scale analysis of 29 eukaryote proteomes, FragAnchor predicted that the percentage of highly probable GPI-anchored proteins is between 0.21% and 2.01%. The distinctive feature of FragAnchor, compared with other systems, is that it targets only the C-terminus of a protein, making it less sensitive to the background noise found in databases and possible incomplete protein sequences. Moreover, FragAnchor can be used to predict GPI-anchored proteins in all eukaryotes. Finally, by using qualitative scoring, the predictions combine both sensitivity and information content. The predictor is publicly available at [see text]. | http://navet.ics.hawaii.edu/~fraganchor/NNHMM/NNHMM.html | 17893077
3 | GPI-SOM | MOTIVATION: Anchoring of proteins to the extracytosolic leaflet of membranes via C-terminal attachment of glycosylphosphatidylinositol (GPI) is ubiquitous and essential in eukaryotes. The signal for GPI-anchoring is confined to the C-terminus of the target protein. In order to identify anchoring signals in silico, we have trained neural networks on known GPI-anchored proteins, systematically optimizing input parameters. RESULTS: A Kohonen self-organizing map, GPI-SOM, was developed that predicts GPI-anchored proteins with high accuracy. In combination with SignalP, GPI-SOM was used in genome-wide surveys for GPI-anchored proteins in diverse eukaryotes. Apart from specialized parasites, a general trend towards higher percentages of GPI-anchored proteins in larger proteomes was observed. AVAILABILITY: GPI-SOM is accessible on-line at http://gpi.unibe.ch. The source code (written in C) is available on the same website. SUPPLEMENTARY INFORMATION: Positive training set, performance test sets and lists of predicted GPI-anchored proteins from different eukaryotes in fasta format. | http://gpi.unibe.ch/ | 15691858
4 | PredGPI | PredGPI is a prediction system for GPI-anchored proteins. It is based on a support vector machine (SVM) for the discrimination of the anchoring signal, and on a Hidden Markov Model (HMM) for the prediction of the most probable omega-site. | http://gpcr.biocomp.unibo.it/predgpi/index.htm | 18811934
# | Tool Name | Description | URL | Reference
1 | HydLoc | As a kind of post-translational modification, hydroxylation has drawn less attention than other modifications, such as phosphorylation and acetylation. However, besides protein stability regulation, it has been found that hydroxylation may affect the activity of proteins. Therefore, it is necessary to better understand the biological processes of hydroxylation. Identification of hydroxylated substrates and their corresponding sites is important for studies of its molecular mechanism. Fast and convenient computational methods for hydroxylation site identification are much desired, because experimental approaches are time-consuming and labor-intensive. Here, we present HydLoc (Hydroxylation sites Location), a random forest-based hydroxylation site predictor for human proteins using sequential information and physicochemical properties. The accuracies of leave-one-out cross-validation on the training dataset are 84.25% and 80.61% for residues proline (P) and lysine (K), respectively. On the independent test dataset, it achieved accuracies of 90.74% and 81.25% for P and K hydroxylation site prediction, respectively. Meanwhile, sensitivity values of 96.29% and 75.00% were obtained for residues P and K, which outperforms the existing methods. A user-friendly web server of HydLoc is now available at https://www.gdpu-bioinfolab.com/hydloc/. | https://www.gdpu-bioinfolab.com/hydloc/ | 10.1016/j.chemolab.2020.104035
2 | iHyd-LysSite (EPSV) | Introduction: Hydroxylation is one of the most important post-translational modifications (PTM) in cellular functions and is linked to various diseases. The addition of a hydroxyl group (OH) to a lysine site produces hydroxylysine. Methods: The method used in this study for identifying hydroxylysine sites is based on a mathematical and statistical methodology incorporating the sequence-order effect and the composition of each object within protein sequences. This predictor is called "iHyd-LysSite (EPSV)" (identifying hydroxylysine sites by extracting enhanced position and sequence variant technique). The identification of hydroxylysine sites by experimental methods is difficult, laborious and highly expensive. In silico technique is an alternative approach to identify hydroxylysine sites in proteins. Results: The experimental results require that the predictive model should have high sensitivity and specificity values and must be more accurate. The self-consistency, independent, 10-fold cross-validation and jackknife tests are performed for validation purposes. These tests were carried out using three renowned classifiers, Neural Networks (NN), Random Forest (RF) and Support Vector Machine (SVM), with the demanding prediction rate. The overall predictive outcomes are extraordinarily superior to the results obtained by previous predictors. The proposed model contributed an excellent prediction rate in the system for NN, RF, and SVM classifiers. The sensitivity and specificity results using all these classifiers for the jackknife test are 96.08%, 94.99%, 98.16% and 97.52%, 98.52%, 80.95%. Conclusion: The results obtained by the proposed tool show that this method may meet the future demand for hydroxylysine site identification with a better prediction rate than the existing methods. | | 33214770
3 | iHyd-PseCp | Protein hydroxylation is a posttranslational modification (PTM), in which a CH group in Pro (P) or Lys (K) residue has been converted into a COH group, or a hydroxyl group (-OH) is converted into an organic compound. Closely associated with cellular signaling activities, this type of PTM is also involved in some major diseases, such as stomach cancer and lung cancer. Therefore, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence containing many residues of P or K, which ones can be hydroxylated, and which ones cannot? With the explosive growth of protein sequences in the post-genomic age, the problem has become even more urgent. To address such a problem, we have developed a predictor called iHyd-PseCp by incorporating the sequence-coupled information into the general pseudo amino acid composition (PseAAC) and introducing the "Random Forest" algorithm to operate the calculation. Rigorous jackknife tests indicated that the new predictor remarkably outperformed the existing state-of-the-art prediction method for the same purpose. For the convenience of most experimental scientists, a user-friendly web-server for iHyd-PseCp has been established at http://www.jci-bioinfo.cn/iHyd-PseCp, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. | http://www.jci-bioinfo.cn/iHyd-PseCp | 27322424
4 | iHyd-PseAAC | Post-translational modifications (PTMs) play crucial roles in various cell functions and biological processes. Protein hydroxylation is one type of PTM that usually occurs at the sites of proline and lysine. Given an uncharacterized protein sequence, which site of its Pro (or Lys) can be hydroxylated and which site cannot? This is a challenging problem, not only for in-depth understanding of the hydroxylation mechanism, but also for drug development, because protein hydroxylation is closely relevant to major diseases, such as stomach and lung cancers. With the avalanche of protein sequences generated in the post-genomic age, it is highly desired to develop computational methods to address this problem. In view of this, a new predictor called "iHyd-PseAAC" (identify hydroxylation by pseudo amino acid composition) was proposed by incorporating the dipeptide position-specific propensity into the general form of pseudo amino acid composition. It was demonstrated by rigorous cross-validation tests on stringent benchmark datasets that the new predictor is quite promising and may become a useful high throughput tool in this area. A user-friendly web-server for iHyd-PseAAC is accessible at http://app.aporc.org/iHyd-PseAAC/. Furthermore, for the convenience of the majority of experimental scientists, a step-by-step guide on how to use the web-server is given. Users can easily obtain their desired results by following these steps without the need of understanding the complicated mathematical equations presented in this paper just for its integrity. | http://app.aporc.org/iHyd-PseAAC/ | 24857907
5 | iHyd-PseAAC (EPSV) | Background: In various biological processes and cell functions, post-translational modifications (PTMs) bear critical significance. Hydroxylation of proline residues is one kind of PTM, which occurs following protein synthesis. The experimental determination of hydroxyproline sites in an uncharacterized protein sequence requires extensive, time-consuming and expensive tests. Methods: With the flood of protein sequences produced in the post-genomic age, effective computational strategies are desired to overcome the issue. Keeping in view the composition and sequence order effect within polypeptide chains, an innovative in silico predictor based on a mathematical model is proposed. Results: It was stringently verified using self-consistency, cross-validation and jackknife tests on benchmark datasets. It was established after a rigorous jackknife test that the new predictor values are superior to the values predicted by previous methodologies. Conclusion: This new mathematical technique is the most appropriate and encouraging as compared with the existing models. | | 31555063
# | Tool Name | Description | URL | Reference
1 | LipoPred | Lipoylation is a highly conserved post-translational modification which has been found to be involved in many biological processes and closely associated with various metabolic diseases. The accurate identification of lipoylation sites is necessary to elucidate the underlying molecular mechanisms of lipoylation. As the traditional experimental methods are time consuming and expensive, it is desired to develop computational methods to predict lipoylation sites. In this study, a novel predictor named LipoPred is proposed to predict lysine lipoylation sites. On the one hand, an effective feature extraction method, bi-profile Bayes encoding, is employed to encode lipoylation sites. On the other hand, a fuzzy support vector machine algorithm is proposed to solve the class imbalance and noise problem in the prediction of lipoylation sites. As illustrated by 10-fold cross-validation, LipoPred achieves an excellent performance with a Matthews correlation coefficient of 0.9930. Therefore, LipoPred can be a useful bioinformatics tool for the prediction of lipoylation sites. Feature analysis shows that some residues around lipoylation sites may play an important role in the prediction. The results of analysis and prediction could offer useful information for elucidating the molecular mechanisms of lipoylation. A user-friendly web-server for LipoPred is established at 123.206.31.171/LipoPred/. | 123.206.31.171/LipoPred/ | 30218638
2 | LipoSVM | Background: Lysine lipoylation, which is a rare and highly conserved post-translational modification of proteins, has been considered one of the most important processes in the biological field. To obtain a comprehensive understanding of the regulatory mechanism of lysine lipoylation, the key is to identify lysine lipoylated sites. Due to the high cost and complexity of experimental methods, it is urgent to develop computational ways to predict lipoylation sites. Methodology: In this work, a predictor named LipoSVM is developed to accurately predict lipoylation sites. To overcome the problem of an unbalanced sample, synthetic minority over-sampling technique (SMOTE) is utilized to balance negative and positive samples. Furthermore, different ratios of positive and negative samples are chosen as training sets. Results: By comparing five different encoding schemes and five classification algorithms, LipoSVM is constructed finally by using a training set with positive and negative sample ratio of 1:1, combining with position-specific scoring matrix and support vector machine. The best performance achieves an accuracy of 99.98% and AUC 0.9996 in 10-fold cross-validation. The AUC of independent test set reaches 0.9997, which demonstrates the robustness of LipoSVM. The analysis between lysine lipoylation and non-lipoylation fragments shows significant statistical differences. Conclusion: A good predictor for lysine lipoylation is built based on position-specific scoring matrix and support vector machine. Meanwhile, an online webserver LipoSVM can be freely downloaded from https://github.com/stars20180811/LipoSVM. | https://github.com/stars20180811/LipoSVM | 32476993
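Several entries above report accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC), and both lipoylation predictors mention class imbalance. The following is a minimal sketch, under the usual textbook definitions, of how these measures follow from a confusion matrix; the function name `confusion_metrics` and the choice to return 0.0 when a denominator is zero are illustrative assumptions, not taken from any listed tool.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity and MCC from raw counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC denominator is zero when any row or column of the matrix is empty;
    # returning 0.0 in that case is a common convention (assumption here).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical imbalanced set: 5 modified sites, 95 unmodified sites,
    # and a classifier that predicts "unmodified" for everything.
    print(confusion_metrics(tp=0, fp=0, tn=95, fn=5))
```

In the toy case the classifier reaches 95% accuracy while its MCC is 0, which is why an accuracy figure on imbalanced site data is easier to interpret when the MCC is reported alongside it.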
# | Tool Name | Description | URL | Reference
1 | LysAcet | Reversible acetylation on lysine residues, a crucial post-translational modification (PTM) for both histone and non-histone proteins, governs many central cellular processes. Due to limited data and lack of a clear acetylation consensus sequence, little research has focused on prediction of lysine acetylation sites. Incorporating almost all currently available lysine acetylation information, and using the support vector machine (SVM) method along with coding schema for protein sequence coupling patterns, we propose here a novel lysine acetylation prediction algorithm: LysAcet. When compared with other methods or existing tools, LysAcet is the best predictor of lysine acetylation, with K-fold (5- and 10-) and jackknife cross-validation accuracies of 75.89%, 76.73%, and 77.16%, respectively. LysAcet's superior predictive accuracy is attributed primarily to the use of sequence coupling patterns, which describe the relative position of two amino acids. LysAcet contributes to the limited PTM prediction research on lysine epsilon-acetylation, and may serve as a complementary in silico approach for exploring acetylation on proteomes. An online web server is freely available at http://www.biosino.org/LysAcet/. | http://www.biosino.org/LysAcet/ | 19689425
2 | ASEB | Protein lysine acetylation plays an important role in the normal functioning of cells, including gene expression regulation, protein stability and metabolism regulation. Although large numbers of lysine acetylation sites have been identified via large-scale mass spectrometry or traditional experimental methods, the lysine (K)-acetyl-transferase (KAT) responsible for the acetylation of a given protein or lysine site remains largely unknown due to the experimental limitations of KAT substrate identification. Hence, the in silico prediction of KAT-specific acetylation sites may provide direction for further experiments. In our previous study, we developed the acetylation set enrichment based (ASEB) computer program to predict which KAT-families are responsible for the acetylation of a given protein or lysine site. In this article, we provide KAT-specific acetylation site prediction as a web service. This web server not only provides the online tool and R package for the method in our previous study, but several useful services are also included, such as the integration of protein-protein interaction information to enhance prediction accuracy. This web server can be freely accessed at http://cmbi.bjmu.edu.cn/huac. | http://cmbi.bjmu.edu.cn/huac | 22600735
3 | BRABSB-PHKA | BRABSB-PHKA is an in silico online tool for Prediction of potential Human Lysine(K) Acetylation (PHKA) sites from protein sequences. The computational methodology is based on Bi-Relative Binomial Score Bayes (BRBSB) combined with support vector machines (SVMs). BRBSB-PHKA yields, on average, a sensitivity of 83.91%, a specificity of 87.25% and an accuracy of 85.58% in the case of 5-fold cross validation, together with the results on independent test data sets, suggesting that BRBSB-PHKA presented here can facilitate the identification of human lysine acetylation sites and more confident annotation. BRBSB-PHKA supports two input forms for query sequence(s): directly PASTE a single sequence or several sequences in FASTA format into the input frame, or UPLOAD a file in FASTA format from local disk (protein sequences here are all represented in single-letter code amino acids). The sequence part allows any character, figure or space except ">". The prediction results of BRBSB-PHKA are shown in the output table. Sequence name denotes the name of each query sequence in FASTA format. If no names are provided for query sequences in FASTA format, the system will give them names of "default sequence 1", "default sequence 2", ... Position stands for the absolute position of potential acetyllysine sites in proteins. Acetylated residue refers to the corresponding acetylated amino acid. Score refers to the predictive probability of acetylation at the corresponding site. Flanking residues represents the flanking sequence centered on the acetylated residue (the length is 15 for BRABSB-PHKA). | http://www.bioinfo.bio.cuhk.edu.hk/bpbphka | 22936054
4 | PAIL | Protein acetylation is a widespread covalent modification in eukaryotes, transferring acetyl groups from acetyl coenzyme A (acetyl CoA) either to the α-amino group of amino-terminal residues or to the ε-amino group of internal lysines at specific sites (Glozak,MA et al., 2005; Kouzarides,T, 2000; Polevoda,B et al., 2000; Polevoda,B et al., 2002; Yang,XJ, 2004). As one of the most ubiquitous protein modifications, approximately 85% of eukaryotic proteins are Nα-terminally acetylated in a co-translational manner on several types of residues such as Serine, Alanine, and so on (Polevoda,B et al., 2000; Polevoda,B et al., 2002). Nε-lysine acetylation is less common, but probably more important. Nε-acetylation of proteins at internal lysine residues is an essential and highly reversible type of post-translational modification (PTM), and orchestrates a variety of cellular processes, including transcription regulation (Faiola,F et al., 2005; Brunet,A et al., 2004), DNA repair (Murr,R et al., 2006), apoptosis (Subramanian,C et al., 2005; Cohen,HY et al., 2004), cytokine signaling (Yuan,ZL et al., 2005), and nuclear import (Bannister,AJ et al., 2000), etc. As a proposed 'loss-of-function' mechanism, Nε-acetylation greatly alters the electrostatic properties of a protein by neutralizing the positive charge of the lysine residues. The formation of hydrogen bonds on lysine side-chains is also disrupted (Yang,XJ, 2004; Yang,XJ, 2004b). In addition, lysine acetylation also creates a new interface for protein binding, as a 'gain-of-function' mechanism (Yang,XJ, 2004; Yang,XJ, 2004b). Thus, Nε-acetylation may modulate protein function, such as protein-protein interaction, DNA binding, enzymatic activity, stability and subcellular localization (Glozak,MA et al., 2005; Polevoda,B et al., 2002; Yang,XJ, 2004; Faiola,F et al., 2005; Brunet,A et al., 2004; Yuan,ZL et al., 2005; Bannister,AJ et al., 2000; Yang,XJ, 2004b). | http://bdmpail.biocuckoo.org/prediction.php | 17045240
5 | LAceP | | http://www.scbit.org/iPTM/ | 24586884
6 | PSKAcePred | In this work, we present a novel online predictor for protein acetylation sites prediction of PAIL, Prediction of Acetylation on Internal Lysines. We have manually mined scientific literature to collect 249 experimentally verified acetylation sites of 92 distinct proteins. Then the BDM (Bayesian Discriminant Method) algorithm has been employed. The window length of a potential acetylated peptide has been optimized as 13. The accuracy of PAIL is highly encouraging with 85.13%, 87.97% and 89.21% at low, medium and high thresholds, respectively. Both Jack-knife validation and n-fold (6-, 8-, and 10-fold) cross-validation have been performed to show that the PAIL is accurate and robust. In this regard, we propose that PAIL could be a useful tool for experimentalists. And the prediction results of PAIL might also be insightful for further experimental design. For convenience, we have implemented the prediction system in a web server, which is available at: http://bdmpail.biocuckoo.org/. | http://bioinfo.ncu.edu.cn/inquiries_PSKAcePred.aspx | 23173045
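The description above reports a peptide window length of 13 around candidate lysines, and the BRABSB-PHKA entry reports flanking sequences of length 15. As a minimal sketch of how such fixed-length windows are commonly cut from a protein sequence before encoding (not the code of any listed predictor), the helper below centers a window on every lysine. The function name `lysine_windows` and the use of "-" to pad windows that run past the sequence ends are illustrative assumptions.

```python
def lysine_windows(seq, length=13, pad="-"):
    """Return (1-based position, window) pairs centered on each lysine (K).

    Windows that would extend past either end of the sequence are filled
    with the padding character so that every window has the same length.
    """
    if length < 1 or length % 2 == 0:
        raise ValueError("window length must be a positive odd number")
    half = length // 2
    padded = pad * half + seq + pad * half
    # Residue i of seq sits at index i + half in padded, so the slice
    # padded[i : i + length] is centered on it.
    return [
        (i + 1, padded[i:i + length])
        for i, residue in enumerate(seq)
        if residue == "K"
    ]


if __name__ == "__main__":
    # Hypothetical toy peptide, used only to exercise the function.
    for position, window in lysine_windows("MKTAYIAKQR"):
        print(position, window)
```

Each window can then be turned into a feature vector (for example a position-specific encoding) and passed to whichever classifier is being trained.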
# | Tool Name | Description | URL | Reference
1 | iMethylK_pseAAC | Background: Methylation is one of the most important post-translational modifications in the human body, which usually arises on lysine, among the most intensely modified residues. It performs a dynamic role in numerous biological procedures, such as regulation of gene expression, regulation of protein function and RNA processing. Therefore, identifying lysine methylation sites is an important challenge, as some experimental procedures are time-consuming. Objective: Herein, we propose a computational predictor named iMethylK_pseAAC to identify lysine methylation sites. Methods: Firstly, we constructed feature vectors based on PseAAC using position and composition relative features and statistical moments. A neural network is trained based on the extracted features. The performance of the proposed method is then validated using cross-validation and jackknife testing. Results: The objective evaluation of the predictor showed accuracy of 96.7% for self-consistency, 91.61% for 10-fold cross-validation and 93.42% for jackknife testing. Conclusion: It is concluded that iMethylK_pseAAC outperforms its counterparts for identifying lysine methylation sites, such as iMethyl_pseACC, BPB_pPMS and PMeS. | https://github.com/umtwaqar/iMethylK_pseAAC | 32030087
# | Tool Name | Description | URL | Reference
1 | Mal-Light | Post-translational modification (PTM) is considered an important biological process with a tremendous impact on the function of proteins in both eukaryotic and prokaryotic cells. During the past decades, a wide range of PTMs has been identified. Among them, malonylation is a recently identified PTM which plays a vital role in a wide range of biological interactions. Notwithstanding, this modification plays a potential role in energy metabolism in different species including Homo sapiens. The identification of PTM sites using experimental methods is time-consuming and costly. Hence, there is a demand for introducing fast and cost-effective computational methods. In this study, we propose a new machine learning method, called Mal-Light, to address this problem. To build this model, we extract local evolutionary-based information according to the interaction of neighboring amino acids using a bi-peptide based method. We then use Light Gradient Boosting (LightGBM) as our classifier to predict malonylation sites. Our results demonstrate that Mal-Light is able to significantly improve malonylation site prediction performance compared to previous studies found in the literature. Using Mal-Light we achieve Matthews correlation coefficient (MCC) of 0.74 and 0.60, Accuracy of 86.66% and 79.51%, Sensitivity of 78.26% and 67.27%, and Specificity of 95.05% and 91.75%, for Homo sapiens and Mus musculus proteins, respectively. Mal-Light is implemented as an online predictor which is publicly available at: http://brl.uiu.ac.bd/MalLight/. | http://brl.uiu.ac.bd/MalLight/ | 33354488
2 | kmal-sp | As a newly discovered post-translational modification (PTM), lysine malonylation (Kmal) regulates a myriad of cellular processes from prokaryotes to eukaryotes and has important implications in human diseases. Despite its functional significance, computational methods to accurately identify malonylation sites are still lacking and urgently needed. In particular, there is currently no comprehensive analysis and assessment of different features and machine learning (ML) methods that are required for constructing the necessary prediction models. Here, we review, analyze and compare 11 different feature encoding methods, with the goal of extracting key patterns and characteristics from residue sequences of Kmal sites. We identify optimized feature sets, with which four commonly used ML methods (random forest, support vector machines, K-nearest neighbor and logistic regression) and one recently proposed [Light Gradient Boosting Machine (LightGBM)] are trained on data from three species, namely, Escherichia coli, Mus musculus and Homo sapiens, and compared using randomized 10-fold cross-validation tests. We show that integration of the single method-based models through ensemble learning further improves the prediction performance and model robustness on the independent test. When compared to the existing state-of-the-art predictor, MaloPred, the optimal ensemble models were more accurate for all three species (AUC: 0.930, 0.923 and 0.944 for E. coli, M. musculus and H. sapiens, respectively). Using the ensemble models, we developed an accessible online predictor, kmal-sp, available at http://kmalsp.erc.monash.edu/. We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for PTM site prediction, expedite the discovery of new malonylation and other PTM types and facilitate hypothesis-driven experimental validation of novel malonylated substrates and malonylation sites. | http://kmalsp.erc.monash.edu/ | 30351377
3 | SEMal | Post-translational modification (PTM) is a vital process which plays an important role in a wide range of biological interactions. One of the most recently identified PTMs is malonylation. It has been shown that malonylation has an important impact on different biological pathways, including glucose and fatty acid metabolism. Malonylation can be detected experimentally using mass spectrometry. However, this process is both costly and time-consuming, which has inspired research into faster and more efficient computational methods. This paper proposes a novel approach, called SEMal, to identify malonylation sites in protein sequences. It uses both structural and evolutionary-based features, and Rotation Forest (RoF) as its classification technique, to predict malonylation sites. To the best of our knowledge, our extracted features as well as our employed classifier have never been used for this problem. SEMal outperforms the previously proposed methods in all metrics, such as sensitivity (0.94 and 0.89), accuracy (0.94 and 0.91), and Matthews correlation coefficient (0.88 and 0.82), for Homo sapiens and Mus musculus, respectively. SEMal is publicly available as an online predictor at http://brl.uiu.ac.bd/SEMal/. | http://brl.uiu.ac.bd/SEMal/ | 33022522
4 | LEMP | As a newly identified protein post-translational modification, malonylation is involved in a variety of biological functions. Recognizing malonylation sites in substrates represents an initial but crucial step in elucidating the molecular mechanisms underlying protein malonylation. In this study, we constructed a deep learning (DL) network classifier based on long short-term memory (LSTM) with word embedding (LSTMWE) for the prediction of mammalian malonylation sites. LSTMWE performs better than traditional classifiers developed with common pre-defined feature encodings or a DL classifier based on LSTM with a one-hot vector. The performance of LSTMWE is sensitive to the size of the training set, but this limitation can be overcome by integration with a traditional machine learning (ML) classifier. Accordingly, an integrated approach called LEMP was developed, which includes LSTMWE and the random forest classifier with a novel encoding of enhanced amino acid content. LEMP not only performs better than the individual classifiers but is also superior to the currently available malonylation predictors. Additionally, it demonstrates promising performance with a low false positive rate, which is highly useful in prediction applications. Overall, LEMP is a useful tool for easily identifying malonylation sites with high confidence. LEMP is available at http://www.bioinfogo.org/lemp. | http://www.bioinfogo.org/lemp | 30639696
5 | Mal-Prec | Background: Malonylation is a recently discovered post-translational modification that is associated with a variety of diseases such as Type 2 Diabetes Mellitus and different types of cancers. Compared with experimental identification of malonylation sites, computational methods are time-effective and comparatively low in cost. Results: In this study, we proposed a novel computational model called Mal-Prec (Malonylation Prediction) for malonylation site prediction through the combination of Principal Component Analysis (PCA) and Support Vector Machine (SVM). One-hot encoding, physicochemical properties, and composition of k-spaced amino acid pairs were initially used to extract sequence features. PCA was then applied to select optimal feature subsets while SVM was adopted to predict malonylation sites. Five-fold cross-validation results showed that Mal-Prec can achieve better prediction performance compared with other approaches. AUC (area under the receiver operating characteristic curve) analysis achieved 96.47% and 90.72% on 5-fold cross-validation of independent data sets, respectively. Conclusion: Mal-Prec is a computationally reliable method for identifying malonylation sites in protein sequences. It outperforms existing prediction tools and can serve as a useful tool for identifying and discovering novel malonylation sites in human proteins. Mal-Prec is coded in MATLAB and is publicly available at https://github.com/flyinsky6/Mal-Prec, together with the data sets used in this study. | https://github.com/flyinsky6/Mal-Prec | 33225896
6 | RF-MaloSite and DL-MaloSite | Malonylation, which has recently emerged as an important lysine modification, regulates diverse biological activities and has been implicated in several pervasive disorders, including cardiovascular disease and cancer. However, conventional global proteomics analysis using tandem mass spectrometry can be time-consuming, expensive and technically challenging. Therefore, to complement and extend existing experimental methods for malonylation site identification, we developed two novel computational methods for malonylation site prediction based on random forest and deep learning machine learning algorithms, RF-MaloSite and DL-MaloSite, respectively. DL-MaloSite requires the primary amino acid sequence as an input and RF-MaloSite utilizes a diverse set of biochemical, physicochemical and sequence-based features. While systematic assessment of performance metrics suggests that both RF-MaloSite and DL-MaloSite perform well in all metrics tested, our methods perform particularly well in the areas of accuracy, sensitivity and overall method performance (assessed by the Matthews correlation coefficient, MCC). For instance, RF-MaloSite exhibited MCC scores of 0.42 and 0.40 using 10-fold cross-validation and an independent test set, respectively. Meanwhile, DL-MaloSite was characterized by MCC scores of 0.51 and 0.49 based on 10-fold cross-validation and an independent set, respectively. Importantly, both methods exhibited efficiency scores that were on par with or better than those achieved by existing malonylation site prediction methods. The identification of these sites may also provide important insights into the mechanisms of crosstalk between malonylation and other lysine modifications, such as acetylation, glutarylation and succinylation.
To facilitate their use, both methods have been made freely available to the research community at https://github.com/dukkakc/DL-MaloSite-and-RF-MaloSite. | https://github.com/dukkakc/DL-MaloSite-and-RF-MaloSite | 32322367
7 | Kmalo | Protein malonylation, a reversible post-translational modification of lysine residues, is associated with various biological functions, such as cellular regulation and pathogenesis. In proteomics, to improve our understanding of the mechanisms of malonylation at the molecular level, the identification of malonylation sites via an efficient methodology is essential. However, experimental identification of malonylated substrates via mass spectrometry is time-consuming, labor-intensive, and expensive. Although numerous methods have been developed to predict malonylation sites in mammalian proteins, computational resources for identifying plant malonylation sites are very limited. In this study, a hybrid model incorporating multiple convolutional neural networks (CNNs) with physicochemical properties, evolutionary information, and sequence-based features was developed for identifying protein malonylation sites in mammals. For plant malonylation, multiple CNNs and random forests were integrated into a secondary modeling phase using a support vector machine. Independent testing demonstrated that the mammalian and plant malonylation models yield areas under the receiver operating characteristic curve (AUC) of 0.943 and 0.772, respectively. The proposed scheme has been implemented as a web-based tool, Kmalo (https://fdblab.csie.ncu.edu.tw/kmalo/home.html), which can help facilitate the functional investigation of protein malonylation in mammals and plants. | https://fdblab.csie.ncu.edu.tw/kmalo/home.html | 32601280
8 | MaloPred | Motivation: Protein malonylation is a novel post-translational modification (PTM) which orchestrates a variety of biological processes. Annotation of malonylation in proteomics is the crucial first step in deciphering its physiological roles, which are implicated in pathological processes. Compared with expensive and laborious experimental research, computational prediction can provide an accurate and effective approach to the identification of many types of PTM sites. However, there is still no online predictor for lysine malonylation. Results: By searching the literature and databases, well-prepared, up-to-date benchmark datasets were collected for multiple organisms. Data analyses demonstrated that different organisms were preferentially involved in different biological processes and pathways. Meanwhile, unique sequence preferences were observed for each organism. Thus, a novel online malonylation site prediction tool, called MaloPred, which can predict malonylation for three species, was developed by integrating various informative features via an enhanced feature strategy. On the independent test datasets, AUC (area under the receiver operating characteristic curve) scores of 0.755, 0.827 and 0.871 were obtained for Escherichia coli (E. coli), Mus musculus (M. musculus) and Homo sapiens (H. sapiens), respectively. The satisfying results suggest that MaloPred can provide more instructive guidance for further experimental investigation of protein malonylation. | http://bioinfo.ncu.edu.cn/MaloPred.aspx | 28025199
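Several of the malonylation predictors listed above (SEMal, RF-MaloSite and DL-MaloSite) report the Matthews correlation coefficient (MCC) alongside accuracy and sensitivity. As a minimal sketch of how that metric is computed from confusion-matrix counts, the snippet below uses the standard MCC formula; the function name and the example counts are hypothetical and are not taken from any of the listed tools.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the MCC for binary site predictions.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives. By convention this sketch
    returns 0.0 when the denominator is zero (MCC is undefined there).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for a lysine-site classifier on a balanced test set.
    print(round(matthews_corrcoef(tp=80, tn=70, fp=30, fn=20), 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and because it uses all four counts it is less sensitive to class imbalance than accuracy, which matters when unmodified lysines greatly outnumber modified ones.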
# | Tool Name | Description | URL | Reference
1 | CarSPred | Introduction: 1. The software CarSPred can be used to identify carbonylation sites in query human protein sequences. 2. The software consists of four modules, devoted to K, R, T and P carbonylation site prediction, respectively. 3. It receives protein sequences or a file in FASTA format as input. 4. For the output, list and file options are available, and the annotations clearly indicate the precise location and probability of each putative carbonylation site in the sequence. 5. The software can also be used to predict carbonylation sites of other mammalian proteins to a certain extent, owing to their close homology with human proteins. 6. The software is in the 'CarSPred' folder. Datasets of carbonylated proteins and sample sequences of carbonylation sites are in the 'Datasets' folder. | http://sourceforge.net/projects/hqlstudio/files/CarSPred-1.0/ | 25347395
2 | ISSPred | In the modern era, the process of protein expression is further complicated by the addition of new post-translational modification events such as proteolytic cleavage of polyproteins, proteome mediated peptide ligation, non-ribosomal addition of moieties and intein-mediated protein splicing. Protein splicing is a recently discovered post-translational modification in which one internal fragment, termed an intein (protein intron), is excised from a precursor protein and the flanking regions, termed exteins, ligate to form a mature protein. The process of precise intein splicing and formation of specific peptide bonds has been tempting scholars to develop many novel applications. This server is an attempt to help biologists identify inteins hiding in their protein sequences. | http://www.imtech.res.in/raghava/isspred/ |
3 | ModPred | ModPred is a sequence-based predictor of potential post-translational modification (PTM) sites in proteins. It consists of 34 ensembles of logistic regression models, trained separately on a combined set of 126,036 non-redundant experimentally verified sites for 23 different modifications, obtained from public databases and an ad-hoc literature search. Areas under the ROC curve (AUCs) were estimated to range from ~60 to 97%, depending on the type of PTM. | http://www.modpred.org/ | 24888500
4 | Motifs tree | MOTIVATION: Post-translational modifications (PTMs) are important steps in the maturation of proteins. Several models exist to predict specific PTMs, from manually detected patterns to machine learning methods. On one hand, the manual detection of patterns does not provide the most efficient classifiers and requires a substantial workload; on the other hand, models built by machine learning methods are hard to interpret and do not increase biological knowledge. Therefore, we developed a novel method based on pattern discovery and decision trees to predict PTMs. The proposed algorithm builds a decision tree by coupling the C4.5 algorithm with genetic algorithms, producing high-performance white box classifiers. Our method was tested on initiator methionine cleavage (IMC) and N(α)-terminal acetylation (N-Ac), two of the most common PTMs. RESULTS: The resulting classifiers perform well when compared with existing models. On a set of eukaryotic proteins, they display a cross-validated Matthews correlation coefficient of 0.83 (IMC) and 0.65 (N-Ac). When used to predict potential substrates of N-terminal acetyltransferase B and N-terminal acetyltransferase C, our classifiers display better performance than the state of the art. Moreover, we present an analysis of the model predicting IMC for Homo sapiens proteins and demonstrate that we are able to extract experimentally known facts without prior knowledge. Those results validate the fact that our method produces white box models. | http://terminus.unige.ch/ | 24681905
5 | NetChop | NetChop 3.1 Server. The NetChop server produces neural network predictions for cleavage sites of the human proteasome. NetChop has been trained on human data only, and will therefore presumably have better performance for prediction of the cleavage sites of the human proteasome. However, since the proteasome structure is quite conserved, we believe that the server is able to produce reliable predictions for at least the other mammalian proteasomes. This server is an update to the NetChop 2.0 server. It has been trained using a novel sequence encoding scheme and an improved neural network training strategy. The NetChop 3.0 version has two different network methods that can be used for prediction: C-term 3.0 and 20S 3.0. All the previous versions are available online, for comparison and reference. The C-term 3.0 network is trained with a database consisting of 1260 publicly available MHC class I ligands (using only the C-terminal cleavage site of the ligands). The 20S network is trained with in vitro degradation data published in Toes et al. and Emmerich et al. The C-term 3.0 network performs best in predicting the boundaries of CTL epitopes. Another proteasome prediction server is available at Tübingen University: PAProC. | http://www.cbs.dtu.dk/services/NetChop/ | 11983929
6 | NetCorona | The Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark was formed in 1993, and conducts basic research in the field of bioinformatics and systems biology. The group of 90+ scientists, working in ten specialist research groups, has a highly multi-disciplinary profile (molecular biologists, biochemists, medical doctors, physicists and computer scientists) with a ratio of 2:1 of bio-to-nonbio backgrounds. CBS represents one of the large bioinformatics groups in academia in Europe. Bioinformatics is the term used to refer to the combination of methods in biology, computation, and information management, which are necessary to advance research relating to all aspects of living systems - from individual molecules, cells, and organs to entire organisms. Today, research in molecular biology, biotechnology and pharmacology depends on information technology all the way from experiment to the publication of the results. Comprehensive public databases of DNA and protein sequences, macromolecular structure, gene and protein expression levels, pathway organization and cell signalling have been established to optimise scientific exploitation of the explosion of data within biology. Unlike many other groups in the field of biomolecular informatics, the Center for Biological Sequence Analysis directs its research primarily towards topics related to the elucidation of the functional aspects of complex biological mechanisms. Among contemporary bioinformatics concerns are reliable computational interpretation of a wide range of experimental data, and the detailed understanding of the molecular apparatus behind cellular mechanisms of sequence information. By exploiting available experimental data and evidence in the design of algorithms, sequence correlations and other features of biological significance can be inferred.
In addition to the computational research, the center also has experimental efforts in gene expression analysis using DNA chips and data generation in relation to the physical and structural properties of DNA. In the last decade, the Center for Biological Sequence Analysis has produced a large number of computational methods, which are offered to others via WWW servers. Based on bioinformatics efforts started in the late 1980s, the activity was established formally as a center in 1993 by a grant from the Danish National Research Foundation. | http://www.cbs.dtu.dk/services/NetCorona/ | 15180906
7 | NetPicoRNA | Server offered by the Center for Biological Sequence Analysis (CBS), Technical University of Denmark; the description is the same general CBS profile given for NetCorona. | http://www.cbs.dtu.dk/services/NetPicoRNA/ | 8931139
8 | PAProC | What are proteasomes? Proteasomes are cytosolic multisubunit proteases which are involved in cell cycle control, transcription factor activation and the generation of peptide ligands for MHC I molecules (for reviews, see Baumeister et al. (1998), Rock & Goldberg (1999), Uebel & Tampe (1999)). They exist in several forms: either as proteolytically active core complexes, or 20S proteasomes, and, when associated with the ATP-dependent 19S cap complexes, as larger 26S proteasomes that are able to recognize proteins marked by ubiquitin for proteasomal degradation (Jentsch & Schlenker, 1995; Hershko & Ciechanover, 1998). Another protein complex known to associate with the 20S core particle is PA28, the 11S regulator (Ahn et al., 1995), which was shown to improve the yield of antigenic peptides (Groettrup et al., 1996; Dick et al., 1996). Eukaryotic 20S proteasomes consist of four stacked rings (overall stoichiometry alpha7beta7beta7alpha7), each consisting of 7 different subunits (Groll et al., 1997). Each of the two inner beta-rings carries three catalytically active sites on its inner surface. Their proteolytic specificities have been described as chymotrypsin-like (cleaving after large, hydrophobic AAs), trypsin-like (cleaving after basic AAs) and peptidyl-glutamyl-peptide-hydrolyzing (cleaving after acidic AAs) (for review, see Uebel & Tampe (1999)). Strings of unfolded proteins are thought to be inserted into the cylinder and to be cut into pieces by the active sites; the resulting peptide fragments are then released into the cytosol.
Functionally, proteasomal protein degradation is believed to proceed from one substrate end to the other ("processively"), without the release of large degradation intermediates (Akopian et al., 1997; Nussbaum et al., 1998; Kisselev et al., 1999). Why is proteasomal cleavage specificity important for immune responses? In vertebrate cells, some of the proteolytic fragments produced by proteasomes are fed into the antigen processing machinery. Since peptide presentation by MHC I molecules at the cell surface is an intrinsic requirement for the ability of the immune system to eradicate virus-infected or transformed cells (Rammensee et al., 1993; Pamer & Cresswell, 1998), it is of general interest to know exactly how the proteasome is involved in this process. Proteasomal cleavage specificity has been assessed by in vitro digestion experiments using either tri- or tetrapeptides with fluorogenic leaving groups (Kuckelkorn et al., 1995; Heinemeyer et al., 1997; Arendt & Hochstrasser, 1997), peptides of 15-40 AAs (Boes et al., 1994; Niedermann et al., 1995; Niedermann et al., 1996; Dick et al., 1998), or denatured proteins (Dick et al., 1991; Dick et al., 1994; Kisselev et al., 1998; Kisselev et al., 1999) as substrates. We analyzed the cleavage preferences of yeast wild-type and mutant proteasomes in a non-modified protein (Nussbaum et al., 1998). Using statistical analysis of cut sites, it was possible for the first time to determine so-called cleavage motifs, i.e. the preferred sequences around cleavage sites, for the three active beta-subunits of yeast proteasomes. Why would a prediction tool be beneficial? In order to apply experimentally determined information on cleavage site selection by proteasomes to any possible proteasome substrate, one needs an automated prediction device.
Such devices already exist for the binding of peptides to MHC I molecules (database SYFPEITHI; Rammensee et al., 1997) and have been described for peptide transport by the transporter associated with antigen processing (TAP) (Daniel et al., 1998). However, devices for the prediction of proteasomal cleavages are only at the beginning of their development. A proteasomal cleavage prediction tool could, especially in combination with MHC ligand predictors such as SYFPEITHI, help to improve the forecast of MHC class I restricted CTL responses. More specifically, it could support researchers in their quest for individual CTL epitopes by limiting the number of possible MHC class I ligands from protein antigens. In addition, the effect of amino acid mutations in viral or tumor-specific proteins on antigen presentation could be assessed. Thus, proteasomal cleavage prediction would lend a hand in rational vaccine design. PAProC: We have made the first step towards this end by providing PAProC (Prediction Algorithm for Proteasomal Cleavages), a public prediction tool for proteasomal cleavages. PAProC offers information on both the general cleavability of amino acid sequences (cuts per amino acid) and individual cleavages (positions and estimated strength; for details, please refer to the user information). PAProC was developed from the beginning, i.e. from the experimental basis to the ready-to-use public prediction tool, by proteasome experts at the Department of Immunology in close collaboration with programmers at the Department of Biomathematics, both at the University of Tübingen, Germany. We are therefore confident that PAProC has profited from the best possible expertise. However, we are aware of the fact that PAProC is still in its teething stage. For example, cleavage sites and estimated cleavage strength are not yet based on quantified cleavage data (in PAProC I). Therefore, we are continuously working to improve PAProC.
However, we need your help: the program will profit from your experience with it. So please let us know how PAProC performed for you. Thank you for your collaboration. | http://paproc.de/ | 11345595
9 | Pcleavage | Antigen processing and presentation are processes that occur within a cell and result in fragmentation (proteolysis) of proteins, association of the fragments with MHC molecules, and expression of the peptide-MHC complexes at the cell surface, where they can be recognized by the T cell receptor on a T cell. This leads to the stimulation of CTL cells to clear the infection. The three major steps for which rules can be devised are: degradation of antigens by proteasomes; transport of peptide fragments through the TAP transporter; and binding of the transported peptides to MHC molecules. | http://www.imtech.res.in/raghava/pcleavage/index.html | 15988831
10 | PEIMAN | PEIMAN (Posttranslational modification Enrichment, Integration and Matching ANalysis) is standalone, platform-free software for enrichment analysis of post-translational modification (PTM) types. The software also provides a comparison between two different lists of proteins, focusing on PTM types. Investigating the PTM frequency in each list is also possible. | http://bs.ipm.ir/softwares/PEIMAN/ | 25911152
11 | PeptideMap | PeptideMap marks a peptide sequence at every position where a known proteolytic enzyme or reagent might cut it. You can select one or a few enzymes or let PeptideMap use the whole list. PeptideMap is simply the program Map run with -PROGRAMname=PeptideMap. (See the documentation for Map in the Program Manual for a complete description.) | http://prowl.rockefeller.edu/prowl/peptidemap.html |
12 | PHOXTRACK | PHOXTRACK (PHOsphosite-X-TRacing Analysis of Causal Kinases) is a computational tool to compare kinase activities between different phosphoproteomes to identify key regulating proteins. In its current version, PHOXTRACK maps quantified phosphopeptides to their putative kinases and tests for concordant changes of kinase activity comparing whole phosphoproteomes. For this purpose, PHOXTRACK searches for an enrichment of known kinase targets in the uploaded phosphoproteomics profile data. PHOXTRACK thus allows for identification of regulated kinase activities between experimental conditions. | http://phoxtrack.molgen.mpg.de/ | 25152232
13 | ProP | Server offered by the Center for Biological Sequence Analysis (CBS), Technical University of Denmark; the description is the same general CBS profile given for NetCorona. | http://www.cbs.dtu.dk/services/ProP/ | 14985543
14 | PTM-X | Post-translational modification (PTM) plays an important role in regulating the functions of proteins. PTMs of multiple residues on one protein may work together to determine a functional outcome, which is known as PTM cross-talk. Identification of PTM cross-talks is an emerging theme in proteomics and has elicited great interest, but their properties remain to be systematically characterized. To this end, we collected 193 PTM cross-talk pairs in 77 human proteins from the literature and then tested location preference and co-evolution at the residue and modification levels. We found that cross-talk events preferentially occurred among nearby PTM sites, especially in disordered protein regions, and cross-talk pairs tended to co-evolve. Given the properties of PTM cross-talk pairs, a naive Bayes classifier integrating different features was built to predict cross-talks for pairwise combinations of PTM sites. By using 10-fold cross-validation, the integrated prediction model showed an area under the receiver operating characteristic (ROC) curve of 0.833, superior to using any individual feature alone. The prediction performance was also demonstrated to be robust to the biases in the collected PTM cross-talk pairs. The integrated approach has the potential for large-scale prioritization of PTM cross-talk candidates for functional validation and was implemented as a web server available at http://bioinfo.bjmu.edu.cn/ptm-x/. | http://bioinfo.bjmu.edu.cn/ptm-x/ | 25605461
15 | PyTMs | BACKGROUND: Post-translational modifications (PTMs) constitute a major aspect of protein biology, particularly signaling events. Conversely, several different pathophysiological PTMs are hallmarks of oxidative imbalance or inflammatory states and are strongly associated with pathogenesis of autoimmune diseases or cancers. Accordingly, it is of interest to assess both the biological and structural effects of modification. For the latter, computer-based modeling offers an attractive option. We thus identified the need for easily applicable modeling options for PTMs. RESULTS: We developed PyTMs, a plugin implemented with the commonly used visualization software PyMOL. PyTMs enables users to introduce a set of common PTMs into protein/peptide models and can be used to address research questions related to PTMs. Ten types of modification are currently supported, including acetylation, carbamylation, citrullination, cysteine oxidation, malondialdehyde adducts, methionine oxidation, methylation, nitration, proline hydroxylation and phosphorylation. Furthermore, advanced settings integrate the pre-selection of surface-exposed atoms, define stereochemical alternatives and allow for basic structure optimization of the newly modified residues. CONCLUSION: PyTMs is a useful, user-friendly modelling plugin for PyMOL. Advantages of PyTMs include standardized generation of PTMs, rapid time-to-result and facilitated user control. Although modeling cannot substitute for conventional structure determination, it constitutes a convenient tool that allows uncomplicated exploration of potential implications prior to experimental investments and basic explanation of experimental data. PyTMs is freely available as part of the PyMOL script repository project on GitHub and will further evolve. | http://www.pymolwiki.org/index.php/Pytms | 25431162
16 | MusiteDeep | MusiteDeep is an online resource providing a deep-learning framework for protein post-translational modification (PTM) site prediction and visualization. The predictor only uses protein sequences as input and no complex features are needed, which results in a real-time prediction for a large number of proteins. It takes less than three minutes to predict for 1000 sequences per PTM type. The output is presented at the amino acid level for the user-selected PTM types. The framework has been benchmarked and has demonstrated competitive performance in PTM site predictions by other researchers. In this webserver, we updated the previous framework by utilizing more advanced ensemble techniques, and providing prediction and visualization for multiple PTMs simultaneously for users to analyze potential PTM cross-talks directly. Besides prediction, users can interactively review the predicted PTM sites in the context of known PTM annotations and protein 3D structures through homology-based search. In addition, the server maintains a local database providing pre-processed PTM annotations from UniProt/Swiss-Prot for users to download. This database will be updated every three months. The MusiteDeep server is available at https://www.musite.net. The stand-alone tools for locally using MusiteDeep are available at https://github.com/duolinwang/MusiteDeep_web. | https://www.musite.net | 32324217
17 | MUscADEL | Lysine post-translational modifications (PTMs) play a crucial role in regulating diverse functions and biological processes of proteins. However, because of the large volumes of sequencing data generated from genome-sequencing projects, systematic identification of different types of lysine PTM substrates and PTM sites in the entire proteome remains a major challenge. In recent years, a number of computational methods for lysine PTM identification have been developed. These methods show high diversity in their core algorithms, features extracted and feature selection techniques and evaluation strategies. There is therefore an urgent need to revisit these methods and summarize their methodologies, to improve and further develop computational techniques to identify and characterize lysine PTMs from the large amounts of sequence data. With this goal in mind, we first provide a comprehensive survey on a large collection of 49 state-of-the-art approaches for lysine PTM prediction. We cover a variety of important aspects that are crucial for the development of successful predictors, including operating algorithms, sequence and structural features, feature selection, model performance evaluation and software utility. We further provide our thoughts on potential strategies to improve the model performance. Second, in order to examine the feasibility of using deep learning for lysine PTM prediction, we propose a novel computational framework, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs), using deep, bidirectional, long short-term memory recurrent neural networks for accurate and systematic mapping of eight major types of lysine PTMs in the human and mouse proteomes. Extensive benchmarking tests show that MUscADEL outperforms current methods for lysine PTM characterization, demonstrating the potential and power of deep learning techniques in protein PTM prediction. The web server of MUscADEL, together with all the data sets assembled in this study, is freely available at http://muscadel.erc.monash.edu/. We anticipate this comprehensive review and the application of deep learning will provide a practical guide and useful insights into PTM prediction and inspire future bioinformatics studies in the related fields. | http://muscadel.erc.monash.edu/ | 30285084
18 | PTM-ssMP | Protein post-translational modifications (PTMs) are chemical modifications of a protein after its translation. Owing to their important role in understanding various biological processes and in the development of effective drugs, PTM site prediction has become a hot topic in bioinformatics. Recently, many online tools have been developed to predict various types of PTM sites, most of which are based on local sequence and some biological information. However, few existing tools consider the relations between different PTMs in their prediction task. Here, we develop a web server called PTM-ssMP to predict PTM sites, which adopts a site-specific modification profile (ssMP) to efficiently extract and encode the information of both proximal PTMs and local sequence simultaneously. PTM-ssMP provides efficient prediction of multiple types of PTM site including phosphorylation, lysine acetylation, ubiquitination, sumoylation, methylation, O-GalNAc, O-GlcNAc, sulfation and proteolytic cleavage. To assess the performance of PTM-ssMP, a large number of experimentally verified PTM sites were collected from several sources and used to train and test the prediction models. Our results suggest that ssMP consistently contributes to a remarkable improvement of prediction performance. In addition, results of independent tests demonstrate that PTM-ssMP compares favorably with other existing tools for different PTM types. PTM-ssMP is implemented as an online web server with a user-friendly interface, which is freely available at http://bioinformatics.ustc.edu.cn/PTM-ssMP/index/. | http://bioinformatics.ustc.edu.cn/PTM-ssMP/index/ | 29989096
19 | | Liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based proteomic methods have been widely used to identify lysine acylation proteins. However, these experimental approaches often fail to detect proteins that are in low abundance or absent in specific biological samples. To circumvent these problems, we developed a computational method to predict lysine acylation, including acetylation, malonylation, succinylation, and glutarylation. The prediction algorithm integrated flanking primary sequence determinants and evolutionary conservation of acylated lysine as well as multiple protein functional annotation features including gene ontology, conserved domains, and protein-protein interactions. The inclusion of functional annotation features increases predictive power over simple sequence considerations for four of the acylation species evaluated. For example, the Matthews correlation coefficient (MCC) for the prediction of malonylation increased from 0.26 to 0.73. The performance of prediction was validated against an independent data set for malonylation. Likewise, when tested with independent data sets, the algorithm displayed improved sensitivity and specificity over existing methods. Experimental validation by Western blot experiments and LC-MS/MS detection further attested to the performance of prediction. We then applied our algorithm to the mouse proteome and reported the global-scale prediction of lysine acetylation, malonylation, succinylation, and glutarylation, which should serve as a valuable resource for future functional studies. | | 27774790
20 | PTMscape | While tandem mass spectrometry can detect post-translational modifications (PTM) at the proteome scale, reported PTM sites are often incomplete and include false positives. Computational approaches can complement these datasets by additional predictions, but most available tools use prediction models pre-trained for a single PTM type by the developers and it remains a difficult task to perform large-scale batch prediction for multiple PTMs with flexible user control, including the choice of training data. We developed an R package called PTMscape which predicts PTM sites across the proteome based on a unified and comprehensive set of descriptors of the physico-chemical microenvironment of modified sites, with additional downstream analysis modules to test enrichment of individual or pairs of PTMs in protein domains. PTMscape is flexible in the ability to process any major modifications, such as phosphorylation and ubiquitination, while achieving sensitivity and specificity comparable to single-PTM methods and outperforming other multi-PTM tools. Applying this framework, we expanded proteome-wide coverage of five major PTMs affecting different residues by prediction, especially for lysine and arginine modifications. Using a combination of experimentally acquired sites (PSP) and newly predicted sites, we discovered that crosstalk among multiple PTMs occurs more frequently than by random chance in key protein domains such as histone, protein kinase, and RNA recognition motifs, spanning various biological processes such as RNA processing, DNA damage response, signal transduction, and regulation of cell cycle. These results provide a proteome-scale analysis of crosstalk among major PTMs and can be easily extended to other types of PTM. | | 29876573
21 | MultiLyGAN | Background: Protein post-translational modification (PTM) is a key issue in investigating the mechanism of protein function. With the rapid development of proteomics technology, a large amount of protein sequence data has been generated, which highlights the importance of the in-depth study and analysis of PTMs in proteins. Method: We proposed a new multi-classification machine learning pipeline, MultiLyGAN, to identify seven types of lysine modified sites. Using eight different sequential and five structural construction methods, 1497 valid features remained after filtering by Pearson correlation coefficient. To solve the data imbalance problem, Conditional Generative Adversarial Network (CGAN) and Conditional Wasserstein Generative Adversarial Network (CWGAN), two influential deep generative methods, were leveraged and compared to generate new samples for the types with fewer samples. Finally, a random forest algorithm was utilized to predict seven categories. Results: In the tenfold cross-validation, accuracy (Acc) and Matthews correlation coefficient (MCC) were 0.8589 and 0.8376, respectively. In the independent test, Acc and MCC were 0.8549 and 0.8330, respectively. The results indicated that CWGAN better solved the existing data imbalance and stabilized the training error. Alternatively, an accumulated feature importance analysis reported that CKSAAP, PWM and structural features were the three most important feature-encoding schemes. MultiLyGAN can be found at https://github.com/Lab-Xu/MultiLyGAN. Conclusions: The CWGAN greatly improved the predictive performance in all experiments. Features derived from CKSAAP, PWM and structure schemes are the most informative and had the greatest contribution to the prediction of PTM. | https://github.com/Lab-Xu/MultiLyGAN | 33789579
22 | Plant PTM Viewer | Post-translational modifications (PTMs) of proteins are central in any kind of cellular signaling. Modern mass spectrometry technologies enable comprehensive identification and quantification of various PTMs. Given the increased numbers and types of mapped protein modifications, a database is necessary that simultaneously integrates and compares site-specific information for different PTMs, especially in plants for which the available PTM data are poorly catalogued. Here, we present the Plant PTM Viewer (http://www.psb.ugent.be/PlantPTMViewer), an integrative PTM resource that comprises approximately 370 000 PTM sites for 19 types of protein modifications in plant proteins from five different species. The Plant PTM Viewer provides the user with a protein sequence overview in which the experimentally evidenced PTMs are highlighted together with an estimate of the confidence by which the modified peptides and, if possible, the actual modification sites were identified and with functional protein domains or active site residues. The PTM sequence search tool can query PTM combinations in specific protein sequences, whereas the PTM BLAST tool searches for modified protein sequences to detect conserved PTMs in homologous sequences. Taken together, these tools help to assume the role and potential interplay of PTMs in specific proteins or within a broader systems biology context. The Plant PTM Viewer is an open repository that allows the submission of mass spectrometry-based PTM data to remain at pace with future PTM plant studies. | http://www.psb.ugent.be/PlantPTMViewer | 31004550
23 | PTM-Logo | Summary: Identification of the amino-acid motifs in proteins that are targeted for post-translational modifications (PTMs) is of great importance in understanding regulatory networks. Information about targeted motifs can be derived from mass spectrometry data that identify peptides containing specific PTMs such as phosphorylation, ubiquitylation and acetylation. Comparison of input data against a standardized 'background' set allows identification of over- and under-represented amino acids surrounding the modified site. Conventionally, calculation of targeted motifs assumes a random background distribution of amino acids surrounding the modified position. However, we show that probabilities of amino acids depend on (i) the type of the modification and (ii) their positions relative to the modified site. Thus, software that identifies such over- and under-represented amino acids should make appropriate adjustments for these effects. Here we present a new program, PTM-Logo, that generates representations of these amino acid preferences ('logos') based on position-specific amino-acid probability backgrounds calculated either from user-input data or curated databases. | http://sysbio.chula.ac.th/PTMLogo/ or https://hpcwebapps.cit.nih.gov/PTMLogo/ | 31318409
24 | RESTful API | iPTMnet is a bioinformatics resource that integrates protein post-translational modification (PTM) data from text mining and curated databases and ontologies to aid in knowledge discovery and scientific study. The current iPTMnet website can be used for querying and browsing rich PTM information but does not support automated iPTMnet data integration with other tools. Hence, we have developed a RESTful API utilizing the latest developments in cloud technologies to facilitate the integration of iPTMnet into existing tools and pipelines. We have packaged iPTMnet API software in Docker containers and published it on DockerHub for easy redistribution. We have also developed Python and R packages that allow users to integrate iPTMnet for scientific discovery, as demonstrated in a use case that connects PTM sites to kinase signaling pathways. | | 32395768
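The PTM-Logo entry above argues that over- and under-representation should be measured against a position-specific background rather than a uniform one. The following is a minimal sketch of that idea only, not PTM-Logo's actual implementation: the function names, the pseudocount value, and the example peptides are all hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log2


def position_frequencies(peptides):
    """Return one {amino acid: relative frequency} dict per alignment column."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        total = sum(counts.values())
        freqs.append({aa: n / total for aa, n in counts.items()})
    return freqs


def log_ratio(foreground, background, pseudo=0.01):
    """log2(foreground / background) per position and residue.

    Positive scores mean the residue is over-represented at that position
    relative to the position-specific background; negative scores mean it is
    under-represented. `pseudo` is an assumed pseudocount that avoids log(0).
    """
    fg = position_frequencies(foreground)
    bg = position_frequencies(background)
    scores = []
    for f_col, b_col in zip(fg, bg):
        residues = set(f_col) | set(b_col)
        scores.append({
            aa: log2((f_col.get(aa, 0.0) + pseudo) / (b_col.get(aa, 0.0) + pseudo))
            for aa in residues
        })
    return scores


# Hypothetical 5-residue windows centred on a modified serine.
foreground = ["RRSLP", "KRSAP", "RKSVP"]
background = ["ALSGE", "RTSLK", "GDSAP", "KESVQ"]
scores = log_ratio(foreground, background)
```

In this toy input the central serine is present in every peptide of both sets, so its score is zero, while arginine at the first position scores positive because it is more frequent in the foreground than in the background at that same position.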
# | Tool Name | Description | URL | Reference
1 | BPB-PPMS | Protein methylation is one type of reversible post-translational modification (PTM), which plays vital roles in many cellular processes such as transcription activity and DNA repair. Experimental identification of methylation sites on proteins without prior knowledge is costly and time-consuming. In silico prediction of methylation sites might not only provide researchers with information on the candidate sites for further determination, but also facilitate downstream characterizations and site-specific investigations. In the present study, a novel approach based on Bi-profile Bayes feature extraction combined with support vector machines (SVMs) was employed to develop the model for Prediction of Protein Methylation Sites (BPB-PPMS) from primary sequence. Methylation can occur at many residues including arginine, lysine, histidine, glutamine, and proline. For the present, BPB-PPMS is only designed to predict the methylation status for lysine and arginine residues on polypeptides due to the absence of enough experimentally verified data to build and train prediction models for other residues. The performance of BPB-PPMS is measured with a sensitivity of 74.71%, a specificity of 94.32% and an accuracy of 87.98% for arginine as well as a sensitivity of 70.05%, a specificity of 77.08% and an accuracy of 75.51% for lysine in 5-fold cross validation experiments. Results obtained from cross-validation experiments and tests on independent data sets suggest that BPB-PPMS presented here might facilitate the identification and annotation of protein methylation. Besides, BPB-PPMS can be extended to build predictors for other types of PTM sites with ease. For public access, BPB-PPMS is available at http://www.bioinfo.bio.cuhk.edu.hk/bpbppms. | http://www.bioinfo.bio.cuhk.edu.hk/bpbppms | 19290060
2 | iMethyl-PseAAC | iMethyl-PseAAC is a web server that predicts methylation sites in proteins. With the assistance of SVM, the highlight of iMethyl-PseAAC is to employ amino acid sequence features extracted from the sequence evolution information via a grey system model (Grey-PSSM). Caveat: to obtain the predicted result with the anticipated success rate, the entire sequence of the query protein rather than its fragment should be used as an input. | http://www.jci-bioinfo.cn/iMethyl-PseAAC | 24977164
3 | MASA | Studies within the last few years have identified that protein methylation occurring on histones and other proteins is involved in the regulation of gene transcription. Several previous works were developed to computationally identify the potential methylation sites on lysine and arginine. Investigation of protein tertiary structure shows that protein methylation sites prefer to occur in regions that are easily accessible. However, previous works do not take the solvent accessible surface area (ASA) surrounding the methylation sites into account. Herein, we propose a method named MASA that combines a support vector machine (SVM) with sequence and structural characteristics for identifying protein methylation sites on lysine, arginine, glutamate, and asparagine. Because most experimental methylation sites do not have corresponding protein tertiary structures in the Protein Data Bank (PDB), an effective solvent accessibility prediction tool was applied to determine the potential ASA values of amino acids in proteins. Evaluation of the predictive performance based on cross-validation demonstrates that the ASA values surrounding the methylation sites can improve the prediction accuracy. Moreover, the independent test shows that the prediction accuracies on methylated lysine and arginine are 80.8% and 85.0%, respectively. Finally, the proposed method is implemented as an effective prediction system for identifying protein methylation sites. | http://MASA.mbc.nctu.edu.tw/ | 19263424
4 | MeMo | MeMo is the first protein methylation prediction server based on SVM (support vector machine). Limited by available training data, at present MeMo only focuses on arginine and lysine sites. Users can submit their protein sequences to predict which arginine and lysine sites undergo methylation. | http://www.bioinfo.tsinghua.edu.cn/~tigerchen/memo.html | 16845004
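Several entries in these tables report sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC). As a reading aid, the sketch below shows how these four figures follow from a binary confusion matrix. The function names and the counts are hypothetical; they are not taken from any of the listed papers.

```python
from math import sqrt


def sensitivity(tp, fn):
    """Fraction of true modification sites that were predicted as sites."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-sites that were predicted as non-sites."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 100 real sites and 100 non-sites.
tp, fn, tn, fp = 80, 20, 90, 10
sn = sensitivity(tp, fn)
sp = specificity(tn, fp)
acc = accuracy(tp, tn, fp, fn)
m = mcc(tp, tn, fp, fn)
```

Accuracy depends on the ratio of sites to non-sites in the test set, which is one reason several of the tools listed here report MCC or AUC alongside it.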
# | Tool Name | Description | URL | Reference
1 | pCysMod | Thiol groups on cysteines can undergo multiple post-translational modifications (PTMs), acting as a molecular switch to maintain redox homeostasis and regulating a series of cell signaling transductions. Identification of sophisticated protein cysteine modifications is crucial for dissecting their underlying regulatory mechanisms. Compared with time-consuming and labor-intensive experimental methods, various computational methods have attracted intense research interest due to their convenience and low cost. Here, we developed the first comprehensive deep learning based tool, pCysMod, for multiple protein cysteine modification prediction, including S-nitrosylation, S-palmitoylation, S-sulfenylation, S-sulfhydration, and S-sulfinylation. Experimentally verified cysteine sites curated from literature and sites collected by other databases and prediction tools were integrated as the benchmark dataset. Several protein sequence features were extracted and united into a deep learning model, and the hyperparameters were optimized by particle swarm optimization algorithms. Cross-validations indicated our model showed excellent robustness and outperformed existing tools, achieving an average AUC of 0.793, 0.807, 0.796, 0.793, and 0.876 for S-nitrosylation, S-palmitoylation, S-sulfenylation, S-sulfhydration, and S-sulfinylation, respectively, demonstrating that pCysMod was stable and suitable for protein cysteine modification prediction. Besides, we constructed a comprehensive protein cysteine modification prediction web server based on this model to help researchers find the potential modification sites of their proteins of interest, which can be accessed at http://pcysmod.omicsbio.info. This work will undoubtedly greatly promote the study of protein cysteine modification and contribute to clarifying the biological regulation mechanisms of cysteine modification within and among cells. | http://pcysmod.omicsbio.info/ | 33732693
# | Tool Name | Description | URL | Reference
1 | NMT | Many posttranslational modifications (N-myristoylation or glycosylphosphatidylinositol (GPI) lipid anchoring) and localization signals (the peroxisomal targeting signal PTS1) are encoded in short, partly compositionally biased regions at the N- or C-terminus of the protein sequence. These sequence signals are not well defined in terms of amino acid type preferences but they have significant interpositional correlations. Although the number of verified protein examples is small, the quantification of several physical conditions necessary for productive protein binding with the enzyme complexes executing the respective transformations can lead to predictors that recognize the signals from the amino acid sequence of queries alone. Taxon-specific prediction functions are required due to the divergent evolution of the active complexes. The big-Pi tool for the prediction of the C-terminal signal for GPI lipid anchor attachment is available for metazoan, protozoan and plant sequences. The myristoyl transferase (NMT) predictor recognizes glycine N-myristoylation sites (at the N-terminus and for fragments after processing) of higher eukaryotes (including their viruses) and fungi. The PTS1 signal predictor finds proteins with a C-terminus appropriate for peroxisomal import (for metazoa and fungi). Guidelines for application of the three WWW-based predictors (http://mendel.imp.univie.ac.at/) and for the interpretation of their output are described. | http://mendel.imp.ac.at/myristate/SUPLpredictor.htm | 12824382
# | Tool Name | Description | URL | Reference
1 | NetAcet | The Center for Biological Sequence Analysis at the Technical University of Denmark was formed in 1993, and conducts basic research in the field of bioinformatics and systems biology. The group of +90 scientists, working in ten specialist research groups, has a highly multi-disciplinary profile (molecular biologists, biochemists, medical doctors, physicists and computer scientists) with a ratio of 2:1 of bio-to-nonbio backgrounds. CBS represents one of the large bioinformatics groups in academia in Europe. Bioinformatics is the term used to refer to the combination of methods in biology, computation, and information management, which are necessary to advance research relating to all aspects of living systems - from individual molecules, cells, and organs to entire organisms. Today, research in molecular biology, biotechnology and pharmacology depends on information technology all the way from experiment to the publication of the results. Comprehensive public databases of DNA- and protein sequences, macromolecular structure, gene and protein expression levels, pathway organization and cell signalling, have been established to optimise scientific exploitation of the explosion of data within biology. Unlike many other groups in the field of biomolecular informatics, Center for Biological Sequence Analysis directs its research primarily towards topics related to the elucidation of the functional aspects of complex biological mechanisms. Among contemporary bioinformatics concerns are reliable computational interpretation of a wide range of experimental data, and the detailed understanding of the molecular apparatus behind cellular mechanisms of sequence information. By exploiting available experimental data and evidence in the design of algorithms, sequence correlations and other features of biological significance can be inferred. In addition to the computational research the center also has experimental efforts in gene expression analysis using DNA chips and data generation in relation to the physical and structural properties of DNA. In the last decade, the Center for Biological Sequence Analysis has produced a large number of computational methods, which are offered to others via WWW servers. Based on bioinformatics efforts started in the late 1980s, the activity was established formally as a center in 1993 by a grant from the Danish National Research Foundation. | http://www.cbs.dtu.dk/services/NetAcet/ | 15539450
2 | N-Ace | N-Ace is a web tool for predicting protein acetylation sites based on a Support Vector Machine (SVM). The model is trained on the amino acid sequence and on other structural characteristics surrounding the modification site, such as accessible surface area, absolute entropy, non-bonded energy, size, amino acid composition, steric parameter, hydrophobicity, volume, mean polarity, electric charge, heat capacity and isoelectric point, and it is implemented as a two-stage SVM method. | http://N-Ace.mbc.NCTU.edu.tw/ | 20839302
# | Tool Name | Description | URL | Reference
1 | NetNGlyc | | http://www.cbs.dtu.dk/services/NetNGlyc/ |
2 | GECS | GECS (Gene Expression to Chemical Structure) is a collection of prediction methods linking genomic or transcriptomic contents of genes to chemical structures of biosynthetic substances. This N-Glycan Prediction Server is based on the repertoire of glycosyltransferases for N-glycan biosynthesis. | http://www.genome.jp/tools/gecs/ | 16159923
3 | N-GlycoGo | Glycosylation is the most complex post-translational modification of proteins. It participates in many biological processes in the human body and is closely related to many disease states. Among glycosylation types, N-linked glycosylation has the most available data. However, current N-linked glycosylation prediction tools do not take into account the serious imbalance between positive and negative data. In this study, we used protein sequence and amino acid characteristics to construct an N-linked glycosylation prediction model called N-GlycoGo. Based on sequence, structure, and function, 11 heterogeneous features were encoded. Further, XGBoost was selected for modeling. Finally, independent testing of human and mouse prediction models showed that N-GlycoGo is superior to other tools, with Matthews correlation coefficient (MCC) values of 0.397 and 0.719, respectively, which are higher than those of other glycosylation site prediction tools. We have developed a fast and accurate prediction tool, N-GlycoGo, for N-linked glycosylation. N-GlycoGo is available at http://ncblab.nchu.edu.tw/n-glycogo/. | http://ncblab.nchu.edu.tw/n-glycogo/ | 10.1109/ACCESS.2020.3022629
# | Tool Name | Description | URL | Reference
1 | NeddPred | Introduction: Neddylation is a highly dynamic and reversible post-translational modification. The abnormality of neddylation has previously been shown to be closely related to some human diseases. The detection of neddylation sites is essential for elucidating the regulation mechanisms of protein neddylation. Objective: As the detection of the lysine neddylation sites by the traditional experimental method is often expensive and time-consuming, it is imperative to design computational methods to identify neddylation sites. Methods: In this study, a bioinformatics tool named NeddPred is developed to identify underlying protein neddylation sites. A bi-profile Bayes feature extraction is used to encode neddylation sites and a fuzzy support vector machine model is utilized to overcome the problem of noise and class imbalance in the prediction. Results: Matthew's correlation coefficient of NeddPred achieved 0.7082 and an area under the receiver operating characteristic curve of 0.9769. Independent tests show that NeddPred significantly outperforms the existing lysine neddylation site predictor NeddyPreddy. Conclusion: Therefore, NeddPred can be a complement to the existing tools for the prediction of neddylation sites. A user-friendly webserver for NeddPred is accessible at 123.206.31.171/NeddPred/. | 123.206.31.171/NeddPred/ | 32581647
# | Tool Name | Description | URL | Reference
1 | NTyroSite | Nitrotyrosine is a product of tyrosine nitration mediated by reactive nitrogen species. As an indicator of cell damage and inflammation, protein nitrotyrosine serves to reveal biological change associated with various diseases or oxidative stress. Accurate identification of nitrotyrosine sites provides an important foundation for further elucidating the mechanism of protein nitrotyrosination. However, experimental identification of nitrotyrosine sites through traditional methods is laborious and expensive. In silico prediction of nitrotyrosine sites based on protein sequence information is thus highly desired. Here, we report a novel predictor, NTyroSite, for accurate prediction of nitrotyrosine sites using sequence evolutionary information. The generated features were optimized using a Wilcoxon rank-sum test. A random forest classifier was then trained using these features to build the predictor. The final NTyroSite predictor achieved an area under a receiver operating characteristics curve (AUC) score of 0.904 in a 10-fold cross-validation test. It also significantly outperformed other existing implementations in an independent test. Meanwhile, for a better understanding of our prediction model, the predominant rules and informative features were extracted from the NTyroSite model to explain the prediction results. We expect that the NTyroSite predictor may serve as a useful computational resource for high-throughput nitrotyrosine site prediction. The online interface of the software is publicly available at https://biocomputer.bio.cuhk.edu.hk/NTyroSite/. | https://biocomputer.bio.cuhk.edu.hk/NTyroSite/ | 29987232
2 | PredNTS | Nitrotyrosine, which is generated by numerous reactive nitrogen species, is a type of protein post-translational modification. Identification of site-specific nitration modification on tyrosine is a prerequisite to understanding the molecular function of nitrated proteins. Thanks to the progress of machine learning, computational prediction can play a vital role before the biological experimentation. Herein, we developed a computational predictor PredNTS by integrating multiple sequence features including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. The important features were selected by the recursive feature elimination approach using a random forest classifier. Finally, we linearly combined the successive random forest (RF) probability scores generated by the different, single encoding-employing RF models. The resultant PredNTS predictor achieved an area under the curve (AUC) of 0.910 using five-fold cross-validation. It outperformed the existing predictors on a comprehensive and independent dataset. Furthermore, we investigated several machine learning algorithms to demonstrate the superiority of the employed RF algorithm. PredNTS is a useful computational resource for the prediction of nitrotyrosine sites. The web application with the curated datasets of PredNTS is publicly available. | http://kurata14.bio.kyutech.ac.jp/PredNTS/ | 33800121
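The PredNTS entry above names the composition of k-spaced amino acid pairs (CKSAAP) among its sequence encodings. The following is a minimal Python sketch of that kind of encoding, not the PredNTS implementation: the function name `cksaap`, the default `k_max=2`, and the choice to normalise by the number of pairs made only of the 20 standard residues are all assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids; any other symbol (e.g. a padding "X") is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(window, k_max=2):
    """Return a CKSAAP-style feature vector for a sequence window.

    For each gap k in 0..k_max, count the residue pairs separated by k
    positions and divide by the number of valid pairs for that gap.
    The result has 400 * (k_max + 1) entries.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard symbols
                counts[pair] += 1
                total += 1
        # Assumption: normalise by the valid pairs actually counted.
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


if __name__ == "__main__":
    vector = cksaap("KRTLYSAEDYGHK", k_max=2)
    print(len(vector))  # 400 features per gap value
```

A vector like this would typically be passed to a classifier such as a random forest or an SVM; the window length and the range of k are tuning choices that the listing does not specify.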
# | Tool Name | Description | URL | Reference
1 | YinOYang | The Center for Biological Sequence Analysis (CBS) at the Technical University of Denmark was formed in 1993 and conducts basic research in the field of bioinformatics and systems biology. The group of more than 90 scientists, working in ten specialist research groups, has a highly multi-disciplinary profile (molecular biologists, biochemists, medical doctors, physicists and computer scientists) with a ratio of 2:1 of bio-to-nonbio backgrounds. CBS represents one of the large bioinformatics groups in academia in Europe. Bioinformatics is the term used to refer to the combination of methods in biology, computation, and information management, which are necessary to advance research relating to all aspects of living systems, from individual molecules, cells, and organs to entire organisms. Today, research in molecular biology, biotechnology and pharmacology depends on information technology all the way from experiment to the publication of the results. Comprehensive public databases of DNA and protein sequences, macromolecular structure, gene and protein expression levels, pathway organization and cell signalling have been established to optimise scientific exploitation of the explosion of data within biology. Unlike many other groups in the field of biomolecular informatics, the Center for Biological Sequence Analysis directs its research primarily towards topics related to the elucidation of the functional aspects of complex biological mechanisms. Among contemporary bioinformatics concerns are reliable computational interpretation of a wide range of experimental data, and the detailed understanding of the molecular apparatus behind cellular mechanisms of sequence information. By exploiting available experimental data and evidence in the design of algorithms, sequence correlations and other features of biological significance can be inferred. In addition to the computational research, the center also has experimental efforts in gene expression analysis using DNA chips and data generation in relation to the physical and structural properties of DNA. In the last decade, the Center for Biological Sequence Analysis has produced a large number of computational methods, which are offered to others via WWW servers. Based on bioinformatics efforts started in the late 1980s, the activity was established formally as a center in 1993 by a grant from the Danish National Research Foundation. | http://www.cbs.dtu.dk/services/YinOYang/ | 11928486
2 | DictyOGlyc | Description identical to the Center for Biological Sequence Analysis (CBS) overview given for YinOYang above. | http://www.cbs.dtu.dk/services/DictyOGlyc/ | 10521537
# | Tool Name | Description | URL | Reference
1 | NetOGlyc | Description identical to the Center for Biological Sequence Analysis (CBS) overview given for YinOYang above. | http://www.cbs.dtu.dk/services/NetOGlyc/ | 23584533
2 | Oglyc | O-glycosylation is one of the most important, frequent and complex post-translational modifications. This modification can activate and affect protein functions. Here, we present three support vector machine models based on physical properties, a 0/1 system, and a system combining the two feature sets. The prediction accuracies of the three models reached 0.82, 0.85 and 0.85, respectively. The accuracies of the three SVM methods were evaluated by 'leave-one-out' cross-validation. This approach provides a useful tool to help identify the O-glycosylation sites in mammalian proteins. An online prediction web server is available at http://www.biosino.org/Oglyc. | http://www.biosino.org/Oglyc/ | 16731044
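The Oglyc entry above mentions a "0/1 system" for representing sequence, and the PredNTS entry likewise lists binary encoding. A minimal Python sketch of one common reading of that idea follows: each residue in a window around the candidate site becomes a 20-element 0/1 vector. This interpretation, the helper names `extract_window` and `binary_encode`, the flank size, and the padding symbol "X" are all assumptions, not details taken from either tool.

```python
# The 20 standard amino acids, in a fixed order that defines the bit positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extract_window(sequence, site, flank, pad="X"):
    """Return the residues within `flank` positions of a 0-based `site`.

    Windows that run past either end of the sequence are padded with `pad`
    so that every window has length 2 * flank + 1.
    """
    padded = pad * flank + sequence + pad * flank
    return padded[site:site + 2 * flank + 1]


def binary_encode(window):
    """Encode each residue as 20 bits with a single 1 (all 0 for padding)."""
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


if __name__ == "__main__":
    window = extract_window("MKTSAY", site=3, flank=2)  # centred on the serine
    print(window, len(binary_encode(window)))
```

Padding symbols encode as all zeros here, which keeps the vector length fixed for sites near the protein termini.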
# | Tool Name | Description | URL | Reference
1 | PhoglyStruct | The biological process known as post-translational modification (PTM) contributes to diversifying the proteome, hence affecting many aspects of normal cell biology and pathogenesis. There have been many recently reported PTMs, but lysine phosphoglycerylation has emerged as the most recent subject of interest. Despite a large number of proteins being sequenced, the experimental method for detection of phosphoglycerylated residues remains an expensive, time-consuming and inefficient endeavor in the post-genomic era. Instead, computational methods are being proposed for accurately predicting phosphoglycerylated lysines. Though a number of predictors are available, performance in detecting phosphoglycerylated lysine residues is still limited. In this paper, we propose a new predictor called PhoglyStruct that utilizes structural information of amino acids alongside a multilayer perceptron classifier for predicting phosphoglycerylated and non-phosphoglycerylated lysine residues. For the experiment, we located phosphoglycerylated and non-phosphoglycerylated lysines in our employed benchmark. We then derived and integrated properties such as accessible surface area, backbone torsion angles, and local structure conformations. PhoglyStruct showed significant improvement in the ability to distinguish phosphoglycerylated residues from non-phosphoglycerylated ones when compared to previous predictors. The sensitivity, specificity, accuracy, Matthews correlation coefficient and AUC were 0.8542, 0.7597, 0.7834, 0.5468 and 0.8077, respectively. The data and Matlab/Octave software packages are available at https://github.com/abelavit/PhoglyStruct. | https://github.com/abelavit/PhoglyStruct | 30560923
2 | Bigram-PGK | Background: The biological process known as post-translational modification (PTM) is a condition whereby proteomes are modified that affects normal cell biology, and hence the pathogenesis. A number of PTMs have been discovered in recent years and lysine phosphoglycerylation is one of the fairly recent developments. Even with a large number of proteins being sequenced in the post-genomic era, the identification of phosphoglycerylation remains a big challenge due to factors such as cost, time consumption and inefficiency involved in the experimental efforts. To overcome this issue, computational techniques have emerged to accurately identify phosphoglycerylated lysine residues. However, the computational techniques proposed so far are limited in their ability to correctly predict this covalent modification. Results: We propose a new predictor in this paper called Bigram-PGK, which uses evolutionary information of amino acids to predict phosphoglycerylated sites. A benchmark dataset containing experimentally labelled sites is employed for this purpose, and profile bigram occurrences are calculated from position-specific scoring matrices of amino acids in the protein sequences. The statistical measures of this work, namely sensitivity, specificity, precision, accuracy, Matthews correlation coefficient and area under the ROC curve, are reported to be 0.9642, 0.8973, 0.8253, 0.9193, 0.8330 and 0.9306, respectively. Conclusions: The proposed predictor, based on evolutionary information features and a support vector machine classifier, has shown great potential to effectively predict phosphoglycerylated and non-phosphoglycerylated lysine residues when compared against the existing predictors. The data and software of this work can be acquired from https://github.com/abelavit/Bigram-PGK. | https://github.com/abelavit/Bigram-PGK | 31856704
3 | iDPGK | Background: Protein phosphoglycerylation, the addition of 1,3-bisphosphoglyceric acid (1,3-BPG) to a lysine residue of a protein to form 3-phosphoglyceryl-lysine, is a reversible and non-enzymatic post-translational modification (PTM) and plays a regulatory role in glucose metabolism and the glycolytic process. As the number of experimentally verified phosphoglycerylated sites has increased significantly, statistical or machine learning methods are imperative for investigating the characteristics of phosphoglycerylation sites. Currently, research into phosphoglycerylation is very limited, and only a few resources are available for the computational identification of phosphoglycerylation sites. Result: We present a bioinformatics investigation of phosphoglycerylation sites based on sequence-based features. The TwoSampleLogo analysis reveals that the regions surrounding the phosphoglycerylation sites contain a relatively high proportion of positively charged amino acids, especially in the upstream flanking region. Additionally, according to the results of PTM-Logo, non-polar and aliphatic amino acids are more abundant around phosphoglycerylated lysines, which may play a functional role in discriminating between phosphoglycerylation and non-phosphoglycerylation sites. Many types of features were adopted to build the prediction model on the training dataset, including amino acid composition, amino acid pair composition, positional weighted matrix and position-specific scoring matrix. Further, to improve the predictive power, the top features ranked by F-score were considered as the final combination for classification, and the predictive models were trained using DT, RF and SVM classifiers. Evaluation by five-fold cross-validation showed that the selected features were most effective in discriminating between phosphoglycerylated and non-phosphoglycerylated sites. Conclusion: The SVM model trained with the selected sequence-based features performed well, with a sensitivity of 77.5%, a specificity of 73.6%, an accuracy of 74.9%, and a Matthews correlation coefficient of 0.49. Furthermore, the model also performs consistently on the independent testing set, yielding a sensitivity of 75.7% and a specificity of 64.9%. Finally, the model has been implemented as a web-based system, namely iDPGK, which is now freely available at http://mer.hc.mmh.org.tw/iDPGK/. | http://mer.hc.mmh.org.tw/iDPGK/ | 33297954
4 | RAM-PGK | Background: Post-translational modification (PTM) is a biological process that is associated with the modification of the proteome, which results in the alteration of normal cell biology and pathogenesis. There have been numerous PTM reports in recent years, out of which lysine phosphoglycerylation has emerged as one of the recent developments. The traditional methods of identifying phosphoglycerylated residues, which are experimental procedures such as mass spectrometry, have been shown to be time-consuming and cost-inefficient, despite the abundance of proteins being sequenced in this post-genomic era. Due to these drawbacks, computational techniques are being sought to establish an effective identification system for phosphoglycerylated lysine residues. The development of a predictor for phosphoglycerylation is not a first, but it is necessary as the latest predictor falls short in adequately detecting phosphoglycerylated and non-phosphoglycerylated lysine residues. Results: In this work, we introduce a new predictor named RAM-PGK, which uses sequence-based information relating to amino acid residues to predict phosphoglycerylated and non-phosphoglycerylated sites. A benchmark dataset was employed for this purpose, which contained experimentally identified phosphoglycerylated and non-phosphoglycerylated lysine residues. From the dataset, we extracted the residue adjacency matrix pertaining to each lysine residue in the protein sequences and converted them into feature vectors, which are used to build the phosphoglycerylation predictor. Conclusion: RAM-PGK, which is based on sequential features and support vector machine classifiers, has shown a noteworthy improvement in performance in comparison to some of the recent prediction methods. The performance metrics of the RAM-PGK predictor are: 0.5741 sensitivity, 0.6436 specificity, 0.0531 precision, 0.6414 accuracy, and 0.0824 Matthews correlation coefficient. | | 33419274
5 | iPGK-PseAAC | Background: Occurring at Lys residues, PGK (lysine phosphoglycerylation) is a special kind of post-translational modification (PTM). It may invert the charge potential of the modified residue and change protein structures and functions, causing various diseases in the liver, brain, and kidney. Objective: From the angles of both basic research and drug development, we are facing a critical and challenging problem: for an uncharacterized protein sequence containing many Lys residues, which ones can be phosphoglycerylated, and which ones cannot? Method: To address this problem, we have developed a predictor called iPGK-PseAAC by incorporating into the general PseAAC (pseudo amino acid composition) four different tiers of amino acid pairwise coupling information, where tiers 1, 2, 3, and 4 refer to the amino acid pairwise couplings between all the 1st, 2nd, 3rd, and 4th most contiguous residues along a protein segment, respectively. Results: Rigorous cross-validations indicated that the proposed predictor remarkably outperformed its existing counterparts. Conclusion: The proposed predictor iPGK-PseAAC will become a very useful bioinformatics tool for medicinal chemistry. For the convenience of most experimental scientists, a user-friendly webserver for iPGK-PseAAC has been established at http://app.aporc.org/iPGK-PseAAC/, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. | http://app.aporc.org/iPGK-PseAAC/ | 28521678
6 | EvolStruct-Phogly | Background: Post-translational modification (PTM) is a biological process that modifies the proteome, leading to changes in normal cell biology and pathogenesis. In recent times, many PTMs have been reported. Of these modifications, phosphoglycerylation has become a particular subject of interest. The experimental procedure for identification of phosphoglycerylated residues continues to be an expensive, inefficient and time-consuming effort, even with a large number of proteins being sequenced in the post-genomic period. Computational methods are therefore sought in order to effectively predict phosphoglycerylated lysines. Even though there are predictors available, the ability to detect phosphoglycerylated lysine residues still remains inadequate. Results: We have introduced a new predictor in this paper named EvolStruct-Phogly that uses structural and evolutionary information relating to amino acids to predict phosphoglycerylated lysine residues. Benchmark data containing experimentally identified phosphoglycerylated and non-phosphoglycerylated lysines are employed. We then extracted three kinds of structural information (accessible surface area of amino acids, backbone torsion angles, and local structure conformations) together with profile bigrams of position-specific scoring matrices. Conclusion: EvolStruct-Phogly showed a noteworthy improvement in performance when compared with the previous predictors. The performance metrics obtained are as follows: sensitivity 0.7744, specificity 0.8533, precision 0.7368, accuracy 0.8275, and Matthews correlation coefficient 0.6242. The software package and data of this work can be obtained from https://github.com/abelavit/EvolStruct-Phogly or www.alok-ai-lab.com. | https://github.com/abelavit/EvolStruct-Phogly or www.alok-ai-lab.com | 30999859
7 | predPhogly-Site | Post-translational modification (PTM) involves covalent modification after the biosynthesis process and plays an essential role in the study of cell biology. Lysine phosphoglycerylation is a newly discovered reversible type of PTM that affects glycolytic enzyme activities and is responsible for a wide variety of diseases, such as heart failure, arthritis, and degeneration of the nervous system. Our goal is to computationally characterize potential phosphoglycerylation sites to understand the functionality and causality more accurately. In this study, a novel computational tool, referred to as predPhogly-Site, has been developed to predict phosphoglycerylation sites in proteins. It utilizes the probabilistic sequence-coupling information among the nearby amino acid residues of phosphoglycerylation sites along with a variable cost adjustment for the skewed training dataset to enhance the prediction characteristics. It has achieved around 99% accuracy with more than 0.96 MCC and 0.97 AUC in both 10-fold cross-validation and an independent test. Even the standard deviation in 10-fold cross-validation is almost negligible. This performance indicates that predPhogly-Site remarkably outperformed the existing prediction tools and can be used as a promising predictor, preferably with its web interface at http://103.99.176.239/predPhogly-Site. | http://103.99.176.239/predPhogly-Site | 33793659
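The entries in this table report sensitivity, specificity, precision, accuracy and the Matthews correlation coefficient (MCC). As a reading aid, the sketch below shows how these measures follow from the four confusion-matrix counts using their standard definitions. The function name `classification_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions made here; the counts in the usage example are invented and do not come from any tool listed above.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely; return 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification measures from confusion counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives and false negatives, respectively.
    """
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": _ratio(tp, tp + fn),  # recall on modified sites
        "specificity": _ratio(tn, tn + fp),  # recall on unmodified sites
        "precision": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in classification_metrics(tp=8, fp=1, tn=9, fn=2).items():
        print(f"{name}: {value:.4f}")
```

Because modified sites are usually far rarer than unmodified lysines, accuracy alone can look high while precision and MCC stay low, which is why several entries above report all five measures.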
# | Tool Name | Description | URL | Reference
1 | AMS | We present here the 2011 update of the AutoMotif Service (AMS 4.0) that predicts a wide selection of 88 different types of single amino acid post-translational modifications (PTM) in protein sequences. The selection of experimentally confirmed modifications is acquired from the latest UniProt and Phospho.ELM databases for training. The sequence vicinity of each modified residue is represented using amino acid physico-chemical features encoded using high quality indices (HQI) obtained by automatic clustering of known indices extracted from the AAindex database. For each type of numerical representation, the method builds an ensemble of Multi-Layer Perceptron (MLP) pattern classifiers, each optimising a different objective during training (for example recall, precision or area under the ROC curve (AUC)). The consensus is built using brainstorming technology, which combines multi-objective instances of the machine learning algorithm and the data fusion of different training object representations, in order to boost the overall prediction accuracy of conserved short sequence motifs. The performance of AMS 4.0 is compared with the accuracy of previous versions, which were constructed using single machine learning methods (artificial neural networks, support vector machine). Our software improves the average AUC score of the earlier version by close to 7 % as calculated on the test datasets of all 88 PTM types. Moreover, for the selected most-difficult sequence motif types it is able to improve the prediction performance by almost 32 %, when compared with previously used single machine learning methods. Summarising, the brainstorming consensus meta-learning methodology on average boosts the AUC score up to around 89 %, averaged over all 88 PTM types. Detailed results for single machine learning methods and the consensus methodology are also provided, together with the comparison to previously published methods and state-of-the-art software tools. The source code and precompiled binaries of the brainstorming tool are available at http://code.google.com/p/automotifserver/ under Apache 2.0 licensing. | https://code.google.com/p/automotifserver/ | 22555647
2 | CKSAAP_PhSite | As one of the most widespread protein post-translational modifications, phosphorylation is involved in many biological processes such as the cell cycle and apoptosis. Identification of phosphorylated substrates and their corresponding sites will facilitate the understanding of the molecular mechanism of phosphorylation. Compared with labor-intensive and time-consuming experimental approaches, computational prediction of phosphorylation sites is highly desirable due to its convenience and speed. In this paper, a new bioinformatics tool named CKSAAP_PhSite was developed that ignores kinase information and uses only primary sequence information to predict protein phosphorylation sites. The highlight of CKSAAP_PhSite is that it uses the composition of k-spaced amino acid pairs as the encoding scheme, with a support vector machine as the predictor. CKSAAP_PhSite achieved a sensitivity of 84.81%, a specificity of 86.07% and an accuracy of 85.43% for serine; a sensitivity of 78.59%, a specificity of 82.26% and an accuracy of 80.31% for threonine; and a sensitivity of 74.44%, a specificity of 78.03% and an accuracy of 76.21% for tyrosine. Experimental results obtained from cross-validation and an independent benchmark suggest that our method is very promising for predicting phosphorylation sites and can serve as a useful supplementary tool for the community. For public access, CKSAAP_PhSite is available at http://59.73.198.144/cksaap_phsite/. | http://59.73.198.144/cksaap_phsite/ | 23110047
3 | CRPhos | Welcome to the pTools webserver. This website is a joint development by the Centre for Proteome Analysis and the Intelligent Systems Lab at the University of Antwerp. Here we present in-house developed tools for protein and proteome (data) analysis. Downloadable code is shared here whenever a project is considered sufficiently mature; in the meantime these pages give an overview of the current status of some projects. Though the proteomics field is rapidly evolving, data analysis is still a major bottleneck in proteome analysis. Sharing data, databases and tools among the research community is one of our goals. The pTools website has a sister site, called pData, on which we share experimental proteomic datasets. | http://www.ptools.ua.ac.be/CRPhos | 18940828
4 | DAPPLE | DAPPLE represents an alternative method (to machine-learning approaches) for predicting phosphorylation sites in an organism of interest. It is a pipeline involving BLAST searches that uses experimentally determined phosphorylation sites in one organism (or several organisms) to predict phosphorylation sites in an organism of interest. It outputs a table in tab-delimited text format (which can also be easily imported into a spreadsheet program like Excel), which contains various information helpful for choosing phosphorylation sites that are of interest to you, such as the number of sequence differences between the query site and the hit site, the location of the query site and the hit site in their respective intact proteins, whether the corresponding intact proteins are reciprocal BLAST hits (and thus predicted orthologues), and so on. The following is a web interface to DAPPLE. If you would instead like to run DAPPLE on your own machine, you may download it here. This .zip file includes instructions for setting up DAPPLE. | http://saphire.usask.ca/saphire/dapple/index.html | 23658419
5 | DISPHOS | DISPHOS computationally predicts serine, threonine and tyrosine phosphorylation sites in proteins. The new version of the predictor (DISPHOS 1.3) was trained on over 2000 non-redundant experimentally confirmed protein phosphorylation sites (1,079 Serine sites, 666 Threonine sites, and 375 Tyrosine sites). The new set of phosphorylation sites was augmented using the entries from SwissProt R44, Phospho.ELM database, and literature. The observation that amino acid composition, sequence complexity, hydrophobicity, charge and other sequence attributes of regions adjacent to phosphorylation sites are very similar to those of intrinsically disordered protein regions suggests that disorder in and around the potential phosphorylation target site is an important prerequisite for phosphorylation. Thus, DISPHOS uses disorder information to improve the discrimination between phosphorylation and non-phosphorylation sites. The accuracy of DISPHOS reaches 81.3% +/- 2.2% for Serine, 74.8% +/- 2.5% for Threonine, and 79.0% +/- 2.4% for Tyrosine. The application of DISPHOS to ordered and disordered protein regions, as well as to various functional protein categories and proteomes, provides strong support for the hypothesis that protein phosphorylation predominantly occurs in regions of intrinsic disorder. An executable version of DISPHOS 1.3 was developed in collaboration with Molecular Kinetics, Inc. This predictor is also available on the Molecular Kinetics website: http://www.pondr.com | http://www.dabi.temple.edu/disphos/ | 14960716
6 | GPS | Protein phosphorylation is the most ubiquitous post-translational modification (PTM), and plays important roles in most biological processes. Identification of site-specific phosphorylated substrates is fundamental for understanding the molecular mechanisms of phosphorylation. Besides experimental approaches, prediction of potential candidates with computational methods has also attracted great attention for its convenience and speed. In this review, we present a comprehensive but brief summary of computational resources for protein phosphorylation, including phosphorylation databases, prediction of non-specific or organism-specific phosphorylation sites, prediction of kinase-specific phosphorylation sites or phospho-binding motifs, and other tools. A testing data set taken from four high-throughput experiments is available at: Comparison_data. We apologize that computational studies without any web links to databases or tools are not included in this compendium, since it is not easy for experimentalists to use such studies directly. We are grateful for user feedback. Please inform Dr. Yu Xue or Yongbo Wang to add, remove or update one or multiple web links below. | http://gps.biocuckoo.org/ | 21062758
7 | HMMpTM | During the last decades a large number of computational methods have been developed for predicting transmembrane protein structure and topology. Current predictors rely on two topogenic signals in the protein sequence: the distribution of positively charged residues in extra-membrane loops and the existence of N-terminal signals. However, phosphorylation and glycosylation are post-translational modifications (PTMs) that occur in a compartment-specific manner and therefore the presence of a phosphorylation or glycosylation site in a transmembrane protein provides topological information. Here we report a Hidden Markov Model based method capable of predicting the topology of transmembrane proteins and the existence of kinase-specific phosphorylation and N/O-linked glycosylation sites across the protein sequence. Our method integrates a novel feature in transmembrane protein topology prediction which results in improved performance for topology prediction and reliable prediction of phosphorylation and glycosylation sites when compared to currently available predictors. | http://aias.biol.uoa.gr/HMMpTM/ | 24225132
8 | KinasePhos | Protein phosphorylation is an important reversible mechanism of post-translational modification, and it affects many essential cellular processes. Due to the importance of protein phosphorylation in cellular control, many schemes and models have been proposed to predict kinase-specific phosphorylation sites. Most methods are based on consensus sequences of position probabilities, as is our previous version, KinasePhos 1.0, which is also a consensus-based web server. The known phosphorylation sites from public domain data sources are categorized by their annotated protein kinases. In the previous version, features are based on the profile hidden Markov model, and computational models are learned from the kinase-specific groups of phosphorylation sites. After evaluating the learned models, the model with the highest accuracy was selected from each kinase-specific group for use in a web-based prediction tool for identifying protein phosphorylation sites. It is a kinase-specific phosphorylation site prediction tool with both high sensitivity and specificity. Moreover, the current release of KinasePhos, version 2.0, adopts sequence-based amino acid coupling-pattern analysis and solvent accessibility as new features for a support vector machine (SVM) to characterize phosphorylation sites. The coupling pattern [XdZ] denotes amino acid types X and Z separated by d amino acids. We use the coupling strength CXdZ defined by coupling-pattern analysis and compute its difference between the positive and negative sets of phosphorylation proteins. We select as features the 250 coupling patterns with the largest differences in CXdZ, then build SVM models and perform cross-validation. The resulting model achieves about 95% prediction accuracy, an improvement of 7% over the previous version. Compared with other tools, the special features chosen for SVM model building produce the best prediction so far. | http://kinasephos2.mbc.nctu.edu.tw/ | PMID: 17517770
9 | KinomeXplorer | KinomeXplorer is an integrated framework for modeling kinase-substrate interactions and aiding in the design of inhibitor-based follow-up perturbation experiments. An interactive web interface allows investigation of predicted kinase-substrate interactions from human and major eukaryotic model organisms. | http://kinomexplorer.info/ | PMID: 24874572
10 | MetaPredPS | Remarkable morphological anomalies were observed in a female of Hoplopleura capitosa found on Mus musculus caught in Niemirowek, the Tomaszow district (Poland). The anomalies concerned the shape and chaetotaxis of some parapleural plates on the abdomen, which constitute one of the basic taxonomic features of Anoplura. | http://metapred.biolead.org/MetaPredPS/ | PMID: 1823471
11 | Musite | To address the various limitations of current tools when applied to proteomes and to better utilize the large number of experimentally verified phosphorylation sites, we developed a unique standalone application system, Musite, specifically designed for large-scale prediction of both general and kinase-specific phosphorylation sites. Musite utilizes local sequence similarity patterns (KNN scores) and generic features (disorder scores and amino acid frequencies) of phosphorylation sites, and employs a comprehensive machine learning approach to make predictions. Musite is the first tool that provides a utility for training a phosphorylation-site prediction model from users' own data and supports continuous adjustment of stringency levels. Musite provides a user-friendly graphical user interface, which makes it easy for biologists to perform predictions in an automated fashion. Applications of Musite to six proteomes yielded tens of thousands of putative phosphorylation sites with high stringency. These predictions provide useful hypotheses for experimental validation. Cross-validation tests show that Musite significantly outperforms existing tools for predicting general phosphorylation sites and is at least comparable to those for predicting kinase-specific phosphorylation sites. Moreover, as open-source software, Musite can also serve as an open platform for building machine learning applications for phosphorylation-site prediction. | http://musite.sourceforge.net/ | PMID: 20702892
12 | NetPhorest | KinomeXplorer is an integrated framework for modeling kinase-substrate interactions and aiding in the design of inhibitor-based follow-up perturbation experiments. An interactive web interface allows investigation of predicted kinase-substrate interactions from human and major eukaryotic model organisms. | http://netphorest.info/ | PMID: 18765831
13 | NetPhos | The Center for Biological Sequence Analysis at the Technical University of Denmark was formed in 1993 and conducts basic research in the field of bioinformatics and systems biology. The group of more than 90 scientists, working in ten specialist research groups, has a highly multi-disciplinary profile (molecular biologists, biochemists, medical doctors, physicists and computer scientists) with a ratio of 2:1 of bio-to-nonbio backgrounds. CBS represents one of the large bioinformatics groups in academia in Europe. Bioinformatics is the term used to refer to the combination of methods in biology, computation, and information management that are necessary to advance research relating to all aspects of living systems, from individual molecules, cells, and organs to entire organisms. Today, research in molecular biology, biotechnology and pharmacology depends on information technology all the way from experiment to the publication of the results. Comprehensive public databases of DNA and protein sequences, macromolecular structure, gene and protein expression levels, pathway organization and cell signalling have been established to optimise scientific exploitation of the explosion of data within biology. Unlike many other groups in the field of biomolecular informatics, the Center for Biological Sequence Analysis directs its research primarily towards topics related to the elucidation of the functional aspects of complex biological mechanisms. Among contemporary bioinformatics concerns are reliable computational interpretation of a wide range of experimental data and the detailed understanding of the molecular apparatus behind cellular mechanisms of sequence information. By exploiting available experimental data and evidence in the design of algorithms, sequence correlations and other features of biological significance can be inferred. In addition to the computational research, the center also has experimental efforts in gene expression analysis using DNA chips and data generation in relation to the physical and structural properties of DNA. In the last decade, the Center for Biological Sequence Analysis has produced a large number of computational methods, which are offered to others via WWW servers. Based on bioinformatics efforts started in the late 1980s, the activity was established formally as a center in 1993 by a grant from the Danish National Research Foundation. | http://www.cbs.dtu.dk/services/NetPhos/ | PMID: 10600390
14 | NetPhosK | Same description of the Center for Biological Sequence Analysis (Technical University of Denmark) as given for NetPhos above. | http://www.cbs.dtu.dk/services/NetPhosK/ | PMID: 15174133
15 | NetPhosYeast | Same description of the Center for Biological Sequence Analysis (Technical University of Denmark) as given for NetPhos above. | http://www.cbs.dtu.dk/services/NetPhosYeast/ | PMID: 17282998
16 | phos_pred | Reversible protein phosphorylation is one of the most important post-translational modifications and regulates various cellular processes. Identification of kinase-specific phosphorylation sites is helpful for understanding phosphorylation mechanisms and regulation processes. Although a number of computational approaches have been developed, few studies have so far considered the hierarchical structure of kinases, and most existing tools use only local sequence information to construct predictive models. In this work, we conduct a systematic and hierarchy-specific investigation of protein phosphorylation site prediction in which protein kinases are clustered into hierarchical structures with four levels: kinase, subfamily, family and group. To enhance phosphorylation site prediction at all hierarchical levels, functional information of proteins, including gene ontology (GO) and protein-protein interaction (PPI), is adopted in addition to primary sequence to construct prediction models based on random forest (RF). Analysis of selected GO and PPI features shows that functional information is critical in determining protein phosphorylation sites at every hierarchical level. Furthermore, the prediction results on Phospho.ELM and an additional testing dataset demonstrate that the proposed method remarkably outperforms existing phosphorylation prediction methods at all hierarchical levels. The proposed method is freely available at http://bioinformatics.ustc.edu.cn/phos_pred/. | http://bioinformatics.ustc.edu.cn/phos_pred/ | PMID: 24452754
17 | Phos3D | Phos3D is a web server for the prediction of phosphorylation sites (P-sites) in proteins, originally designed to investigate the advantages of including spatial information in P-site prediction. The approach is based on Support Vector Machines trained on sequence profiles enhanced by information from the spatial context of experimentally identified P-sites. In addition to serine, threonine, and tyrosine P-sites, Phos3D can predict kinase-specific phosphorylation by the serine kinases PKA, PKC, MAPK, and CKII, as well as by the tyrosine kinase SRC. The quality of predictions depends greatly on the quality of the submitted protein structures. | http://phos3d.mpimp-golm.mpg.de/ | PMID: 19383128
18 | PhoScan | Protein phosphorylation plays important roles in a variety of cellular processes. Detecting possible phosphorylation sites and their corresponding protein kinases is crucial for studying the function of many proteins. This article presents a new prediction system, called PhoScan, to predict phosphorylation sites in a kinase-family-specific way. Common phosphorylation features and kinase-specific features are extracted from substrate sequences of different protein kinases based on the analysis of published experiments, and a scoring system is developed for evaluating the possibility that a peptide can be phosphorylated by the protein kinase at the specific site in its sequence context. PhoScan can achieve a specificity above 90% with sensitivity around 90% at the kinase-family level on the tested data. The system is applied to a set of human proteins collected from Swiss-Prot, and sets of putative phosphorylation sites are predicted for the protein kinase A, cyclin-dependent kinase, and casein kinase 2 families. PhoScan is available at http://bioinfo.au.tsinghua.edu.cn/phoscan/. | http://bioinfo.au.tsinghua.edu.cn/phoscan/ | PMID: 17680694
19 | PHOSFER | MOTIVATION: Phosphorylation is the most important post-translational modification in eukaryotes. Although many computational phosphorylation site prediction tools exist for mammals, and a few were created specifically for Arabidopsis thaliana, none are currently available for other plants. RESULTS: In this article, we propose a novel random forest-based method called PHOSFER (PHOsphorylation Site FindER) for applying phosphorylation data from other organisms to enhance the accuracy of predictions in a target organism. As a test case, PHOSFER is applied to phosphorylation sites in soybean, and we show that it predicts soybean sites more accurately than both the existing Arabidopsis-specific predictors and a simpler machine-learning scheme that uses only known phosphorylation sites and non-phosphorylation sites from soybean. In addition to soybean, PHOSFER will be extended to other organisms in the near future. | http://saphire.usask.ca/saphire/phosfer/index.html | PMID: 23341503
20 | PHOSITE | SUMMARY: The prediction of significant short functional protein sequences has inherent problems. In predicting phosphorylation sites, problems arise from the shortness of phosphorylation sites, the difficulty of maintaining many different predefined models of binding sites, and the difficulty of obtaining highly sensitive predictions with constant sensitivity and specificity. The algorithm presented in this paper overcomes these problems. The proposed algorithm, PHOSITE, is based on case-based sequence analysis. This enables the prediction of phosphorylation sites with constant specificity and sensitivity. Furthermore, this method not only predicts phosphorylation sites in general but also predicts the most probable type of kinase involved. AVAILABILITY: The tool PHOSITE implementing the presented method can be evaluated at the website http://www.phosite.com. | http://www.phosite.com | PMID: 15297298
21 | PhosphoPICK | We're a bioinformatics group at the University of Queensland, Australia. Our research aims to develop, investigate and apply bioinformatics methodologies to understand and resolve a range of open problems in genomics, molecular and systems biology. Recent applications involve protein sorting, nuclear protein organisation, mechanisms of transcriptional regulation, sequence and structure determinants of protein function and modification, and protein engineering. | http://bioinf.scmb.uq.edu.au/phosphopick/phosphopick | PMID: 25304781
22 | PhosphoRice | PhosphoRice, a meta-predictor of rice-specific phosphorylation sites, was constructed by integrating recent phosphorylation site predictors (NetPhos2.0, NetPhosK, Kinasephos, Scansite, Disphos and Predphosphos) with parameters selected by restricted grid search and random search. It achieves an increase in MCC of 7.1% and an increase in ACC of 4.6% over the best element predictor (Disphos_default). | https://github.com/PEHGP/PhosphoRice | PMID: 22305189
23 | PhosphoSVM | Phosphorylation is the most essential post-translational modification in eukaryotes and plays a crucial role in a wide range of cellular processes. However, experiments on phosphorylation site discovery are time-consuming and expensive to perform. Therefore, computational prediction methods have become more popular as an important complementary approach in protein phosphorylation site study. The prediction tools can be grouped into two categories: kinase-specific and non-kinase-specific tools. A kinase-specific prediction program requires as input both a protein sequence and the type of a kinase, and produces some measure of the likelihood that each S/T/Y residue in the sequence is phosphorylated by the chosen kinase. In contrast, a non-kinase-specific prediction tool requires only a protein sequence as input and reports the likelihood that each S/T/Y residue is phosphorylated by any possible kinase. Non-kinase-specific tools may be able to detect phosphorylation sites for which the associated kinase is unknown or has few known substrate sequences. With the development of sequencing technology, there is an increasing demand for non-kinase-specific tools, but their current state is not satisfactory in either quality or quantity. In this work, we developed a non-kinase-specific protein phosphorylation site prediction method that uses a random forest classifier to integrate eight different sequence-level scores. These sequence-based features are Shannon entropy (SE), relative entropy (RE), predicted protein secondary structure (SS), predicted protein disorder (PD), accessible surface area (ASA), overlapping properties (OP), averaged cumulative hydrophobicity (ACH), and k-nearest neighbor (KNN). With carefully optimized parameters and sliding window size, our method achieved AUC values of 0.8405, 0.8183 and 0.7383 for serine (S), threonine (T), and tyrosine (Y) phosphorylation sites, respectively, in animals in ten-fold cross-validation. | http://sysbio.unl.edu/PhosphoSVM/ | PMID: 24623121
24 | pkaPS | BACKGROUND: Protein kinase A (cAMP-dependent kinase, PKA) is a serine/threonine kinase for which ca. 150 substrate proteins are known. Based on a refinement of the recognition motif using the available experimental data, we wished to apply the simplified substrate protein binding model for accurate prediction of PKA phosphorylation sites, an approach that was previously successful for the prediction of lipid posttranslational modifications and of the PTS1 peroxisomal translocation signal. RESULTS: Approximately 20 sequence positions flanking the phosphorylated residue on both sides have been found to be restricted in their sequence variability (region -18...+23 with the site at position 0). The conserved physical pattern can be rationalized in terms of a qualitative binding model with the catalytic cleft of protein kinase A. Positions -6...+4 surrounding the phosphorylation site are influenced to a varying degree by direct interaction with the kinase. This sequence stretch is embedded in an intrinsically disordered region composed preferentially of hydrophilic residues with flexible backbones and small side chains. This knowledge has been incorporated into a simplified analytical model of productive binding of substrate proteins with PKA. CONCLUSION: The scoring function of the pkaPS predictor can confidently discriminate PKA phosphorylation sites from serines/threonines with non-permissive sequence environments (sensitivity of approximately 96% at a specificity of approximately 94%). The tool "pkaPS" has been applied to the whole human proteome. Among newly predicted PKA targets, there are entirely uncharacterized protein groups as well as apparently well-known families such as those of the ribosomal proteins L21e, L22 and L6. AVAILABILITY: The supplementary data as well as the prediction tool as a WWW server are available at http://mendel.imp.univie.ac.at/sat/pkaPS. REVIEWERS: Erik van Nimwegen (Biozentrum, University of Basel, Switzerland), Sandor Pongor (International Centre for Genetic Engineering and Biotechnology, Trieste, Italy), Igor Zhulin (University of Tennessee, Oak Ridge National Laboratory, USA). | http://mendel.imp.ac.at/sat/pkaPS/ | PMID: 17222345
25 | PKIS | The increasingly large gap in kinase-specific phosphorylation data hampers the reconstruction of signal transduction networks. Existing experimental methods and computational phosphorylation site (P-site) prediction tools have various limitations in addressing this problem. Here, based on the latest version of Phospho.ELM (9.0), a novel kinase identification web server, PKIS, which combines support vector machines (SVMs) with the composition of monomer spectrum (CMS), is used to assign protein kinases to experimentally verified human P-sites with high specificity (no less than 99%). Comparisons with well-known P-site prediction tools, such as KinasePhos 2.0, Musite and GPS2.1, show that PKIS is more competitive at identifying the protein kinases associated with P-sites, which suggests that the design of the kinase assignment algorithm is critical. In addition, application of PKIS to human phosphoproteomes identified corresponding kinases for tens of thousands of P-sites. These predicted results are significant in encoding the signal networks of human. It is anticipated that PKIS may become a valuable bioinformatics tool for identifying novel signal pathways or even for drug development. | http://bioinformatics.ustc.edu.cn/pkis/ | PMID: 23941207
26 | PlantPhos | Protein phosphorylation is the most widespread and well-studied post-translational modification in eukaryotic cells. It is one of the most prevalent intracellular protein modifications that influence numerous cellular processes (Steen, Jebanathirajah et al. 2006). It has been estimated that one-third to one-half of all proteins in a eukaryotic cell are phosphorylated (Hubbard and Cohen 1993). Furthermore, protein phosphorylation, catalyzed by specific kinases, plays crucial regulatory roles in intracellular signal transduction, the networks of proteins and small molecules that transmit information from the cell surface to the nucleus, where they ultimately cause transcriptional changes (Steffen, Petti et al. 2002). An estimated 1 to 3% of functional eukaryotic genes encode protein kinases, suggesting that they are involved in many aspects of cellular regulation and metabolism (Stone and Walker 1995). However, a full understanding of the mechanism of intracellular signal transduction remains a major challenge in cellular biology. Protein phosphorylation is an important post-translational modification that regulates various cellular processes not only in humans but also in plants. It is reported that the regulation of carbon and nitrogen metabolism in plants is driven by phosphorylation (Diolez, Kesseler et al. 1993). Phosphorylation is involved in modulating sucrose phosphate synthase, an enzyme which controls the signaling pathway for the synthesis of sucrose from carbon in plants (Huber 2007). Phosphorylation is also involved in modulating the plant process of synthesizing ammonia, a compound which is required to give energy to certain organs that are not able to photosynthesize (Huber 2007). Furthermore, although not yet fully studied, it appears that phosphorylation is also involved in plant growth and in the plant response to stress (Luan 2002; Huber 2007). Stone et al. have identified some of the plant kinases; however, the precise functional roles of specific protein kinases have been elucidated for only a few (Stone and Walker 1995). | http://csb.cse.yzu.edu.tw/PlantPhos/ | PMID: 21703007
27 | PostMod | PostMod is a prediction server for phosphorylation sites. We developed a new prediction system with a solely sequence-based approach. We combined physicochemical information, motif information, and evolutionary information by simply comparing sequence similarities. Taking all these features together, we applied a novel algorithm, an indirect relationship-based noise-reducing system. This approach is powerful and intuitive for recognizing phosphorylation sites. Moreover, our method is generally applicable to predicting other types of PTMs. | http://pbil.kaist.ac.kr/PostMod | PMID: 20122181
28 | PPRED | One of the most critical cellular phenomena is the phosphorylation of proteins, as it is involved in signal transduction in various processes including cell cycle, proliferation and apoptosis. This phenomenon is catalyzed by protein kinases that affect certain acceptor residues (serine, threonine and tyrosine) in substrate sequences. Experiments by 2D-gel electrophoresis indicate that 30-50% of the proteins in a eukaryotic cell undergo phosphorylation. Accurate prediction of the phosphorylation sites of eukaryotic proteins will therefore help in understanding the overall intracellular events. Both experimental and computational methods have been developed to investigate phosphorylation sites. In vivo and in vitro methods are often time-consuming, expensive and even limited by the restrictions of enzymatic reactions. On the other hand, in silico prediction of phosphorylation sites can afford fast and automatic annotation of candidate phosphorylation sites, which will eventually be an important breakthrough in many aspects of current molecular biology and very helpful for disease-related research and drug design. We have developed a prediction system (PPRED) that incorporates the evolutionary information of proteins to train the SVMs, which can be used to accurately predict phosphorylation sites from given protein sequences and to analyze the importance of such information for devising generalized prediction systems. | biomecis.uta.edu/~ashis/res/ppred/ | PMID: 20492656
29 | PPSP | As a reversible and dynamic post-translational modification of proteins, phosphorylation plays an essential regulatory role in a broad spectrum of cellular processes. Conventional experimental identification of protein kinase (PK)-specific phosphorylation sites on substrates in vivo and in vitro has provided the foundation for understanding the mechanisms of phosphorylation dynamics. However, these experiments are often time-consuming and expensive, and the enzymatic activity of the PKs is usually diminished or impeded in vitro, which greatly hampers studies of phosphorylation. In this regard, in silico prediction of PK-specific phosphorylation sites is urgently needed to guide further experimental manipulation. In this work, we present a novel, versatile and comprehensive program, PPSP (Prediction of PK-specific Phosphorylation site), based on Bayesian decision theory. With an unambiguous, experimentally verified training data set, PPSP can accurately predict bona fide phosphorylation sites for 68 PK groups. | http://ppsp.biocuckoo.org/ | PMID: 16549034
30 | Predikin | Predikin is a system to predict the substrate specificity of protein kinases. Predikin can be used to: predict the most likely phosphorylation site for a specific protein kinase; predict the most likely protein kinase for a phosphorylation site; make predictions about whole proteomes. For full details, please refer to the published articles on Predikin. There is also information in the documentation pages. If you experience any difficulties using Predikin, please contact us (we'd also like to hear from you if you have suggestions about improvements to Predikin or this website). | http://predikin.biosci.uq.edu.au/ | PMID: 18477637
31 | PredPhospho | MOTIVATION: Phosphorylation is involved in diverse signal transduction pathways. By predicting phosphorylation sites and their kinases from primary protein sequences, we can obtain much valuable information that can form the basis for further research. Using support vector machines, we attempted to predict phosphorylation sites and the type of kinase that acts at each site. RESULTS: Our prediction system was limited to phosphorylation sites catalyzed by four protein kinase families and four protein kinase groups. The accuracy of the predictions ranged from 83 to 95% at the kinase family level, and from 76 to 91% at the kinase group level. The prediction system, PredPhospho, can be applied to the functional study of proteins and can help predict the changes in phosphorylation sites caused by amino acid variations at intra- and interspecies levels. | http://www.ngri.re.kr/proteo/PredPhospho.htm | PMID: 15231530
32 | PSEA | Protein phosphorylation catalysed by kinases plays crucial regulatory roles in intracellular signal transduction. The increasing number of identified kinase-specific phosphorylation sites and disease-related phosphorylation substrates motivates exploration of the regulatory relationship between protein kinases and disease-related phosphorylation substrates. In this work, we analysed the kinase characteristics of all disease-related phosphorylation substrates using our Phosphorylation Set Enrichment Analysis (PSEA) method. We evaluated the efficiency of our method with an independent test and concluded that our approach is helpful for identifying the kinases responsible for phosphorylated substrates. In addition, we found that the Mitogen-activated protein kinase (MAPK) and Glycogen synthase kinase (GSK) families are more strongly associated with abnormal phosphorylation. It can be anticipated that our method may help to identify the mechanism of phosphorylation and the relationship between kinases and phosphorylation-related diseases. | http://bioinfo.ncu.edu.cn/PKPred_Home.aspx | PMID: 24681538
33 | PTMPred | Recent efforts to develop a universal view of complex networks have created both excitement and confusion about the way in which knowledge of network structure can be used to understand, control, or design system behavior. This paper offers perspective on the emerging field of "network science" in three ways. First, it briefly summarizes the origins, methodological approaches, and most celebrated contributions within this increasingly popular field. Second, it contrasts the predominant perspective in the network science literature (that abstracts away domain-specific function and instead focuses on graph-theoretic measures of system structure and dynamics) with that of engineers and practitioners of decision science (who emphasize the importance of network performance, constraints, and trade-offs). Third, it proposes optimization-based reverse engineering to address some important open questions within network science from an operations research perspective. We advocate for increased, yet cautious, participation in this field by operations researchers. | http://doc.aporc.org/wiki/PTMPred | 24291233
34 | RLIMS-P | RLIMS-P is a rule-based text-mining program specifically designed to extract protein phosphorylation information on protein kinase, substrate and phosphorylation sites from biomedical literature (Hu et al., 2005). RLIMS-P currently works on PubMed abstracts and open access full text articles. | http://research.bioinformatics.udel.edu/rlimsp/ | 25122463
35 | ViralPhos | ViralPhos is a web server for identifying potential virus phosphorylation sites with substrate motifs. Phosphorylation of virus proteins is linked to viral replication, which leads to an inhibition of normal host-cell functions. This has motivated the field to further elucidate the process of phosphorylation in viral proteins. However, few studies have investigated substrate motifs in identifying virus phosphorylation sites. Additionally, mass spectrometry-based experiments used to investigate such sites tend to be time-consuming and labor-intensive. 329 experimentally verified phosphorylation fragments on 111 virus proteins were collected from virPTM. These were clustered into subgroups of significantly conserved motifs using a recursive statistical method. Two-layered Support Vector Machines (SVMs) are then applied to train a predictive model for the identified substrate motifs. The SVM models are evaluated using five-fold cross-validation, which yields an average accuracy of 0.86 for serine and 0.81 for threonine. Furthermore, the proposed method is shown to perform on par with three other phosphorylation site prediction tools: PPSP, KinasePhos 2.0 and GPS 2.1. In this study, we propose a computational method, ViralPhos, which aims to investigate virus substrate site motifs and identify potential phosphorylation sites on virus proteins. We identified informative substrate motifs that matched with several well-studied kinase groups as potential catalytic kinases for virus protein substrates. The identified substrate motifs were further exploited to identify potential virus phosphorylation sites. | http://csb.cse.yzu.edu.tw/ViralPhos/ | 24564381
36 | iPhosY-PseAAC | Protein phosphorylation is one of the most fundamental types of post-translational modifications and it plays a vital role in various cellular processes of eukaryotes. Among the three types of phosphorylation, i.e. serine, threonine and tyrosine phosphorylation, tyrosine phosphorylation is one of the most frequent and it is important for mediation of signal transduction in eukaryotic cells. Site-directed mutagenesis and mass spectrometry help in the experimental determination of cellular signalling networks; however, these techniques are costly, time-consuming and labour-intensive. Thus, efficient and accurate prediction of these sites through computational approaches can be beneficial to reduce cost and time. Here, we present a more accurate and efficient sequence-based computational method for prediction of phosphotyrosine (PhosY) sites by incorporation of statistical moments into PseAAC. The study is carried out based on Chou's 5-step rule, and various position-composition relative features are used to train a neural network for the prediction purpose. Jackknife testing is performed to validate the results of the proposed prediction method. The overall accuracy validated through Jackknife testing was calculated as 93.9%. These results suggest that the proposed prediction model can play a fundamental role in the prediction of PhosY sites in an accurate and efficient way. | | 30311130
37 | iPhosT-PseAAC | Among all the post-translational modifications (PTMs) of proteins, phosphorylation is known to be the most important and highly occurring PTM in eukaryotes and prokaryotes. It is an important regulatory mechanism which is required in most pathological and physiological processes, including neural activity and cell signalling transduction. The process of threonine phosphorylation modifies the threonine by the addition of a phosphoryl group to the polar side chain, and generates phosphothreonine sites. The investigation and prediction of phosphorylation sites is important and various methods have been developed based on high-throughput mass spectrometry, but such experiments are time-consuming and laborious; therefore, an efficient and accurate novel method is proposed in this study for the prediction of phosphothreonine sites. The proposed method uses context-based data to calculate statistical moments. Position relative statistical moments are combined together to train neural networks. Using 10-fold cross-validation, an accuracy of 94.97% was obtained, whereas for Jackknife testing, an accuracy of 96% was obtained. The overall accuracy of the system is 94.4%, with a sensitivity of 94% and a specificity of 94.6%. These results suggest that the proposed method may play an essential role alongside the other existing methods for phosphothreonine site prediction. | | 29704476
38 | iPhosH-PseAAC | Protein phosphorylation is one of the key mechanisms in prokaryotes and eukaryotes and is responsible for various biological functions such as protein degradation, intracellular localization, the multitude of cellular processes, molecular association, cytoskeletal dynamics, and enzymatic inhibition/activation. Phosphohistidine (PhosH) has a key role in a number of biological processes, from central metabolism to signalling in eukaryotes and bacteria. Thus, identification of phosphohistidine sites in a protein sequence is crucial, and experimental identification can be expensive, time-consuming, and laborious. To address this problem, here, we propose a novel computational model, namely iPhosH-PseAAC, for prediction of phosphohistidine sites in a given protein sequence using pseudo amino acid composition (PseAAC), statistical moments, and position relative features. The results of the proposed predictor are validated through self-consistency testing, 10-fold cross-validation, and jackknife testing. The self-consistency validation gave 100 percent accuracy, whereas for cross-validation the accuracy achieved is 94.26 percent. Moreover, jackknife testing gave 97.07 percent accuracy for the proposed model. Thus, the proposed model iPhosH-PseAAC has great ability to predict the PhosH sites in given proteins. | | 31144645
39 | PhosphOrtholog | Background: Most biological processes are influenced by protein post-translational modifications (PTMs). Identifying novel PTM sites in different organisms, including humans and model organisms, has expedited our understanding of key signal transduction mechanisms. However, with increasing availability of deep, quantitative datasets in diverse species, there is a growing need for tools to facilitate cross-species comparison of PTM data. This is particularly important because functionally important modification sites are more likely to be evolutionarily conserved; yet cross-species comparison of PTMs is difficult since they often lie in structurally disordered protein domains. Current tools that address this can only map known PTMs between species based on known orthologous phosphosites, and do not enable the cross-species mapping of newly identified modification sites. Here, we addressed this by developing a web-based software tool, PhosphOrtholog (www.phosphortholog.com), that accurately maps protein modification sites between different species. This facilitates the comparison of datasets derived from multiple species, and should be a valuable tool for the proteomics community. Results: Here we describe PhosphOrtholog, a web-based application for mapping known and novel orthologous PTM sites from experimental data obtained from different species. PhosphOrtholog is the only generic and automated tool that enables cross-species comparison of large-scale PTM datasets without relying on existing PTM databases. This is achieved through pairwise sequence alignment of orthologous protein residues. To demonstrate its utility we apply it to two sets of human and rat muscle phosphoproteomes generated following insulin and exercise stimulation, respectively, and one publicly available mouse phosphoproteome following cellular stress, revealing high mapping and coverage efficiency. Although coverage statistics are dataset dependent, PhosphOrtholog increased the number of cross-species mapped sites in all our example data sets by more than double when compared to those recovered using existing resources such as PhosphoSitePlus. Conclusions: PhosphOrtholog is the first tool that enables mapping of thousands of novel and known protein phosphorylation sites across species, accessible through an easy-to-use web interface. Identification of conserved PTMs across species from large-scale experimental data increases our knowledgebase of functional PTM sites. Moreover, PhosphOrtholog is generic, being applicable to other PTM datasets such as acetylation, ubiquitination and methylation. | www.phosphortholog.com | 26283093
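The PhosphOrtholog entry describes mapping modification sites between species through pairwise alignment of orthologous proteins. The Python sketch below illustrates only the coordinate-mapping step, assuming the two orthologs have already been aligned by some external tool and are supplied as equal-length gapped strings. The function name `map_site` and the toy sequences are hypothetical and are not taken from PhosphOrtholog itself.

```python
def map_site(aligned_src: str, aligned_dst: str, src_pos: int):
    """Map a 1-based residue position in the source protein onto the target protein.

    Both arguments are rows of one pairwise alignment (equal length, '-' = gap).
    Returns (target_position, target_residue), or None when the source residue
    aligns to a gap or src_pos lies beyond the source sequence.
    """
    if len(aligned_src) != len(aligned_dst):
        raise ValueError("aligned sequences must have equal length")
    src_i = dst_i = 0  # residues consumed so far in each ungapped sequence
    for a, b in zip(aligned_src, aligned_dst):
        if a != "-":
            src_i += 1
        if b != "-":
            dst_i += 1
        if a != "-" and src_i == src_pos:
            return None if b == "-" else (dst_i, b)
    return None


# Toy alignment with made-up sequences (not real orthologs).
human = "MKS-PTRSA"
rat = "MKSAPT-SA"
for pos in (3, 4, 6, 7):
    print(pos, map_site(human, rat, pos))
```

A site is usually treated as conserved only when the mapped residue is also modifiable (for example S, T or Y for phosphorylation); that check is left to the caller here.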
# | Tool Name | Description | URL | Reference
1 | PrePS | PrePS stands for Prenylation Prediction Suite and combines three predictors for protein CaaX farnesylation, CaaX geranylgeranylation and Rab geranylgeranylation in one web interface. The predictors aim to model the substrate-enzyme interactions based on refinement of the recognition motifs for each of the prenyltransferases. Motif information has been extracted from sets of known substrates (learning sets) and specific scoring functions have been created utilizing both sequence and physical property profiles, including interpositional correlations to account for partially overlapping substrate specificities. PrePS selectively assigns the modifying enzyme to predicted substrate proteins and sensitively filters out false positive predictions based on the methodology that has already been applied successfully for the prediction of GPI-anchors, myristoylation and PTS1 peroxisomal targeting. | http://mendel.imp.univie.ac.at/sat/PrePS | 15960807
2 | SPrenylC-PseAAC | Protein prenylation (or S-prenylation) is one of the most essential modifications, required for the membrane association of a plethora of signalling proteins involved in key biological processes such as protein trafficking, cell growth, proliferation and differentiation. Due to the ubiquitous nature of S-prenylation and its role in cellular functions, any defect in the biosynthesis or regulation of the isoprenoid leads to the occurrence of a variety of diseases including neurodegenerative disorders, metabolic issues, cardiovascular diseases and one of the most fatal diseases, cancer. This depicts the strong biological significance of S-prenylation; thus, the timely and accurate identification of S-prenylation sites is crucial and may provide possible ways to understand the mechanism of this modification in proteins. To avoid laborious, resource-demanding and expensive experimental techniques of identifying S-prenylation sites, here, we propose a novel predictor, namely SPrenylC-PseAAC, by integrating Chou's Pseudo Amino Acid Composition (PseAAC) and relative/absolute position-based features. A 2-tier classification was performed, i.e., at the first level, identification of prenylation and non-prenylation sites is performed, while at the second level, identification of S-farnesylation and S-geranylgeranylation sites is performed. Using jackknife, prediction model validation gave 95.31% accuracy for tier-1 classification and 91.42% for tier-2 classification, while 10-fold cross-validation gave 93.68% accuracy for tier-1 classification and 89.70% for tier-2 classification. Thus the proposed predictor can help in predicting prenylation sites in an efficient and accurate way. SPrenylC-PseAAC is available at biopred.org/prenyl. | http://biopred.org/prenyl | 30768975
# | Tool Name | Description | URL | Reference
1 | PropPred | Lysine propionylation is an important and common protein acylation modification in both prokaryotes and eukaryotes. To better understand the molecular mechanism of propionylation, it is important to identify propionylated substrates and their corresponding propionylation sites accurately. In this study, a novel bioinformatics tool named PropPred is developed to predict propionylation sites by using multiple feature extraction and a biased support vector machine. On the one hand, various features are incorporated, including amino acid composition, amino acid factors, binary encoding, and the composition of k-spaced amino acid pairs. The F-score feature method and the incremental feature selection algorithm are adopted to remove the redundant features. On the other hand, the biased support vector machine algorithm is used to handle the class imbalance in the propionylation site training dataset. As illustrated by 10-fold cross-validation, PropPred achieves satisfactory performance with a sensitivity of 70.03%, a specificity of 75.61%, an accuracy of 75.02% and a Matthews correlation coefficient of 0.3085. Feature analysis shows that some amino acid factors play the most important roles in the prediction of propionylation sites. These analyses and prediction results might provide some clues for understanding the molecular mechanisms of propionylation. A user-friendly web-server for PropPred is established at 123.206.31.171/PropPred/. | 123.206.31.171/PropPred/ | 28763688
2 | | Lysine propionylation is a newly discovered post-translational modification (PTM) and plays a key role in cellular processes. Although proteomics techniques were capable of detecting propionylation, large-scale detection was still challenging. To bridge this gap, we presented a transfer learning-based method for computationally predicting propionylation sites. The recurrent neural network-based deep learning model was first trained on malonylation data and then fine-tuned on propionylation data. The trained model served as a feature extractor in which input protein sequences were translated into numerical vectors. A support vector machine was used as the final classifier. The proposed method reached a Matthews correlation coefficient (MCC) of 0.6615 on 10-fold cross-validation and 0.3174 on the independent test, outperforming state-of-the-art methods. The enrichment analysis indicated that propionylation was associated with these GO terms (GO:0016620, GO:0051287, GO:0003735, GO:0006096, and GO:0005737) and with metabolism. We developed a user-friendly online tool for predicting propionylation sites, which is available at http://47.113.117.61/. | http://47.113.117.61/ | 33967828
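One of the encodings named in the PropPred entry is the composition of k-spaced amino acid pairs. As a rough illustration of what such a feature vector looks like, the Python sketch below counts residue pairs separated by k intervening positions in a peptide window and normalises by the number of pairs. This is a generic version of the encoding under the assumption of a 20-letter alphabet; the function name `cksaap` is hypothetical and the exact window length and k values used by PropPred are not stated in the entry above.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(peptide: str, k: int) -> list:
    """Return a 400-dimensional frequency vector of residue pairs separated by k residues."""
    n_pairs = len(peptide) - k - 1  # number of (i, i + k + 1) position pairs
    if n_pairs <= 0:
        raise ValueError("peptide too short for this k")
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(n_pairs):
        pair = peptide[i] + peptide[i + k + 1]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
    return [counts[p] / n_pairs for p in PAIRS]


# Toy peptide: with k = 1 the pairs are A.A, K.K and A.A.
vec = cksaap("AKAKA", k=1)
print(len(vec), vec[PAIRS.index("AA")], vec[PAIRS.index("KK")])
```

Vectors for several values of k are commonly concatenated before feature selection, which is where a redundancy filter such as the F-score step mentioned above would apply.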
# | Tool Name | Description | URL | Reference
1 | GPS-PUP | The Nobel Prize in Chemistry 2004 was awarded to Aaron Ciechanover, Avram Hershko and Irwin Rose for their discovery of ubiquitin-mediated protein degradation (Vogel, G. et al., 2004). Numerous subsequent studies showed that selective degradation by ubiquitination provides a critical mechanism in eukaryotes to regulate cellular processes such as cell cycle and division, immune response and inflammation, and signal transduction. Recently, prokaryotic ubiquitin-like protein (PUP) was identified as the tagging system in prokaryotes (Pearce, M. J. et al., 2008), which is coupled to its targets through deamidation by dop (PUP deamidase/depupylase) followed by conjugation catalyzed by PafA (PUP--protein ligase) (Striebel, F. et al., 2009). Although the details of the pup-proteasome system need further characterization, the discovery of this degradation mechanism opens the door to investigating dynamic protein regulation in Mycobacterium, which could be targeted by pathogen-specific drugs (Salgame, P. et al., 2008). In this regard, experimental identification of pupylated substrates with their sites could provide fundamental insights into the cellular processes in Mycobacterium. | http://pup.biocuckoo.org/ | 21850344
2 | PUP-Fuse | Pupylation is a type of reversible post-translational modification of proteins, which plays a key role in the cellular function of microbial organisms. Several proteomics methods have been developed for the prediction and analysis of pupylated proteins and pupylation sites. However, the traditional experimental methods are laborious and time-consuming. Hence, computational algorithms that can predict potential pupylation sites using sequence features are highly needed. In this research, a new prediction model, PUP-Fuse, has been developed for pupylation site prediction by integrating multiple sequence representations. We explored five types of feature encoding approaches and three machine learning (ML) algorithms. In the final model, we integrated the successive ML scores using a linear regression model. PUP-Fuse achieved a Matthews correlation coefficient of 0.768 in a 10-fold cross-validation test. It also outperformed existing predictors in an independent test. The web server of PUP-Fuse with curated datasets is freely available. | http://kurata14.bio.kyutech.ac.jp/PUP-Fuse/ | 33672741
3 | PupStruct | Post-translational modification (PTM) is a critical biological reaction which adds to the diversification of the proteome. With numerous known modifications being studied, pupylation has gained focus in the scientific community due to its significant role in regulating biological processes. The traditional experimental practice to detect pupylation sites is expensive and requires a lot of time and resources. Thus, many computational predictors have been developed to address this issue. However, performance is still limited. In this study, we propose another computational method, named PupStruct, which uses the structural information of amino acids with a radial basis kernel function Support Vector Machine (SVM) to predict pupylated lysine residues. We compared PupStruct with three state-of-the-art predictors from the literature, over which PupStruct showed a significant improvement in performance on statistical metrics such as sensitivity (0.9234), specificity (0.9359), accuracy (0.9296), precision (0.9349), and Matthews correlation coefficient (0.8616) on a benchmark dataset. | | 33260770
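Most entries in these tables report sensitivity, specificity, accuracy, precision and the Matthews correlation coefficient (MCC). As a reminder of how these relate, the Python sketch below computes them from a binary confusion matrix using the standard definitions. The counts in the usage example are invented for illustration and do not come from any of the tools listed here; the function name `binary_metrics` is likewise hypothetical.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics for a binary site predictor.

    Assumes at least one positive and one negative example and at least one
    positive prediction, so that no ratio divides by zero.
    """
    sensitivity = tp / (tp + fn)  # fraction of true sites recovered
    specificity = tn / (tn + fp)  # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)    # fraction of predicted sites that are real
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "mcc": mcc,
    }


# Invented counts, for illustration only.
for name, value in binary_metrics(tp=90, fp=20, tn=80, fn=10).items():
    print(f"{name}: {value:.4f}")
```

Because MCC uses all four cells of the confusion matrix, it can stay low on imbalanced datasets even when accuracy looks high, which is one reason the entries above tend to report both.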
# | Tool Name | Description | URL | Reference
1 | GSTPred | GSTpred is a web-server specially trained for Glutathione S-transferase (GST) proteins. The prediction is based on amino acid composition, dipeptide composition and tripeptide composition, using support vector machines (SVM). The prediction result is displayed in the web browser in tabular form with a score. Our model predicts GST proteins with very high accuracy. During our study we achieved an accuracy of 91.59% for the amino acid composition model, 95.79% for the dipeptide composition model and 97.66% for the tripeptide composition model. We developed a user-friendly web server where users can submit their sequence (directly paste the sequence in the box or upload a sequence file) and select the option for simple composition, dipeptide composition, tripeptide composition and threshold. After some time the result will be displayed in tabular form with the name and score of each sequence. We also provide a supplementary dataset and a standalone version of the GSTPred software without any charge. Users can download this standalone software and our dataset to a local system. | http://www.imtech.res.in/raghava/gstpred/ | 17627599
2 | DeepGSH | As a widespread and reversible post-translational modification of proteins, S-glutathionylation specifically generates the mixed disulfides between cysteine residues and glutathione, which regulates various biological processes including oxidative stress, nitrosative stress and signal transduction. The identification of proteins and specific sites that undergo S-glutathionylation is crucial for understanding the underlying mechanisms and regulatory effects of S-glutathionylation. Experimental identification of S-glutathionylation sites is laborious and time-consuming, whereas computational predictions are more attractive due to their high speed and convenience. Here, we developed a novel computational framework DeepGSH (http://deepgsh.cancerbio.info/) for species-specific S-glutathionylation sites prediction, based on deep learning and particle swarm optimization algorithms. 5-fold cross validation indicated that DeepGSH was able to achieve an AUC of 0.8393 and 0.8458 for Homo sapiens and Mus musculus. According to critical evaluation and comparison, DeepGSH showed excellent robustness and better performance than existing tools in both species, demonstrating DeepGSH was suitable for S-glutathionylation prediction. The prediction results of DeepGSH might provide guidance for experimental validation of S-glutathionylation sites and helpful information to understand the intrinsic mechanisms. | http://deepgsh.cancerbio.info/ | 32234550
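The GSTPred entry bases its predictions on amino acid, dipeptide and tripeptide composition. The Python sketch below shows one common way to compute such n-mer composition features from a protein sequence, counting overlapping n-mers and dividing by their total number. It is an illustrative version under the assumption of a 20-letter alphabet, not the GSTPred implementation; the function name `composition` is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str, n: int) -> dict:
    """Return the frequency of every possible n-mer (20**n keys) in seq.

    n = 1 gives amino acid composition, n = 2 dipeptide composition and
    n = 3 tripeptide composition. Overlapping n-mers are counted.
    """
    total = len(seq) - n + 1
    if total <= 0:
        raise ValueError("sequence shorter than n")
    freqs = {"".join(p): 0.0 for p in product(AMINO_ACIDS, repeat=n)}
    for i in range(total):
        kmer = seq[i:i + n]
        if kmer in freqs:  # n-mers with non-standard residues are ignored
            freqs[kmer] += 1 / total
    return freqs


# Toy sequence: the dipeptides are AC, CA and AC.
dipep = composition("ACAC", 2)
print(len(dipep), round(dipep["AC"], 3), round(dipep["CA"], 3))
```

The feature dimension grows as 20, 400 and 8000 for n = 1, 2 and 3, so the tripeptide model works with a much larger and sparser input than the other two.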
# | Tool Name | Description | URL | Reference
1 | GPS-SNO | The Nobel Prize in Chemistry 2004 was awarded to Aaron Ciechanover, Avram Hershko and Irwin Rose for their discovery of ubiquitin-mediated protein degradation (Vogel, G. et al., 2004). Numerous subsequent studies showed that selective degradation by ubiquitination provides a critical mechanism in eukaryotes to regulate cellular processes such as cell cycle and division, immune response and inflammation, and signal transduction. Recently, prokaryotic ubiquitin-like protein (PUP) was identified as the tagging system in prokaryotes (Pearce, M. J. et al., 2008), which is coupled to its targets through deamidation by dop (PUP deamidase/depupylase) followed by conjugation catalyzed by PafA (PUP--protein ligase) (Striebel, F. et al., 2009). Although the details of the pup-proteasome system need further characterization, the discovery of this degradation mechanism opens the door to investigating dynamic protein regulation in Mycobacterium, which could be targeted by pathogen-specific drugs (Salgame, P. et al., 2008). In this regard, experimental identification of pupylated substrates with their sites could provide fundamental insights into the cellular processes in Mycobacterium. | http://sno.biocuckoo.org/ | 20585580
2 | iSNO-AAPair | The web-server iSNO-AAPair is established for predicting the cysteine S-nitrosylation sites in proteins. Caveats: 1. To obtain the predicted result with the anticipated success rate, the entire sequence of the query protein rather than its fragment should be used as an input. A sequence with fewer than 50 amino acid residues is generally deemed a fragment. 2. The accepted characters are: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, and the dummy code X. If the query sequence contains any illegal characters, the prediction will be stopped. | http://app.aporc.org/iSNO-AAPair/index.html | 24109555
3 | iSNO-PseAAC | The web-server iSNO-PseAAC is established for predicting the cysteine S-nitrosylation sites in proteins. Caveats: 1. To obtain the predicted result with the anticipated success rate, the entire sequence of the query protein rather than its fragment should be used as an input. A sequence with fewer than 50 amino acid residues is generally deemed a fragment. 2. The accepted characters are: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, and the dummy code X. If the query sequence contains any illegal characters, the prediction will be stopped. | http://app.aporc.org/iSNO-PseAAC/ | 23409062
4 | PSNO | S-nitrosylation (SNO) is one of the most universal reversible post-translational modifications involved in many biological processes. Malfunction or dysregulation of SNO leads to a series of severe diseases, such as developmental abnormalities and various diseases. Therefore, the identification of SNO sites (SNOs) provides insights into disease progression and drug development. In this paper, a new bioinformatics tool, named PSNO, is proposed to identify SNOs from protein sequences. Firstly, we explore various promising sequence-derived discriminative features, including the evolutionary profile, the predicted secondary structure and the physicochemical properties. Secondly, rather than simply combining the features, which may bring about information redundancy and unwanted noise, we use the relative entropy selection and incremental feature selection approach to select the optimal feature subsets. Thirdly, we train our model using the k-nearest neighbor algorithm. Using both informative features and an elaborate feature selection scheme, our method, PSNO, achieves good prediction performance with a mean Matthews correlation coefficient (MCC) value of about 0.5119 on the training dataset using 10-fold cross-validation. These results indicate that PSNO can be used as a competitive predictor among the state-of-the-art SNOs prediction tools. A web-server, named PSNO, which implements the proposed method, is freely available at http://59.73.198.144:8088/PSNO/. | http://59.73.198.144:8088/PSNO/ | 24968264
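The caveats listed for the iSNO-AAPair and iSNO-PseAAC servers (full-length input, a 21-character alphabet, fewer than 50 residues treated as a fragment) can be checked locally before submitting a query. The Python sketch below is one way to do that; it is a hypothetical pre-check, not code from either server. Stripping whitespace and upper-casing the input are assumptions made here for convenience, and the listing of cysteine positions simply reflects that S-nitrosylation occurs on cysteine.

```python
ACCEPTED = set("ACDEFGHIKLMNPQRSTVWYX")  # 20 residues plus the dummy code X
MIN_FULL_LENGTH = 50  # shorter sequences are generally deemed fragments


def check_query(sequence: str) -> dict:
    """Pre-check a protein sequence against the caveats quoted above."""
    seq = "".join(sequence.split()).upper()  # assumption: ignore whitespace and case
    return {
        "length": len(seq),
        "illegal_characters": sorted(set(seq) - ACCEPTED),
        "is_fragment": len(seq) < MIN_FULL_LENGTH,
        # 1-based positions of cysteines, the candidate S-nitrosylation sites
        "cysteine_positions": [i + 1 for i, c in enumerate(seq) if c == "C"],
    }


# Toy inputs: a short sequence with an illegal 'B', and a 50-residue repeat.
print(check_query("ACDB"))
print(check_query("AC" * 25)["is_fragment"])
```

A query that reports any illegal characters would be rejected by the servers according to the caveats, so cleaning the sequence first avoids a wasted submission.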
# | Tool Name | Description | URL | Reference
1 | MDD-Palm | S-palmitoylation, the covalent attachment of 16-carbon palmitic acids to a cysteine residue via a thioester linkage, is an important reversible lipid modification that plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified S-palmitoylated peptides increases, it is imperative to investigate substrate motifs to facilitate the study of protein S-palmitoylation. Based on 710 non-homologous S-palmitoylation sites obtained from published databases and the literature, we carried out a bioinformatics investigation of S-palmitoylation sites based on amino acid composition. Two Sample Logo indicates that positively charged and polar amino acids surrounding S-palmitoylated sites may be associated with the substrate site specificity of protein S-palmitoylation. Additionally, maximal dependence decomposition (MDD) was applied to explore the motif signatures of S-palmitoylation sites by categorizing a large-scale dataset into subgroups with statistically significant conservation of amino acids. Single features such as amino acid composition (AAC), amino acid pair composition (AAPC), position specific scoring matrix (PSSM), position weight matrix (PWM), amino acid substitution matrix (BLOSUM62), and accessible surface area (ASA) were considered, along with the effectiveness of incorporating MDD-identified substrate motifs into a two-layered prediction model. Evaluation by five-fold cross-validation showed that a hybrid of AAC and PSSM performs best at discriminating between S-palmitoylation and non-S-palmitoylation sites, according to the support vector machine (SVM). The two-layered SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.79, specificity of 0.80, accuracy of 0.80, and Matthews Correlation Coefficient (MCC) value of 0.45. Using an independent testing dataset (613 S-palmitoylated and 5412 non-S-palmitoylated sites) obtained from the literature, we demonstrated that the two-layered SVM model could outperform other prediction tools, yielding a balanced sensitivity and specificity of 0.690 and 0.694, respectively. This two-layered SVM model has been implemented as a web-based system (MDD-Palm), which is now freely available at http://csb.cse.yzu.edu.tw/MDDPalm/. | http://csb.cse.yzu.edu.tw/MDDPalm/ | 28662047
2 | SPalmitoylC-PseAAC | S-Palmitoylation is a uniquely reversible and biologically important post-translational modification, as it plays an essential role in a variety of cellular processes including signal transduction, protein-membrane interactions, neuronal development, lipid raft targeting, subcellular localization and apoptosis. Due to its association with neuronal development, it plays a pivotal role in a variety of neurodegenerative diseases, mainly Alzheimer's, schizophrenia and Huntington's disease. It is also essential for the developmental life cycles and pathogenesis of Toxoplasma gondii and Plasmodium falciparum, known to cause toxoplasmosis and malaria, respectively. This depicts the strong biological significance of S-Palmitoylation; thus, the timely and accurate identification of S-palmitoylation sites is crucial. Herein, we propose a predictor for S-Palmitoylation sites in proteins, namely SPalmitoylC-PseAAC, by integrating Chou's Pseudo Amino Acid Composition (PseAAC) and relative/absolute position-based features. Self-consistency testing and 10-fold cross-validation are performed to evaluate the performance of SPalmitoylC-PseAAC, using accuracy metrics. For self-consistency testing, 99.79% Acc, 99.77% Sp, 99.80% Sn and 1.00 MCC were observed, whereas for 10-fold cross-validation 97.22% Acc, 98.85% Sp, 95.80% Sn and 0.94 MCC were observed. Thus the proposed predictor can help in predicting palmitoylation sites in an efficient and accurate way. SPalmitoylC-PseAAC is available at biopred.org/palm. | http://biopred.org/palm | 30593778
# | Tool Name | Description | URL | Reference
1 | SVM-SulfoSite | Protein S-sulfenylation, which results from oxidation of free thiols on cysteine residues, has recently emerged as an important post-translational modification that regulates the structure and function of proteins involved in a variety of physiological and pathological processes. By altering the size and physiochemical properties of modified cysteine residues, sulfenylation can impact the cellular function of proteins in several different ways. Thus, the ability to rapidly and accurately identify putative sulfenylation sites in proteins will provide important insights into redox-dependent regulation of protein function in a variety of cellular contexts. Though bottom-up proteomic approaches, such as tandem mass spectrometry (MS/MS), provide a wealth of information about global changes in the sulfenylation state of proteins, MS/MS-based experiments are often labor-intensive, costly and technically challenging. Therefore, to complement existing proteomic approaches, researchers have developed a series of computational tools to identify putative sulfenylation sites on proteins. However, existing methods often suffer from low accuracy, specificity, and/or sensitivity. In this study, we developed SVM-SulfoSite, a novel sulfenylation prediction tool that uses support vector machines (SVM) to identify key determinants of sulfenylation among five feature classes: binary code, physiochemical properties, k-space amino acid pairs, amino acid composition and high-quality physiochemical indices. Using 10-fold cross-validation, SVM-SulfoSite achieved 95% sensitivity and 83% specificity, with an overall accuracy of 89% and Matthews correlation coefficient (MCC) of 0.79. Likewise, using an independent test set of experimentally identified sulfenylation sites, our method achieved scores of 74%, 62%, 80% and 0.42 for accuracy, sensitivity, specificity and MCC, with an area under the receiver operator characteristic (ROC) curve of 0.81. Moreover, in side-by-side comparisons, SVM-SulfoSite performed as well as or better than existing sulfenylation prediction tools. Together, these results suggest that our method represents a robust and complementary technique for advanced exploration of protein S-sulfenylation. | | 30050050
2 | iSulf-Cys | Cysteine S-sulfenylation is an important post-translational modification (PTM) in proteins, and provides redox regulation of protein functions. Bioinformatics and structural analyses indicated that S-sulfenylation could impact many biological and functional categories and had distinct structural features. However, the major limitations of experimental identification of cysteine S-sulfenylation are that it is expensive and low-throughput. In view of this situation, the establishment of a useful computational method and the development of an efficient predictor are highly desired. In this study, a predictor, iSulf-Cys, which incorporates 14 kinds of physicochemical properties of amino acids was proposed. With 10-fold cross-validation repeated 20 times on the training dataset, the area under the curve (AUC) was 0.7155 ± 0.0085 and the MCC was 0.3122 ± 0.0144. iSulf-Cys also showed satisfying performance on the independent testing dataset, with an AUC of 0.7343 and an MCC of 0.3315. Features which were constructed from physicochemical properties and position were carefully analyzed. Meanwhile, a user-friendly web-server for iSulf-Cys is accessible at http://app.aporc.org/iSulf-Cys/. | http://app.aporc.org/iSulf-Cys/ | 27104833
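Several predictors in these tables (SVM-SulfoSite among them) describe their input features as binary code and k-spaced amino acid pairs. The following is a minimal sketch of what such encodings commonly look like, not the implementation used by any listed tool. The function names, the 20-letter alphabet, and the choice to ignore non-standard residues such as X are assumptions made for illustration.

```python
from itertools import product

# Assumption: the 20 standard amino acids; other symbols (e.g. X) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def binary_encode(window):
    """One-hot (binary) code: 20 values per residue, all zeros if unknown."""
    vec = []
    for res in window:
        vec.extend(1 if res == aa else 0 for aa in AMINO_ACIDS)
    return vec


def cksaap(window, k):
    """Composition of k-spaced amino acid pairs for one gap size k.

    Counts every pair (window[i], window[i + k + 1]) and divides by the
    number of pair positions, giving a 400-dimensional vector.
    """
    counts = dict.fromkeys(PAIRS, 0)
    n_positions = len(window) - k - 1
    for i in range(max(n_positions, 0)):
        pair = window[i] + window[i + k + 1]
        if pair in counts:
            counts[pair] += 1
    total = max(n_positions, 1)  # avoid division by zero on short windows
    return [counts[p] / total for p in PAIRS]


if __name__ == "__main__":
    window = "ACKCA"  # hypothetical short window centred on a lysine
    print(len(binary_encode(window)))   # 5 residues x 20 values
    print(sum(cksaap(window, k=0)))     # fractions over adjacent pairs
```

In practice such vectors are usually computed for several gap sizes and concatenated before being passed to a classifier; the window length and the range of k are tool-specific and are not stated in the descriptions above.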
# | Tool Name | Description | URL | Reference
1 | DeepCSO | Cysteine S-sulphenylation (CSO), as a novel post-translational modification (PTM), has emerged as a potential mechanism to regulate protein functions and affect signal networks. Because of its functional significance, several prediction approaches have been developed. Nevertheless, they are based on a limited dataset from Homo sapiens and there is a lack of prediction tools for the CSO sites of other species. Recently, this modification has been investigated at the proteomics scale for a few species and the number of identified CSO sites has significantly increased. Thus, it is essential to explore the characteristics of this modification across different species and construct prediction models with better performances based on the enlarged dataset. In this study, we constructed several classifiers and found that the long short-term memory model with the word-embedding encoding approach, dubbed LSTM_WE, compares favorably with the traditional machine-learning models and other deep-learning models across different species, in terms of cross-validation and independent test. The area under the receiver operating characteristic (ROC) curve for LSTM_WE ranged from 0.82 to 0.85 for different organisms, which was superior to the reported CSO predictors. Moreover, we developed the general model based on the integrated data from different species and it showed great universality and effectiveness. We provided the on-line prediction service called DeepCSO that included both species-specific and general models, which is accessible through http://www.bioinfogo.org/DeepCSO. | http://www.bioinfogo.org/DeepCSO | 33335901
2 | SIMLIN | Background: S-sulphenylation is a ubiquitous protein post-translational modification (PTM) where an S-hydroxyl (-SOH) bond is formed via the reversible oxidation on the sulfhydryl group of cysteine (C). Recent experimental studies have revealed that S-sulphenylation plays critical roles in many biological functions, such as protein regulation and cell signaling. State-of-the-art bioinformatic advances have facilitated high-throughput in silico screening of protein S-sulphenylation sites, thereby significantly reducing the time and labour costs traditionally required for the experimental investigation of S-sulphenylation. Results: In this study, we have proposed a novel hybrid computational framework, termed SIMLIN, for accurate prediction of protein S-sulphenylation sites using a multi-stage neural-network based ensemble-learning model integrating both protein sequence derived and protein structural features. Benchmarking experiments against the current state-of-the-art predictors for S-sulphenylation demonstrated that SIMLIN delivered competitive prediction performance. The empirical studies on the independent testing dataset demonstrated that SIMLIN achieved 88.0% prediction accuracy and an AUC score of 0.82, which outperforms currently existing methods. Conclusions: In summary, SIMLIN predicts human S-sulphenylation sites with high accuracy thereby facilitating biological hypothesis generation and experimental validation. The web server, datasets, and online instructions are freely available at http://simlin.erc.monash.edu/ for academic purposes. | http://simlin.erc.monash.edu/ | 31752668
# | Tool Name | Description | URL | Reference
1 | iSuc-PseAAC | The web-server iSNO-AAPair is established for predicting the cysteine S-nitrosylation sites in proteins. Caveats: 1. To obtain the predicted result with the anticipated success rate, the entire sequence of the query protein rather than its fragment should be used as an input. A sequence with less than 50 amino acid residues is generally deemed as a fragment. 2. The accepted characters are: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, and the dummy code X. If the query sequence contains any illegal characters, the prediction will be stopped. | http://app.aporc.org/iSuc-PseAAC/ | 26084794
2 | SSKM_Succ | Protein succinylation is a type of post-translational modification that occurs on lysine sites and plays a key role in protein conformation regulation and cellular function control. When training, it is difficult to designate negative samples because of the uncertainty of non-succinylation lysine sites, and if not handled properly, it may affect the performance of computational models dramatically. Therefore, we propose a new semi-supervised learning method to identify reliable non-succinylation lysine sites as negative samples. This method, named SSKM_Succ, also employs K-means clustering to divide data into 5 clusters. In addition, information on proximal PTMs and three kinds of sequence features are used to represent proteins. Then, we perform a two-step feature selection to remove redundant features and construct the optimization model for each cluster. Finally, a support vector machine is applied to construct a prediction model for each cluster. Meanwhile, we compare the result with other existing tools, and it shows that our method is promising for predicting succinylation sites. Through analysis, we further verify that succinylated protein has potential effects on amino acid degradation and fatty acid metabolism, and speculate that protein succinylation may be closely related to neurodegenerative diseases. The code of SSKM_Succ is available on the web at https://github.com/yangyq505/SSKM_Succ.git. | https://github.com/yangyq505/SSKM_Succ.git | 32750881
3 | HybridSucc | As an important protein acylation modification, lysine succinylation (Ksucc) is involved in diverse biological processes, and participates in human tumorigenesis. Here, we collected 26,243 non-redundant known Ksucc sites from 13 species as the benchmark data set, combined 10 types of informative features, and implemented a hybrid-learning architecture by integrating deep-learning and conventional machine-learning algorithms into a single framework. We constructed a new tool named HybridSucc, which achieved area under curve (AUC) values of 0.885 and 0.952 for general and human-specific prediction of Ksucc sites, respectively. In comparison, the accuracy of HybridSucc was 17.84%-50.62% better than that of other existing tools. Using HybridSucc, we conducted a proteome-wide prediction and prioritized 370 cancer mutations that change Ksucc states of 218 important proteins, including PKM2, SHMT2, and IDH2. We not only developed a high-profile tool for predicting Ksucc sites, but also generated useful candidates for further experimental consideration. The online service of HybridSucc can be freely accessed for academic research at http://hybridsucc.biocuckoo.org/. | http://hybridsucc.biocuckoo.org/ | 32861878
4 | CNN-SuccSite | Succinylation is a type of protein post-translational modification (PTM), which can play important roles in a variety of cellular processes. Due to an increasing number of site-specific succinylated peptides obtained from high-throughput mass spectrometry (MS), various tools have been developed for computationally identifying succinylated sites on proteins. However, most of these tools predict succinylation sites based on traditional machine learning methods. Hence, this work aimed to carry out the succinylation site prediction based on a deep learning model. The abundance of MS-verified succinylated peptides enabled the investigation of substrate site specificity of succinylation sites through sequence-based attributes, such as position-specific amino acid composition, the composition of k-spaced amino acid pairs (CKSAAP), and position-specific scoring matrix (PSSM). Additionally, the maximal dependence decomposition (MDD) was adopted to detect the substrate signatures of lysine succinylation sites by dividing all succinylated sequences into several groups with conserved substrate motifs. According to the results of ten-fold cross-validation, the deep learning model trained using PSSM and informative CKSAAP attributes can reach the best predictive performance and also perform better than traditional machine-learning methods. Moreover, an independent testing dataset that did not overlap with the training dataset was used to compare the proposed method with six existing prediction tools. The testing dataset comprised 218 positive and 2621 negative instances, and the proposed model could yield a promising performance with 84.40% sensitivity, 86.99% specificity, 86.79% accuracy, and an MCC value of 0.489. Finally, the proposed method has been implemented as a web-based prediction tool (CNN-SuccSite), which is now freely accessible at http://csb.cse.yzu.edu.tw/CNN-SuccSite/. | http://csb.cse.yzu.edu.tw/CNN-SuccSite/ | 31700141
5 | SuccSite | Protein succinylation is a biochemical reaction in which a succinyl group (-CO-CH2-CH2-CO-) is attached to the lysine residue of a protein molecule. Lysine succinylation plays important regulatory roles in living cells. However, studies in this field are limited by the difficulty in experimentally identifying the substrate site specificity of lysine succinylation. To facilitate this process, several tools have been proposed for the computational identification of succinylated lysine sites. In this study, we developed an approach to investigate the substrate specificity of lysine succinylated sites based on amino acid composition. Using experimentally verified lysine succinylated sites collected from public resources, the significant differences in position-specific amino acid composition between succinylated and non-succinylated sites were represented using the Two Sample Logo program. These findings enabled the adoption of an effective machine learning method, support vector machine, to train a predictive model with not only the amino acid composition, but also the composition of k-spaced amino acid pairs. After the selection of the best model using a ten-fold cross-validation approach, the selected model significantly outperformed existing tools based on an independent dataset manually extracted from published research articles. Finally, the selected model was used to develop a web-based tool, SuccSite, to aid the study of protein succinylation. Two proteins were used as case studies on the website to demonstrate the effective prediction of succinylation sites. We will regularly update SuccSite by integrating more experimental datasets. SuccSite is freely accessible at http://csb.cse.yzu.edu.tw/SuccSite/. | http://csb.cse.yzu.edu.tw/SuccSite/ | 32592791
6 | Inspector | Lysine succinylation is an important type of protein post-translational modification and plays a key role in regulating protein function and structural changes. The mechanism and function of succinylation have not been clarified. The key to better understanding the precise mechanism and functional role of succinylation is the identification of lysine succinylation sites. However, conventional experimental methods for succinylation identification are often expensive, time-consuming, and labor-intensive. Therefore, the development of new computational approaches to effectively identify lysine succinylation sites from sequence data is much needed. In this study, we proposed a novel predictor for lysine succinylation identification, Inspector, which was developed by using the random forest algorithm combined with a variety of sequence-based feature-encoding schemes. The edited nearest-neighbor undersampling method and the adaptive synthetic oversampling approach were employed to solve dataset imbalance, and a two-step feature-selection strategy was applied to optimize the feature set used to train the prediction model. Empirical studies on performance comparison with existing tools showed that Inspector was able to achieve competitive predictive performance for distinguishing lysine succinylation sites. | - | 31968210
7 | Success | Background: Post-translational modification is considered an important biological mechanism with critical impact on the diversification of the proteome. Although a long list of such modifications has been studied, succinylation of lysine residues has recently attracted the interest of the scientific community. The experimental detection of succinylation sites is an expensive process, which consumes a lot of time and resources. Therefore, computational predictors of this covalent modification have emerged as a last resort to tackling lysine succinylation. Results: In this paper, we propose a novel computational predictor called 'Success', which efficiently uses the structural and evolutionary information of amino acids for predicting succinylation sites. To do this, each lysine was described as a vector that combined the above information of surrounding amino acids. We then designed a support vector machine with a radial basis function kernel for discriminating between succinylated and non-succinylated residues. We finally compared the Success predictor with three state-of-the-art predictors in the literature. As a result, our proposed predictor showed a significant improvement over the compared predictors in statistical metrics, such as sensitivity (0.866), accuracy (0.838) and Matthews correlation coefficient (0.677) on a benchmark dataset. Conclusions: The proposed predictor effectively uses the structural and evolutionary information of the amino acids surrounding a lysine. The bigram feature extraction approach, while retaining the same number of features, facilitates a better description of lysines. A support vector machine with a radial basis function kernel was used to discriminate between modified and unmodified lysines. The aforementioned aspects make the Success predictor outperform three state-of-the-art predictors in succinylation detection. | - | 29363424
8 | GPSuc | Lysine succinylation is one of the dominant post-translational modifications of proteins and contributes to many biological processes including cell cycle, growth and signal transduction pathways. Identification of succinylation sites is an important step for understanding the function of proteins. The complicated sequence patterns of protein succinylation revealed by proteomic studies highlight the necessity of developing effective species-specific in silico strategies for global prediction of succinylation sites. Here we have developed the generic and nine species-specific succinylation site classifiers through aggregating multiple complementary features. We optimized the consecutive features using the Wilcoxon-rank feature selection scheme. The final feature vectors were trained by a random forest (RF) classifier. With an integration of RF scores via logistic regression, the resulting predictor termed GPSuc achieved better performance than other existing generic and species-specific succinylation site predictors. To reveal the mechanism of succinylation and assist hypothesis-driven experimental design, our predictor serves as a valuable resource. To provide a promising performance in large-scale datasets, a web application was developed at http://kurata14.bio.kyutech.ac.jp/GPSuc/. | http://kurata14.bio.kyutech.ac.jp/GPSuc/ | 30312302
9 | PSuccE | Background: Lysine succinylation is a new kind of post-translational modification which plays a key role in protein conformation regulation and cellular function control. To understand the mechanism of succinylation profoundly, it is necessary to identify succinylation sites in proteins accurately. However, traditional experimental approaches are labor-intensive and time-consuming. Computational prediction methods have been proposed in recent years, and they are popular because of their convenience and high speed. In this study, we developed a new method to predict succinylation sites in proteins by combining multiple features, including amino acid composition, binary encoding, physicochemical property and grey pseudo amino acid composition, with a feature selection scheme (information gain). The model was then trained using SVM (Support Vector Machine) and an ensemble learning algorithm. Results: The performance of this method was measured with an accuracy of 89.14% and an MCC (Matthews correlation coefficient) of 0.79 using 10-fold cross validation on the training dataset and an accuracy of 84.5% and an MCC of 0.2 on the independent dataset. Conclusions: The conclusions made from this study can help to understand more of the succinylation mechanism. These results suggest that our method was very promising for predicting succinylation sites. The source code and data of this paper are freely available at https://github.com/ningq669/PSuccE. | https://github.com/ningq669/PSuccE | 29940836
10 | IFS-LightGBM | Succinylation is an important posttranslational modification of proteins, which plays a key role in protein conformation regulation and cellular function control. Many studies have shown that succinylation modification on protein lysine residue is closely related to the occurrence of many diseases. To understand the mechanism of succinylation profoundly, it is necessary to identify succinylation sites in proteins accurately. In this study, we develop a new model, IFS-LightGBM (BO), which utilizes the incremental feature selection (IFS) method, the LightGBM feature selection method, the Bayesian optimization algorithm, and the LightGBM classifier, to predict succinylation sites in proteins. Specifically, pseudo amino acid composition (PseAAC), position-specific scoring matrix (PSSM), disorder status, and Composition of k-spaced Amino Acid Pairs (CKSAAP) are first employed to extract feature information. Then, the combination of the LightGBM feature selection method and the incremental feature selection (IFS) method is used to select the optimal feature subset for the LightGBM classifier. Finally, to increase prediction accuracy and reduce the computation load, the Bayesian optimization algorithm is used to optimize the parameters of the LightGBM classifier. The results reveal that the IFS-LightGBM (BO)-based prediction model performs better when it is evaluated by some common metrics, such as accuracy, recall, precision, Matthews Correlation Coefficient (MCC), and F-measure. | - | 33224267
11 | DeepSuccinylSite | Background: Protein succinylation has recently emerged as an important and common post-translational modification (PTM) that occurs on lysine residues. Succinylation is notable both in its size (e.g., at 100 Da, it is one of the larger chemical PTMs) and in its ability to modify the net charge of the modified lysine residue from +1 to -1 at physiological pH. The gross local changes that occur in proteins upon succinylation have been shown to correspond with changes in gene activity and to be perturbed by defects in the citric acid cycle. These observations, together with the fact that succinate is generated as a metabolic intermediate during cellular respiration, have led to suggestions that protein succinylation may play a role in the interaction between cellular metabolism and important cellular functions. For instance, succinylation likely represents an important aspect of genomic regulation and repair and may have important consequences in the etiology of a number of disease states. In this study, we developed DeepSuccinylSite, a novel prediction tool that uses deep learning methodology along with embedding to identify succinylation sites in proteins based on their primary structure. Results: Using an independent test set of experimentally identified succinylation sites, our method achieved efficiency scores of 79%, 68.7% and 0.48 for sensitivity, specificity and MCC respectively, with an area under the receiver operating characteristic (ROC) curve of 0.8. In side-by-side comparisons with previously described succinylation predictors, DeepSuccinylSite represents a significant improvement in overall accuracy for prediction of succinylation sites. Conclusion: Together, these results suggest that our method represents a robust and complementary technique for advanced exploration of protein succinylation. | - | 32321437
12 | SuccinSite | Lysine succinylation is an emerging protein post-translational modification, which plays an important role in regulating the cellular processes in both eukaryotic and prokaryotic cells. However, the succinylation modification site is particularly difficult to detect because the experimental technologies used are often time-consuming and costly. Thus, an accurate computational method for predicting succinylation sites may help researchers towards designing their experiments and to understand the molecular mechanism of succinylation. In this study, a novel computational tool termed SuccinSite has been developed to predict protein succinylation sites by incorporating three sequence encodings, i.e., k-spaced amino acid pairs, binary and amino acid index properties. Then, the random forest classifier was trained with these encodings to build the predictor. The SuccinSite predictor achieves an AUC score of 0.802 in the 5-fold cross-validation set and performs significantly better than existing predictors on a comprehensive independent test set. Furthermore, informative features and predominant rules (i.e. feature combinations) were extracted from the trained random forest model for an improved interpretation of the predictor. Finally, we also compiled a database covering 4411 experimentally verified succinylation proteins with 12 456 lysine succinylation sites. Taken together, these results suggest that SuccinSite would be a helpful computational resource for succinylation sites prediction. The web-server, datasets, source code and database are freely available at http://systbio.cau.edu.cn/SuccinSite/. | http://systbio.cau.edu.cn/SuccinSite/ | 26739209
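The entries above report sensitivity, specificity, accuracy and the Matthews correlation coefficient (MCC), often on strongly imbalanced test sets. As an illustration of how these quantities relate, the sketch below recomputes them for the CNN-SuccSite independent test set (218 positive and 2621 negative instances). The confusion-matrix counts are not given in the source; they are an assumption, inferred here as the integers that reproduce the reported 84.40% sensitivity and 86.99% specificity.

```python
import math


def metrics_from_counts(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy, MCC) from raw counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


if __name__ == "__main__":
    # Assumed counts: 184 of 218 positives and 2280 of 2621 negatives correct.
    sens, spec, acc, mcc = metrics_from_counts(tp=184, fn=34, tn=2280, fp=341)
    print(f"sensitivity={sens:.4f} specificity={spec:.4f} "
          f"accuracy={acc:.4f} MCC={mcc:.3f}")
```

With these assumed counts the accuracy comes out at about 86.79% and the MCC at about 0.49, in line with the reported values. The example also shows why MCC can be modest when accuracy is high: with roughly twelve negatives per positive, the 341 false positives outnumber the 184 true positives.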
# | Tool Name | Description | URL | Reference
1 | GPS-SUMO | The Nobel Prize in Chemistry 2004 was awarded to Aaron Ciechanover, Avram Hershko and Irwin Rose for their discovery of ubiquitin-mediated protein degradation (Vogel, G. et al., 2004). Numerous subsequent studies showed that the selective degradation by ubiquitination provided a critical mechanism in eukaryotes to regulate the cellular processes such as cell cycle and division, immune response and inflammation and signal transduction. Recently, prokaryotic ubiquitin-like protein (PUP) was identified as the tagging system in prokaryotes (Pearce, M. J. et al., 2008), which was coupled to its targets through deamidation by dop (PUP deamidase/depupylase) and following conjugation catalyzed by PafA (PUP--protein ligase) (Striebel, F. et al., 2009). Although the detail of the pup-proteasome system needs further characterization, the discovery of the degradation mechanism opens the door to investigating the dynamic protein regulation in Mycobacterium, which could be targeted by pathogen-specific drugs (Salgame, P. et al., 2008). In this regard, experimental identification of pupylated substrates with their sites could provide fundamental insights into the cellular processes in Mycobacterium. | http://sumosp.biocuckoo.org/ | 24880689
2 | JASSA | SUMOylation is a post-translational modification conserved from yeast to human that modulates several fundamental cellular processes, and has been shown to be involved in human disorders. It consists of the covalent attachment of a small ubiquitin-related modifier protein (SUMO) to a target protein by a mechanism similar to that of ubiquitination (Melchior, 2000; Geiss-Friedlander and Melchior, 2007). There are at least three SUMO isoforms (SUMO1,2,3) in mammalian cells but only one in Saccharomyces cerevisiae (Smt3). This post-translational modification involves a cascade of SUMO-specific enzymes (for a review, see (Johnson, 2004; Kerscher et al., 2006; Martin et al., 2007)). First, SUMO is proteolytically cleaved to expose the internal diglycine motif required for conjugation. Then SUMO is activated in an ATP-dependent manner by the heterodimeric SUMO activating enzyme SAE1/SAE2. SUMO is transferred to the conjugating enzyme Ubc9 and is conjugated to the target substrate protein. This process can be enhanced by the involvement of a growing number of E3 ligases. SUMO is reversibly and covalently conjugated onto an acceptor lysine residue (K) of the substrate which often lies within the consensus sequence ψKxE (where ψ is a hydrophobic residue and x any amino acid; Melchior, 2000; Rodriguez et al., 2001). It should be noted that an inverted SUMOylation motif ([E/D]xKψ) was reported a few years ago (Matic et al., 2010) (Table 1). Extended SUMO consensus motifs, like consensus inverted, PDSM (phosphorylation-dependent SUMOylation motif) (Gregoire et al., 2006; Hietakangas et al., 2006; Shalizi et al., 2006), NDSM (negatively charged amino acid-dependent SUMOylation site) (Yang et al., 2006), SUMO-acetyl switch (Stankovic-Valentin et al., 2007) and HCSM (hydrophobic cluster SUMOylation motif) (Matic et al., 2010; Hietakangas et al., 2006) also have been reported (for a review, see (Martin et al., 2007)). Moreover, elements flanking the core motif (such as acidic, phosphorylatable or proline residues) may impact on SUMO conjugation (Table 1) (Gareau and Lima, 2011). Notably, about 25% of experimentally validated SUMOylated sites do not match any of these motifs. Additionally, not all sites that adhere to the consensus are modified, likely because SUMO is conjugated only to residues appropriately presented to the SUMOylation machinery. SUMO can also interact non-covalently with proteins harboring SUMO-interacting motifs (SIM), also known as SUMO-binding domains (SBDs) or motifs (SBMs), typically consisting of a hydrophobic core (Kerscher, 2007) (Table 1). Negatively charged patches of residues flanking the SIM may contribute to the orientation and/or the isoform-specific binding (Hannich et al., 2005; Hecker et al., 2006), and an implication of the phosphorylation near the SIM, named phosphoSIM, has been reported (Stehmeier and Muller, 2009). | http://www.jassa.fr/ | 26142185
3 | SUMOhydro | SUMOhydro has been developed to predict sumoylation lysine (K) sites in proteins by introducing hydrophobicity into binary encoding. With the assistance of a Support Vector Machine (SVM) (http://svmlight.joachims.org/), the predictor was trained and tested on a new and stringent sumoylation sites dataset. The proposed SUMOhydro has been shown to be more powerful than the traditional methods which constructed the prediction model based on all the sumoylation sites. When compared with two existing predictors, it can serve as a competitive method for predicting sumoylation sites. | http://protein.cau.edu.cn/others/SUMOhydro/introduction.html | 22720073
4 | SUMOplot | The SUMOplot Analysis Program predicts and scores sumoylation sites in your protein. The presence of this post-translational modification may help explain larger MWs than expected on SDS gels due to attachment of SUMO protein (11kDa) at multiple positions in your protein. SUMO-1 (small ubiquitin-related modifier; also known as PIC1, UBL1, Sentrin, GMP1, and Smt3) is a member of the ubiquitin and ubiquitin-like superfamily. Most SUMO-modified proteins contain the tetrapeptide motif B-K-x-D/E where B is a hydrophobic residue, K is the lysine conjugated to SUMO, x is any amino acid (aa), and D or E is an acidic residue. The SUMOplot Analysis Program predicts the probability for the SUMO consensus sequence (SUMO-CS) to be engaged in SUMO attachment. The SUMOplot score system is based on two criteria: (1) direct amino acid match to SUMO-CS, and (2) substitution of the consensus amino acid residues with amino acid residues exhibiting similar hydrophobicity. | http://www.abgent.com/sumoplot | -
5 | SUMOgo | Most modern tools used to predict sites of small ubiquitin-like modifier (SUMO) binding (referred to as SUMOylation) use algorithms, chemical features of the protein, and consensus motifs. However, these tools rarely consider the influence of post-translational modification (PTM) information for other sites within the same protein on the accuracy of prediction results. This study applied the Random Forest machine learning method, as well as motif screening models and a feature selection combination mechanism, to develop a SUMOylation prediction system, referred to as SUMOgo. With regard to the prediction method, PTM sites were coded as new functional features in addition to structural features, such as sequence-based binary coding, encoded chemical features of proteins, and encoded secondary structure information that is important for PTM. Twenty cycles of prediction were conducted with a 1:1 combination of positive test data and random negative data. The Matthews correlation coefficient of SUMOgo reached 0.511, which is higher than that of current commonly used tools. This study further verified the important role of PTM in SUMOgo and includes a case study on CREB binding protein (CREBBP). The website for the final tool is http://predictor.nchu.edu.tw/SUMOgo. | http://predictor.nchu.edu.tw/SUMOgo | 30341374
6 | HseSUMO | Background: Post-translational modifications are viewed as an important mechanism for controlling protein function and are believed to be involved in multiple important diseases. However, their profiling using laboratory-based techniques remains challenging. Therefore, the development of accurate computational methods to predict post-translational modifications is particularly important for making progress in this area of research. Results: This work explores the use of four half-sphere exposure-based features for computational prediction of sumoylation sites. Unlike most of the previously proposed approaches, which focused on patterns of amino acid co-occurrence, we were able to demonstrate that protein structure-based features could be sufficiently informative to achieve good predictive performance. The evaluation of our method has demonstrated high sensitivity (0.9), accuracy (0.89) and Matthews correlation coefficient (0.78-0.79). We have compared these results to the recently released pSumo-CD method and were able to demonstrate better performance of our method on the same evaluation dataset. Conclusions: The proposed predictor HseSUMO uses half-sphere exposures of amino acids to predict sumoylation sites. It has shown promising results on a benchmark dataset when compared with the state-of-the-art method. The extracted data of this study can be accessed at https://github.com/YosvanyLopez/HseSUMO. | https://github.com/YosvanyLopez/HseSUMO | 30999862
7 | SumSec | Post-translational modification (PTM) is defined as the modification of amino acids along the protein sequences after the translation process. These modifications significantly impact the functioning of proteins. Therefore, having a comprehensive understanding of the underlying mechanism of PTMs turns out to be critical in studying the biological roles of proteins. Among a wide range of PTMs, sumoylation is one of the most important modifications due to its known cellular functions, which include transcriptional regulation, protein stability, and protein subcellular localization. Despite its importance, determining sumoylation sites via experimental methods is time-consuming and costly. This has led to a great demand for the development of fast computational methods able to accurately determine sumoylation sites in proteins. In this study, we present a new machine learning-based method for predicting sumoylation sites called SumSec. To do this, we employed the predicted secondary structure of amino acids to extract two types of structural features from neighboring amino acids along the protein sequence, which have never been used for this task. As a result, our proposed method is able to enhance the sumoylation site prediction task, outperforming previously proposed methods in the literature. SumSec demonstrated high sensitivity (0.91), accuracy (0.94) and MCC (0.88). The prediction accuracy achieved in this study is 21% better than those reported in previous studies. The script and extracted features are publicly available at: https://github.com/YosvanyLopez/SumSec. | https://github.com/YosvanyLopez/SumSec | 30544729
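The JASSA and SUMOplot entries describe the SUMO consensus motif (a hydrophobic residue, the acceptor lysine, any residue, then an acidic residue) and the inverted motif ([E/D]xK followed by a hydrophobic residue). A plain motif scan can be sketched as below. This is only a pattern match and does not reproduce the scoring of JASSA, SUMOplot or any other listed tool. The function names and the set of residues treated as hydrophobic are assumptions, since the descriptions do not enumerate them.

```python
import re

# Assumption: residues treated as "hydrophobic" for the motif scan.
HYDROPHOBIC = "AILMFVP"


def find_consensus_lysines(seq, hydrophobic=HYDROPHOBIC):
    """1-based positions of K in hydrophobic-K-x-[D/E] matches."""
    # A lookahead is used so that overlapping motifs are all reported.
    pattern = re.compile(rf"(?=[{hydrophobic}]K.[DE])")
    return [m.start() + 2 for m in pattern.finditer(seq.upper())]


def find_inverted_lysines(seq, hydrophobic=HYDROPHOBIC):
    """1-based positions of K in [D/E]-x-K-hydrophobic matches."""
    pattern = re.compile(rf"(?=[DE].K[{hydrophobic}])")
    return [m.start() + 3 for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    demo = "MAVKTEPLKDE"  # hypothetical sequence, not a real protein
    print(find_consensus_lysines(demo))
    print(find_inverted_lysines("GEAKVG"))
```

As the JASSA description notes, about 25% of experimentally validated SUMOylated sites match none of the known motifs, and not every consensus match is modified, so a scan like this gives candidate sites only.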
# | Tool Name | Description | URL | Reference
1 | pNitro-Tyr-PseAAC | Background: Tyrosine nitration is closely related to the causes of various diseases such as rheumatoid arthritis, septic shock, and coeliac disease, and is considered one of the most important post-translational modifications in proteins. Inside a cell, protein modifications occur accurately by the action of sophisticated cellular machinery. Specific enzymes present in the endoplasmic reticulum accomplish this task. The identification of potential tyrosine residues in a protein primary sequence that can be nitrated is a challenging task. Methods: To counter the prevailing laborious and time-consuming experimental approaches, a novel computational model is introduced in the present study. Feature vectors are formed from data collected from experimentally verified tyrosine nitration sites. An adaptive training algorithm is then used to train a back-propagation neural network for prediction purposes. To objectively measure the accuracy of the proposed model, rigorous verification and validation tests are carried out. Results: Through verification and validation, a promising accuracy of 88%, a sensitivity of 85%, a specificity of 89.18% and a Matthews correlation coefficient of 0.627 are achieved. Conclusion: It is concluded that the proposed computational model provides the foundation for further investigation and can be used for the identification of nitrotyrosine sites in proteins. |  | 30479209
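The nitration predictor above builds feature vectors around candidate tyrosine residues in a primary sequence. A common first step in site predictors of this kind is to cut a fixed-length window around each candidate residue. The Python sketch below illustrates that step only. The function name, the flank length and the padding character are assumptions made for illustration and are not taken from the pNitro-Tyr-PseAAC paper.

```python
def residue_windows(sequence: str, residue: str = "Y", flank: int = 3, pad: str = "X"):
    """Return (1-based position, window) for every occurrence of `residue`.

    Each window has length 2 * flank + 1 and is centred on the residue.
    Positions that run past either terminus are filled with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, amino_acid in enumerate(sequence):
        if amino_acid == residue:
            # sequence[i] sits at padded[i + flank], so this slice is centred on it.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    for position, window in residue_windows("MYAKYLDE", flank=2):
        print(position, window)
```

The same function can be reused for lysine-centred windows (for example in ubiquitination or sumoylation site prediction) by passing `residue="K"`.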
# | Tool Name | Description | URL | Reference
1 | The Sulfinator | The Sulfinator is a software tool able to predict tyrosine sulfation sites in protein sequences. It employs four different Hidden Markov Models that were built to recognise sulfated tyrosine residues located N-terminally, within sequence windows of more than 25 amino acids and C-terminally, as well as sulfated tyrosines clustered within 25 amino acid windows, respectively. All four HMMs contain the distilled information from one multiple sequence alignment. The data sets used to train and test the HMMs are available. | http://web.expasy.org/sulfinator/ | 12050077
2 | iSulfoTyr-PseAAC | Background: Amino acid residues in proteins undergo post-translational modification (PTM) during protein synthesis, a process of chemical and physical change in an amino acid that in turn alters the behavioral properties of proteins. Tyrosine sulfation is a ubiquitous post-translational modification which is known to be associated with the regulation of various biological functions and pathological processes. Thus its identification is necessary to understand its mechanism. Experimental determination through site-directed mutagenesis and high-throughput mass spectrometry is a costly and time-consuming process, so a reliable computational model is required for the identification of sulfotyrosine sites. Methodology: In this paper, we present a computational model for the prediction of sulfotyrosine sites named iSulfoTyr-PseAAC, in which feature vectors are constructed using statistical moments of protein amino acid sequences and various position/composition relative features. These features are incorporated into PseAAC. The model is validated by jackknife, cross-validation, self-consistency and independent testing. Results: Accuracy determined through validation was 93.93% for the jackknife test, 95.16% for cross-validation, 94.3% for self-consistency and 94.3% for independent testing. Conclusion: The proposed model has better performance as compared to the existing predictors; however, the accuracy can be improved further in future, due to the increasing number of sulfotyrosine sites in proteins. |  | 32030089
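The iSulfoTyr-PseAAC entry above mentions validation by jackknife and self-consistency tests. As a rough sketch of the difference, the Python code below runs both on a toy data set: the jackknife (leave-one-out) test retrains on all samples except the one being predicted, while the self-consistency test predicts samples that were also used for training. The nearest-centroid classifier and the toy feature vectors are stand-ins chosen for brevity; they are assumptions and do not reproduce the model or features of the cited paper.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)


def nearest_centroid_predict(train: List[Sample], x: Sequence[float]) -> int:
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    def distance(label: int) -> float:
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sorted(sums), key=distance)


def jackknife_accuracy(data: List[Sample]) -> float:
    """Leave-one-out: each sample is predicted by a model trained on all others."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nearest_centroid_predict(train, x) == y
    return hits / len(data)


def self_consistency_accuracy(data: List[Sample]) -> float:
    """Train and test on the same samples (an optimistic estimate)."""
    return sum(nearest_centroid_predict(data, x) == y for x, y in data) / len(data)


# Hypothetical two-dimensional feature vectors, for illustration only.
TOY_DATA: List[Sample] = [
    ([0.0, 0.1], 0), ([0.2, 0.0], 0), ([0.1, 0.2], 0),
    ([1.0, 0.9], 1), ([0.9, 1.1], 1), ([1.1, 1.0], 1),
]

if __name__ == "__main__":
    print("jackknife:", jackknife_accuracy(TOY_DATA))
    print("self-consistency:", self_consistency_accuracy(TOY_DATA))
```

On real data the self-consistency figure is usually the more optimistic of the two, since the model has already seen every sample it is asked to predict.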
# | Tool Name | Description | URL | Reference
1 | hCKSAAP_UbSite | CKSAAP_UbSite is a web server that predicts ubiquitination sites in proteins. Its distinguishing feature is the use of the composition of k-spaced amino acid pairs as the input feature vector for an SVM. It was trained and tested on a set of experimentally verified ubiquitination sites obtained from Radivojac et al. (Proteins, 2010, 78: 365-380). The class-balanced accuracy and MCC of CKSAAP_UbSite reached 73.40% and 0.4694, respectively. Since the dataset of Radivojac et al. was selected from the proteome of S. cerevisiae, CKSAAP_UbSite is expected to perform best on the S. cerevisiae proteome. | http://protein.cau.edu.cn/cksaap_ubsite/ | 23603789
2 | iUbiq-Lys | iUbiq-Lys is a web server that predicts ubiquitination sites in proteins. It uses an SVM with amino acid sequence features extracted from sequence evolution information via a grey system model (Grey-PSSM). | http://www.jci-bioinfo.cn/iUbiq-Lys | 25248923
3 | UbiPred | BACKGROUND: Ubiquitylation plays an important role in regulating protein functions. Recently, experimental methods were developed toward effective identification of ubiquitylation sites. To efficiently explore more undiscovered ubiquitylation sites, this study aims to develop an accurate sequence-based prediction method to identify promising ubiquitylation sites. RESULTS: We established a ubiquitylation dataset consisting of 157 ubiquitylation sites and 3676 putative non-ubiquitylation sites extracted from 105 proteins in the UbiProt database. This study first evaluates promising sequence-based features and classifiers for the prediction of ubiquitylation sites by assessing three kinds of features (amino acid identity, evolutionary information, and physicochemical property) and three classifiers (support vector machine, k-nearest neighbor, and NaiveBayes). Results show that the set of 531 physicochemical properties and the support vector machine (SVM) are the best kind of features and the best classifier, respectively, and that their combination has a prediction accuracy of 72.19% using leave-one-out cross-validation. Consequently, an informative physicochemical property mining algorithm (IPMA) is proposed to select an informative subset of the 531 physicochemical properties. A prediction system, UbiPred, was implemented by using an SVM with the feature set of 31 informative physicochemical properties selected by IPMA, which can improve the accuracy from 72.19% to 84.44%. To further analyze the informative physicochemical properties, the decision tree method C5.0 was used to acquire if-then rule-based knowledge for predicting ubiquitylation sites. UbiPred can screen promising ubiquitylation sites from putative non-ubiquitylation sites using prediction scores. By applying UbiPred, 23 promising ubiquitylation sites were identified from an independent dataset of 3424 putative non-ubiquitylation sites, which were also validated by using the obtained prediction rules. CONCLUSION: We have proposed an algorithm, IPMA, for mining informative physicochemical properties from protein sequences to build an SVM-based prediction system, UbiPred. UbiPred can predict ubiquitylation sites, each accompanied by a prediction score, to help biologists identify promising sites for experimental verification. UbiPred has been implemented as a web server and is available at http://iclab.life.nctu.edu.tw/ubipred. | http://iclab.life.nctu.edu.tw/ubipred | 18625080
4 | UbiProber | Systematic dissection of the ubiquitylation proteome is emerging as an appealing but challenging research topic because of the significant roles ubiquitylation plays not only in protein degradation but also in many other cellular functions. Since ubiquitylation is rapid and reversible, it is time-consuming and labor-intensive to identify ubiquitylation sites using conventional experimental approaches. To efficiently discover lysine-ubiquitylation sites, a highly specific predictor for in silico prediction of ubiquitylation sites in any individual organism is urgently needed to guide experimental design. Here we present a novel protein ubiquitylation prediction tool named UbiProber, implemented by support vector machines, that integrates local sequence similarities to known ubiquitylation sites, physicochemical properties and amino acid compositions. We used the information gain to identify the key positions and amino acids to optimize the prediction model. Although the amino acid sequences around the ubiquitin conjugation sites do not contain conserved motifs, the cross-validation result indicates that the integration of key positions and amino acid features of ubiquitylation sequences can improve predictive performance. UbiProber offers four models: Homo sapiens, Mus musculus, Saccharomyces cerevisiae and Combined. An independent test on a 1:1 ratio of positive and negative samples revealed that the area under the ROC curve (AUC) of the Combined model reached 83.36%. Cross-validation tests also show that UbiProber achieves some improvement over existing tools in predicting species-specific ubiquitylation sites. | http://bioinfo.ncu.edu.cn/UbiProber.aspx | 23626001
5 | UbPred | UbPred is a random forest-based predictor of potential ubiquitination sites in proteins. It was trained on a combined set of 266 non-redundant experimentally verified ubiquitination sites available from our experiments and from two large-scale proteomics studies (Hitchcock, et al., 2003; Peng, et al., 2003). Class-balanced accuracy of UbPred reached 72%, whereas the AUC (area under the ROC curve) was estimated to be ~80%. | http://www.ubpred.org/ | 19722269
6 | DeepUbi | Background: Protein ubiquitination occurs when the ubiquitin protein binds to a target protein residue of lysine (K), and it is an important regulator of many cellular functions, such as signal transduction, cell division, and immune reactions, in eukaryotes. Experimental and clinical studies have shown that ubiquitination plays a key role in several human diseases, and recent advances in proteomic technology have spurred interest in identifying ubiquitination sites. However, most current computing tools for predicting target sites are based on small-scale data and shallow machine learning algorithms. Results: As more experimentally validated ubiquitination sites emerge, we need to design a predictor that can identify lysine ubiquitination sites in large-scale proteome data. In this work, we propose a deep learning predictor, DeepUbi, based on convolutional neural networks. Four different features are adopted from the sequences and physicochemical properties. In a 10-fold cross validation, DeepUbi obtains an AUC (area under the Receiver Operating Characteristic curve) of 0.9, and the accuracy, sensitivity and specificity exceeded 85%. The more comprehensive indicator, MCC, reaches 0.78. We also develop a software package that can be freely downloaded from https://github.com/Sunmile/DeepUbi. | https://github.com/Sunmile/DeepUbi | 30777029
7 |  | Protein ubiquitylation is an important post-translational modification (PTM), which is involved in diverse biological processes and plays an essential role in the regulation of physiological mechanisms and diseases. The Protein Lysine Modifications Database (PLMD) has accumulated abundant ubiquitylated proteins with their substrate sites for more than 20 species. Numerous works have consequently developed a variety of ubiquitylation site prediction tools across all species, mainly relying on predefined sequence features and machine learning algorithms. However, the difference in ubiquitylation patterns between these species remains unclear. In this work, the sequence-based characterization of ubiquitylated substrate sites has revealed remarkable differences among plants, animals, and fungi. An improved word-embedding scheme based on the transfer learning strategy was then combined with a multilayer convolutional neural network (CNN) for identifying protein ubiquitylation sites. For the prediction of plant ubiquitylation sites, the proposed deep learning scheme could outperform the machine learning-based methods, with an accuracy of 75.6%, precision of 73.3%, recall of 76.7%, F-score of 0.7493, and AUC of 0.82 on the independent testing set. Although the ubiquitylation specificity of substrate sites is complicated, this work has demonstrated that the application of the word-embedding method can enable the extraction of informative features and help the identification of ubiquitylated sites. To accelerate the investigation of protein ubiquitylation, the data sets and source code used in this study are freely available at https://github.com/wang-hong-fei/DL-plant-ubsites-prediction. | https://github.com/wang-hong-fei/DL-plant-ubsites-prediction | 33102477
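The CKSAAP_UbSite description in the table above refers to the composition of k-spaced amino acid pairs (CKSAAP) as the input feature vector. As a minimal sketch of that encoding, the Python function below counts, for each gap k, every ordered residue pair separated by k positions in a sequence window and normalises by the number of pairs. The function name, the default `k_max` and the normalisation are assumptions for illustration; the window length and k range used by the actual server are not stated here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature indices are stable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(window: str, k_max: int = 2) -> list:
    """Encode a sequence window as CKSAAP features.

    For each gap k in 0..k_max, the window contributes 400 values: the
    frequency of each ordered pair (a, b) where b follows a with exactly
    k residues in between. Pairs containing non-standard characters
    (for example a padding symbol) are ignored.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = len(window) - k - 1  # number of k-spaced pairs in the window
        for i in range(max(total, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in PAIRS)
    return features


if __name__ == "__main__":
    # Hypothetical lysine-centred window, for illustration only.
    vector = cksaap("AGSLKTEDV", k_max=2)
    print(len(vector), sum(1 for v in vector if v > 0))
```

The resulting fixed-length vector can be passed to any standard classifier, for example an SVM as in the tools listed above.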
