Article title

Identification of antioxidant proteins using a discriminative intelligent model of k-space amino acid pairs based descriptors incorporating with ensemble feature selection

Publication languages
EN
Abstracts
EN
Antioxidant proteins are closely associated with disease control because of their capability to eliminate excess free radicals. Interest in the accurate identification of antioxidant proteins is rising owing to their therapeutic significance. However, despite the rapid increase of such disease in the human body, the machine learning algorithms applied so far have performed inadequately in identifying antioxidant proteins. Therefore, given the effect of antioxidant proteins on the human body, a reliable intelligent model is indispensable for researchers. In this study, primary protein sequences are formulated using evolutionary and sequence-based numerical descriptors. Evolutionary features are collected using a bigram position-specific scoring matrix (PSSM), while K-space amino acid pair (KSAAP) and dipeptide composition descriptors are used to extract sequential information. Furthermore, to reduce computational time and to eliminate irrelevant and noisy features, an ensemble approach based on sequential forward selection and a support vector machine (SFS-SVM) is applied to select optimal features. Finally, several classification learning methods of distinct nature are applied to choose a suitable operational engine for the model. In the empirical evaluation, the SVM using optimal features achieved accuracies of 97.54% and 93.71% on the training and independent datasets, respectively. The proposed model outperformed the existing computational models. It is expected that the developed model may play a useful role in academic research as well as in proteomics and drug development. The source code and all datasets are publicly available at https://github.com/salman-khan-mrd/Antioxident_proteins.
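The KSAAP descriptor named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition: for a gap k, count every residue pair separated by k positions and normalize the 400 counts to frequencies. With k = 0 this reduces to the dipeptide composition. The function names, the `max_k` parameter, and the choice to skip non-standard residues are assumptions made for this illustration; they are not taken from the authors' repository, and the abstract does not state which values of k were used.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs (feature order is fixed).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def ksaap(sequence, k):
    """Return 400 normalized frequencies of residue pairs separated by k positions.

    Assumption: pairs containing a non-standard residue are skipped, and
    frequencies are normalized by the number of pairs actually counted.
    """
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(sequence) - k - 1):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short for this gap: return an all-zero vector.
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


def ksaap_features(sequence, max_k=2):
    """Concatenate KSAAP vectors for gaps 0..max_k (max_k is a hypothetical setting)."""
    features = []
    for k in range(max_k + 1):
        features.extend(ksaap(sequence, k))
    return features
```

For example, `ksaap("ACAC", 0)` counts the adjacent pairs AC, CA, AC, and `ksaap_features(seq, max_k=2)` yields a 1200-dimensional vector that could then be passed to a feature selection step.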
Authors
author
  • Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
author
  • Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
  • Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan
author
  • School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
author
  • Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
  • Department of Physics, University of Lahore, Sargodha, Pakistan
Bibliography
  • [1] Sies H. Oxidative stress: oxidants and antioxidants. Experimental Physiology: Translation and Integration. 1997;82:291–5.
  • [2] Cadenas E, Davies KJ. Mitochondrial free radical generation, oxidative stress, and aging. Free Radic Biol Med 2000;29:222–30.
  • [3] Feng P, Chen W, Lin H. Identifying antioxidant proteins by using optimal dipeptide compositions. Interdiscip Sci 2016;8:186–91.
  • [4] Maxwell S. Coronary artery disease-free radical damage, antioxidant protection, and the role of homocysteine. Basic Res Cardiol 2000;95:I65–71.
  • [5] Dreher D, Junod AF. Role of oxygen free radicals in cancer development. Eur J Cancer 1996;32:30–8.
  • [6] Yildirim Z, Ucgun NI, Yildirim F. The role of oxidative stress and antioxidants in the pathogenesis of age-related macular degeneration. Clinics 2011;66:743–6.
  • [7] Behl C, Moosmann B. Antioxidant neuroprotection in Alzheimer’s disease as a preventive and therapeutic approach. Free Radic Biol Med 2002;33:182–91.
  • [8] Bailey D, Evans K, James P, McEneny J, Young I, Fall L, et al. Altered free radical metabolism in acute mountain sickness: implications for dynamic cerebral autoregulation and blood-brain barrier function. J Physiol (Lond) 2009;587:73–85.
  • [9] Feng P, Feng L. Recent advances on antioxidant identification based on machine learning methods. Curr Drug Metab 2020.
  • [10] Feng P-M, Lin H, Chen W. Identification of antioxidants from sequence information using naive Bayes. Comput Math Methods Med 2013;2013.
  • [11] Zhang L, Zhang C, Gao R, Yang R. Incorporating g-gap dipeptide composition and position specific scoring matrix for identifying antioxidant proteins. In: 2015 IEEE 28th Canadian Conference on Electrical and Computer Engineering (CCECE): IEEE. p. 31–6.
  • [12] Fernández-Blanco E, Aguiar-Pulido V, Munteanu CR, Dorado J. Random Forest classification based on star graph topological indices for antioxidant proteins. J Theor Biol 2013;317:331–7.
  • [13] Xu L, Liang G, Shi S, Liao C. SeqSVM: a sequence-based support vector machine method for identifying antioxidant proteins. Int J Mol Sci 2018;19:1773.
  • [14] Butt AH, Rasool N, Khan YD. Prediction of antioxidant proteins by incorporating statistical moments based features into Chou’s PseAAC. J Theor Biol 2019;473:1–8.
  • [15] Li X, Tang Q, Tang H, Chen W. Identifying antioxidant proteins by combining multiple methods. Front Bioeng Biotechnol 2020;8:858.
  • [16] Shao L, Gao H, Liu Z, Feng J, Tang L, Lin H. Identification of antioxidant proteins with deep learning from sequence information. Front Pharmacol 2018;9:1036.
  • [17] Boutet E, Lieberherr D, Tognolli M, Schneider M, Bairoch A. Uniprotkb/swiss-prot. Plant bioinformatics. Springer; 2007. p. 89–112.
  • [18] Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 2012;28:3150–2.
  • [19] Ju Z, Wang S-Y. Prediction of lysine formylation sites using the composition of k-spaced amino acid pairs via Chou’s 5-steps rule and general pseudo components. Genomics 2019.
  • [20] Hasan M, Khatun M, Mollah M, Haque N, Yong C, Dianjing G. NTyroSite: computational identification of protein nitrotyrosine sites using sequence evolutionary features. Molecules 2018;23:1667.
  • [21] Fu H, Yang Y, Wang X, Wang H, Xu Y. DeepUbi: a deep learning framework for prediction of ubiquitination sites in proteins. BMC Bioinformatics 2019;20:86.
  • [22] Hasan MM, Zhou Y, Lu X, Li J, Song J, Zhang Z. Computational identification of protein pupylation sites by using profile-based composition of k-spaced amino acid pairs. PLoS One 2015;10 e0129635.
  • [23] Ju Z, Cao J-Z. Prediction of protein N-formylation using the composition of k-spaced amino acid pairs. Anal Biochem 2017;534:40–5.
  • [24] Song J, Wang Y, Li F, Akutsu T, Rawlings ND, Webb GI, et al. iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites. Brief Bioinformatics 2018;20:638–58.
  • [25] Wei L, Bowen Z, Zhiyong C, Gao X, Liao M. Exploring local discriminative information from evolutionary profiles for cytokine-receptor interaction prediction. Neurocomputing 2016;217:37–45.
  • [26] Ali F, Kabir M, Arif M, Swati ZNK, Khan ZU, Ullah M, et al. DBPPred-PDSD: machine learning approach for prediction of DNA-binding proteins using Discrete Wavelet Transform and optimized integrated features space. Chemom Intell Lab Syst 2018;182:21–30.
  • [27] Dehzangi A, Paliwal K, Lyons J, Sharma A, Sattar A. A segmentation-based method to extract structural and evolutionary features for protein fold recognition. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB). 2014;11:510–9.
  • [28] Kabir M, Arif M, Ali F, Ahmad S, Swati ZNK, Yu D-J. Prediction of membrane protein types by exploring local discriminative information from evolutionary profiles. Anal Biochem 2019;564:123–32.
  • [29] Waris M, Ahmad K, Kabir M, Hayat M. Identification of DNA binding proteins using evolutionary profiles position specific scoring matrix. Neurocomputing 2016;199:154–62.
  • [30] Ali F, Hayat M. Classification of membrane protein types using voting feature interval in combination with Chou’s pseudo amino acid composition. J Theor Biol 2015;384:78–83.
  • [31] Ali F, Hayat M. Machine learning approaches for discrimination of Extracellular Matrix proteins using hybrid feature space. J Theor Biol 2016;403:30–7.
  • [32] Tang H, Zhao Y-W, Zou P, Zhang C-M, Chen R, Huang P, et al. HBPred: a tool to identify growth hormone-binding proteins. Int J Biol Sci 2018;14:957.
  • [33] Wang S, Zhang Y-H, Lu J, Cui W, Hu J, Cai Y-D. Analysis and identification of aptamer-compound interactions with a maximum relevance minimum redundancy and nearest neighbor algorithm. Biomed Res Int 2016;2016.
  • [34] Akbar S, Rahman AU, Hayat M, Sohail M. cACP: classifying anticancer peptides using discriminative intelligent model via Chou’s 5-step rules and general pseudo components. Chemom Intell Lab Syst 2020;196 103912.
  • [35] Akbar S, Khan S, Ali F, Hayat M, Qasim M, Gul S. iHBPDeepPSSM: identifying hormone binding proteins using PsePSSM based evolutionary features and deep learning approach. Chemom Intell Lab Syst 2020;204 104103.
  • [36] Cheng F, Zhou Y, Li W, Liu G, Tang Y. Prediction of chemical-protein interactions network with weighted network-based inference method. PLoS One 2012;7 e41064.
  • [37] Yang R, Zhang C, Zhang L, Gao R. A two-step feature selection method to predict Cancerlectins by Multiview features and synthetic minority oversampling technique. Biomed Res Int 2018;2018.
  • [38] Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics 2007;23:2507–17.
  • [39] Breiman L. Random forests. Mach Learn 2001;45:5–32.
  • [40] Jo T, Cheng J. Improving protein fold recognition by random forest. BMC bioinformatics: BioMed Central 2014:S14.
  • [41] Li J, Wu J, Chen K. PFP-RFSM: protein fold prediction by using random forests and sequence motifs. J Biomed Sci Eng 2013;6:1161.
  • [42] Ma X, Guo J, Sun X. DNABP: identification of DNA-binding proteins based on feature selection using a random forest and predicting binding residues. PLoS One 2016;11 e0167345.
  • [43] Hayat M, Khan A, Yeasin M. Prediction of membrane proteins using split amino acid and ensemble classification. Amino Acids 2012;42:2447–60.
  • [44] Sabooh MF, Iqbal N, Khan M, Khan M, Maqbool H. Identifying 5-methylcytosine sites in RNA sequence using composite encoding feature into Chou’s PseKNC. J Theor Biol 2018;452:1–9.
  • [45] Akbar S, Hayat M, Iqbal M, Tahir M. iRNA-PseTNC: identification of RNA 5-methylcytosine sites using hybrid vector space of pseudo nucleotide composition. Front Comput Sci 2020;14:451–60.
  • [46] Ali F, Ahmed S, Swati ZNK, Akbar S. DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information. J Comput Aided Mol Des 2019;33:645–58.
  • [47] Kabir M, Hayat M. iRSpot-GAEnsC: identifing recombination spots via ensemble classifier and extending the concept of Chou’s PseAAC to formulate DNA samples. Mol Genet Genom 2016;291:285–96.
  • [48] Ahmed S, Kabir M, Arif M, Ali Z, Ali F, Swati ZNK. Improving secretory proteins prediction in Mycobacterium tuberculosis using the unbiased dipeptide composition with support vector machine. Int J Data Min Bioinform 2018;21:212–29.
  • [49] Ali F, Arif M, Khan ZU, Kabir M, Ahmed S, Yu D-J. SDBP-Pred: prediction of single-stranded and double-stranded DNA-binding proteins by extending consensus sequence and K-segmentation strategies into PSSM. Anal Biochem 2020;589:113494.
  • [50] Akbar S, Hayat M. iMethyl-STTNC: identification of N6-methyladenosine sites by extending the idea of SAAC into Chou’s PseAAC to formulate RNA sequences. J Theor Biol 2018;455:205–11.
  • [51] Akbar S, Hayat M, Tahir M, Chong KT. cACP-2LFS: classification of anticancer peptides using sequential discriminative model of KSAAP and two-level feature selection approach. IEEE Access 2020;8:131939–48.
  • [52] Specht DF. Probabilistic neural networks. Neural Networks 1990;3:109–18.
  • [53] Akbar S, Hayat M, Kabir M, Iqbal M. iAFP-gap-SMOTE: an efficient feature extraction scheme gapped dipeptide composition is coupled with an oversampling technique for identification of antifreeze proteins. Lett Org Chem 2019;16:294–302.
  • [54] Sridhar D, Krishna IM. Brain tumor classification using discrete cosine transform and probabilistic neural network. In: 2013 International Conference on Signal Processing, Image Processing & Pattern Recognition: IEEE. p. 92–6.
  • [55] Huang CJ, Liao WC. Application of probabilistic neural networks to the class prediction of leukemia and embryonal tumor of central nervous system. Neural Process Lett 2004;19:211–26.
  • [56] Paliwal M, Kumar UA. Neural networks and statistical techniques: a review of applications. Expert Syst Appl 2009;36:2–17.
  • [57] Hu J, Yan X. BS-KNN: an effective algorithm for predicting protein subchloroplast localization. Evol Bioinform 2012;8. EBO. S8681.
  • [58] Lan L, Djuric N, Guo Y, Vucetic S. MS-kNN: protein function prediction by integrating multiple data sources. BMC bioinformatics: Springer; 2013. p. S8.
  • [59] Chang J-Y, Shyu J-J, Shi Y-X. Fuzzy K-nearest neighbor classifier to predict protein solvent accessibility. International Conference on Neural Information Processing: Springer; 2007. p. 837–45.
  • [60] Dwivedi AK. Performance evaluation of different machine learning techniques for prediction of heart disease. Neural Comput Appl 2018;29:685–93.
  • [61] Baratloo A, Hosseini M, Negida A, El Ashal G. Part 1: simple definition and calculation of accuracy, sensitivity and specificity, 2015.
  • [62] Akbar S, Hayat M, Iqbal M, Jan MA. iACP-GAEnsC: evolutionary genetic algorithm based ensemble classification of anticancer peptides by utilizing hybrid feature space. Artif Intell Med 2017;79:62–70.
  • [63] Ali F, Ahmed S, Swati ZNK, Akbar S. DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information. J Comput Aided Mol Des 2019:1–14.
  • [64] Zhang L, Zhang C, Gao R, Yang R, Song Q. Sequence based prediction of antioxidant proteins using a classifier selection strategy. PLoS One 2016;11.
  • [65] Cheng X, Zhao S-G, Lin W-Z, Xiao X, Chou K-C. pLoc-mAnimal: predict subcellular localization of animal proteins with both single and multiple sites. Bioinformatics 2017;33:3524–31.
  • [66] Xiao X, Chen X, Chen G, Mao Q, Chou K-C. pLoc_bal-mGpos: predict subcellular localization of Gram-positive bacterial proteins by quasi-balancing training dataset and PseAAC. Genomics 2019;111:886–92.
  • [67] Chou K-C. Impacts of bioinformatics to medicinal chemistry. Med Chem (Los Angeles) 2015;11:218–34.
  • [68] Chou K-C. Advances in predicting subcellular localization of multi-label proteins and its implication for developing multi-target drugs. Curr Med Chem 2019;26:4918–43.
  • [69] Chou K-C. An unprecedented revolution in medicinal chemistry driven by the progress of biological science. Curr Top Med Chem 2017;17:2337–58.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-f6a169b7-e433-44d2-aa6f-121085c19d82