Integrated statistical and rule-mining techniques for DNA methylation and gene expression data analysis

Mallik, S.; Mukhopadhyay, A.; Maulik, U.

doi:10.2478/jaiscr-2014-0008

Abstract

Statistical analysis and association rule mining are widely regarded as useful protocols for determining the relationships among significant gene markers. The first protocol identifies the significant differentially expressed or methylated gene markers, whereas the second produces the interesting relationships among them across different types of samples or conditions. In this article, approaches based on statistical tests and association rule mining are applied to gene expression and DNA methylation datasets to predict different classes of samples (viz., uterine leiomyoma/class-formersmoker and uterine myometrium/class-neversmoker). A novel rule-based classifier is proposed for this purpose. Using sixteen different rule-interestingness measures, we apply a Genetic Algorithm based rank aggregation technique to the association rules generated from the training set by the Apriori association rule mining algorithm. After determining the ranks of the rules, we apply majority voting to each test point, estimating its class label through a weighted-sum method. We run this classifier on the combined dataset using 4-fold cross-validation and then compare its performance with that of other popular rule-based classifiers. Finally, the status of some important gene markers is identified through frequency analysis of the evolved rules for each of the two class labels, in order to formulate the interesting associations among them.
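
The weighted-sum voting step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the rules have already been mined and ranked (best first). The linear rank-to-weight mapping, the item encoding (strings such as "geneA+"), the class labels, and the function names are all hypothetical choices made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from collections import defaultdict


def rank_weights(n_rules):
    """Assumed weighting scheme: rank 1 gets weight 1.0, the last rank gets 1/n."""
    return [(n_rules - r) / n_rules for r in range(n_rules)]


def classify(sample, ranked_rules):
    """Predict a class label by weighted voting over the rules that fire.

    sample       -- set of items observed for one test point
    ranked_rules -- list of (antecedent, class_label) pairs, best rule first
    Returns the label with the largest summed weight, or None if no rule fires.
    """
    weights = rank_weights(len(ranked_rules))
    votes = defaultdict(float)
    for weight, (antecedent, label) in zip(weights, ranked_rules):
        # A rule fires when every item of its antecedent is present in the sample.
        if antecedent <= sample:
            votes[label] += weight
    if not votes:
        return None
    return max(votes, key=votes.get)


# Hypothetical ranked rules (illustrative items and labels only).
ranked_rules = [
    (frozenset({"geneA+"}), "tumor"),
    (frozenset({"geneB-"}), "normal"),
    (frozenset({"geneA+", "geneC+"}), "tumor"),
]

prediction = classify({"geneA+", "geneB-"}, ranked_rules)
```

In this toy example the first two rules fire, so "tumor" collects weight 1.0 and "normal" collects 2/3. Any monotone rank-to-weight mapping could be substituted for the linear one used here.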

Keywords

statistical analysis, gene marker, methylation, genetic algorithm, DNA

Publisher

University of Social Sciences

Journal

Journal of Artificial Intelligence and Soft Computing Research

Year

2013

Volume

Vol. 3, No. 2

Pages

101-115

Physical description

Bibliography: 33 items, figures.

Authors

author

Mallik S.

chasaurav r@isical.ac.in, sauravmtech2@gmail.com

Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata-700108, India

author

Mukhopadhyay A.

anirban@klyuniv.ac.in

Department of Computer Science and Engineering, University of Kalyani, Kalyani, India

author

Maulik U.

umaulik@cse.jdvu.ac.in

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India

References

[1] A. Navarro, P. Yin, D. Monsivais, S. M. Lin, P. Du, J. J. Wei, S. E. Bulun, Genome-Wide DNA Methylation Indicates Silencing of Tumor Suppressor Genes in Uterine Leiomyoma, PLoS One, vol. 7, no. 3, pp. e33284, 2012.
[2] V. Pihur, S. Datta and S. Datta, RankAggreg, an R Package for Weighted Rank Aggregation, BMC Bioinformatics, vol. 10, pp. 62-72, 2009.
[3] K.R.V. Eijk, S.D. Jong, M.P.M. Boks, T. Langeveld, F. Colas, J.H. Veldink, C.G.F.D. Kovel, E. Janson, E. Strengman, P. Langfelder, R. S. Kahn, L. H. V. D. Berg, S. Horvath and R. A. Ophoff, Genetic analysis of DNA methylation and gene expression levels in whole blood of healthy human subjects, BMC Genomics, vol. 13, no. 636, 2012.
[4] C.C. Yu, M. Furukawa, K. Kobayashi, C. Shikishima, P.C. Cha, J. Sese, H. Sugawara, K. Iwamoto, T. Kato, J. Ando and T. Toda, Genome-Wide DNA Methylation and Gene Expression Analyses of Monozygotic Twins Discordant for Intelligence Levels, PLoS One, vol. 7, no. 10, pp. e47081, 2012.
[5] S. Mallik, A. Mukhopadhyay, U. Maulik, and S. Bandyopadhyay, Integrated Analysis of Gene Expression and Genome-wide DNA Methylation for Tumor Prediction: An Association Rule Mining-based Approach, Proc. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), IEEE Symposium Series on Computational Intelligence - SSCI 2013, Singapore, pp. 120-127, April 16, 2013.
[6] M.T. Landi, T. Dracheva, M. Rotunno, J.D. Figueroa, H. Liu, A. Dasgupta, F.E. Mann, J. Fukuoka, M. Hames, A.W. Bergen, S.E. Murphy, P. Yang, A.C. Pesatori, D. Consonni, P.A. Bertazzi, S. Wacholder, J.H. Shih, N.E. Caporaso and J. Jen, Gene Expression Signature of Cigarette Smoking and Its Role in Lung Adenocarcinoma Development and Survival, PLoS One, vol. 3, no. 2, pp. e1651, 2008.
[7] R. J. Fox, M.W. Dimmic, A Two-Sample Bayesian t-test for Microarray Data, BMC Bioinformatics, vol. 7, no. 126, pp. 1-11, 2006.
[8] A. Vickers, Parametric Versus Non-Parametric Statistics in the Analysis of Randomized Trials with Non-Normally Distributed Data, BMC Medical Research Methodology, vol. 5, no. 35, 2005.
[9] A. Mukhopadhyay, U. Maulik, and S. Bandyopadhyay, On Biclustering of Gene Expression Data, Current Bioinformatics, vol. 5, no. 3, pp. 204-216, 2010.
[10] A. Mukhopadhyay, U. Maulik, and S. Bandyopadhyay, A Novel Biclustering Approach to Association Rule Mining for Predicting HIV-1-Human Protein Interactions, PLoS One, vol. 7, no. 4, pp. e32289, 2012.
[11] R. Agrawal, T. Imielinski and A. Swami, Mining Association Rules between Sets of Items in large Databases, In: Proceedings of the 1993 ACM SIGMOD international conference on Management of data (SIGMOD’93), New York, NY, USA: ACM, pp. 207-216, 1993.
[12] A. Mukhopadhyay, U. Maulik, S. Bandyopadhyay and R. Eils, Mining Association Rules from HIV-human Protein Interactions, Int Conf. Systems in Medicine and Biology (ICSMB), pp. 344-348, 2010.
[13] W. H. Catherino, C. Prupas, J. C. Tsibris, P. C. Leppert and M. Payson, Strategy for Elucidating Differentially Expressed Genes in Leiomyomata Identified by Microarray Technology, Fertil Steril, vol. 80, pp. 282-290, 2003.
[14] G. Smyth, Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments, Statistical Applications in Genetics and Molecular Biology, vol. 3, no. 1, pp. 3, 2004.
[15] P. Sethi and S. Alagiriswamy, Association Rule Based Similarity Measures for the Clustering of Gene Expression Data, The Open Medical Informatics Journal, vol. 4, pp. 63-73, 2010.
[16] P. Carmona-Saez, M. Chagoyen, A. Rodriguez, O. Trelles, J. M. Carazo and A. Pascual-Montano, Integrated Analysis of Gene Expression by Association Rules Discovery, BMC Bioinformatics, vol. 7, no. 54, 2006.
[17] X. Li, S. Mabu, H. Zhou, K. Shimada and K. Hirasawa, Analysis of Various Interestingness Measures in Class Association Rule Mining, SICE Journal of Control, Measurement, and System Integration, vol. 4, no. 4, pp. 295-304, 2011.
[18] C. Creighton, and S. Hanash, Mining Gene Expression Databases for Association Rules, Bioinformatics, vol. 19, no. 1, pp. 79-86, 2003.
[19] F. Tao, Weighted Association Rule Mining using Weighted Support and Significance Framework, In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington D.C., USA, pp. 661-666, 2003.
[20] M. Anandhavalli, M. K. Ghose, and K. Gauthaman, Interestingness Measure for Mining Spatial Gene Expression Data using Association Rule, Journal of Computing, vol. 2, no. 1, pp. 110-114, 2010.
[21] S. Dudoit, Y. Yang, T. Speed, and M. Callow, Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments, Statistica Sinica, vol. 12, pp. 111-139, 2002.
[22] S. Y. Kim, J. W. Lee, and I. S. Sohn, Comparison of Various Statistical Methods for Identifying Differential Gene Expression in Replicated Microarray Data, Statistical Methods in Medical Research, vol. 15, pp. 3-20, 2006.
[23] X. Luo, L. Ding, J. Xu and N. Chegini, Gene Expression Profiling of Leiomyoma and Myometrial Smooth Muscle Cells in Response to Transforming Growth Factor-β, Endocrinology, vol. 146, no. 3, pp. 1097-1118, March 2005.
[24] Y. Pawitan, S. Michiels, S. Koscielny, A. Gusnanto, and A. Ploner, False Discovery Rate, Sensitivity and Sample Size for Microarray Studies, Bioinformatics, vol. 21, pp. 3017-3024, 2005.
[25] C.M. Jarque and A.K. Bera, A test for normality of observations and regression residuals, International Statistical Review, vol. 55, no. 2, pp. 163-172, 1987.
[26] Z. Wang and V. Palade, Building Interpretable Fuzzy Models for High Dimensional Data Analysis in Cancer Diagnosis, BMC Genomics, no. 12(S2):S5, 2011.
[27] Z. Wang, V. Palade and Y. Xu, Neuro-Fuzzy Ensemble Approach for Microarray Cancer Gene Expression Data Analysis, In Proceedings of the 2006 International Symposium on Evolving Fuzzy Systems, IEEE 2006.
[28] A. Mukhopadhyay, S. Bandyopadhyay and U. Maulik, Multi-Class Clustering of Cancer Subtypes through SVM Based Ensemble of Pareto-Optimal Solutions for Gene Marker Identification, PLoS One, vol. 5, no. 11, pp. e13803, 2010.
[29] S. Ray, A. Mukhopadhyay and U. Maulik, Predicting Annotated HIV-1-Human PPIs using a Biclustering Approach to Association Rule Mining, In Proc. EAIT-2012, pp. 28-31, Kolkata, India, November 2012.
[30] R. Raji, F. Guzzo, L. Carrara, J. Varughese, E. Cocco, S. Bellone, M. Betti, P. Todeschini, S. Gasparrini, E. Ratner, D.A. Silasi, M. Azodi, P. Schwartz, T.J. Rutherford, N. Buza, S. Pecorelli and A.D. Santin, Uterine and ovarian carcinosarcomas overexpressing Trop-2 are sensitive to hRS7, a humanized anti-Trop-2 antibody, Journal of Experimental & Clinical Cancer Research, vol. 30, no. 106, 2011.
[31] M. Hahsler, C. Buchta, B. Gruen and K. Hornik, Package ‘arules’, 2013, http://R-Forge.R-project.org/projects/arules/.
[32] T. Jayalakshmi and A. Santhakumaran, Statistical normalization and back propagation for classification, International Journal of Computer Theory and Engineering, vol. 3, no. 1, pp. 1793-8201, 2011.
[33] M. Bibikova and J. B. Fan, GoldenGate Assay for DNA Methylation Profiling, Methods in Molecular Biology, vol. 507, pp. 149-163, 2009.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-94a6ec6f-7056-4604-99dd-a2d39ce2d50f