Rough Hypercuboid Based Supervised Regularized Canonical Correlation for Multimodal Data Analysis

Maji, P.; Mandal, A.

doi:10.3233/FI-2016-1427

Selected full texts from this journal

https://fi.episciences.org/

Conference

Rough Set Theory Workshop (RST’2015); (6; 29-06-2015; University of Warsaw)

Abstract

One of the main problems in real-life omics data analysis is how to extract relevant and non-redundant features from high-dimensional multimodal data sets. Supervised regularized canonical correlation analysis (SRCCA) plays an important role in extracting new features from multimodal omics data sets. However, the existing SRCCA optimizes its regularization parameters based on the quality of only the first pair of canonical variables, using standard feature evaluation indices. This paper introduces a new SRCCA algorithm that integrates the merits of SRCCA and the rough hypercuboid approach to extract relevant and non-redundant features in approximation spaces from multimodal omics data sets. The proposed method optimizes the regularization parameters of SRCCA based on the quality of a set of pairs of canonical variables, using the rough hypercuboid approach. While the rough hypercuboid approach provides an efficient way to calculate the degree of dependency of class labels on a feature set in approximation spaces, SRCCA helps in extracting non-redundant features from multimodal data sets. The effectiveness of the proposed approach, along with a comparison with related existing approaches, is demonstrated on several real-life data sets.
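
The regularization step that the abstract refers to can be illustrated with a minimal NumPy sketch of ridge-regularized CCA. This is not the authors' algorithm: the function name `regularized_cca`, the parameters `r_x` and `r_y`, and the synthetic data are assumptions made for illustration, and the supervised choice of the regularization parameters via the rough hypercuboid dependency measure is not implemented here. The sketch only shows how a pair of ridge parameters yields several pairs of canonical variables when there are more features than samples.

```python
import numpy as np

def regularized_cca(X, Y, r_x, r_y, n_pairs):
    """Ridge-regularized CCA (illustrative sketch, hypothetical helper).

    Returns the first n_pairs regularized canonical correlations and the
    weight matrices Wx, Wy whose columns define the canonical variables.
    """
    # Center both modalities.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariances; the ridge terms make Cxx and Cyy invertible
    # even when the number of features exceeds the number of samples.
    Cxx = X.T @ X / (n - 1) + r_x * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + r_y * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    # Whiten with Cholesky factors: Cxx = Lx Lx^T, Cyy = Ly Ly^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)

    # M = Lx^{-1} Cxy Ly^{-T}; its singular values are the correlations.
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Map singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :n_pairs])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_pairs])
    return s[:n_pairs], Wx, Wy

# Synthetic two-modality data sharing a 2-dimensional latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 2))
X = latent @ rng.normal(size=(2, 100)) + 0.5 * rng.normal(size=(40, 100))
Y = latent @ rng.normal(size=(2, 60)) + 0.5 * rng.normal(size=(40, 60))

corrs, Wx, Wy = regularized_cca(X, Y, r_x=0.1, r_y=0.1, n_pairs=3)
features = np.hstack([(X - X.mean(axis=0)) @ Wx, (Y - Y.mean(axis=0)) @ Wy])
print(corrs)
print(features.shape)
```

In a supervised setting, one would repeat this computation over a grid of (`r_x`, `r_y`) values and keep the pair whose extracted features score best under a class-aware criterion; the abstract states that the proposed method scores a set of canonical variable pairs rather than only the first one.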

Keywords

multimodal data analysis; canonical correlation analysis; feature extraction; rough sets; rough hypercuboid approach

Publisher

IOS Press

Journal

Fundamenta Informaticae

Year

2016

Volume

Vol. 148, no. 1/2

Pages

133–155

Physical description

Bibliography: 60 items; figures, tables, charts.

Contributors

author

Maji P.

pmaji@isical.ac.in

Biomedical Imaging and Bioinformatics Lab, Machine Intelligence Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata, 700 108, West Bengal, India

author

Mandal A.

amandal@isical.ac.in

Biomedical Imaging and Bioinformatics Lab, Machine Intelligence Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata, 700 108, West Bengal, India

Notes

Prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities (2017 tasks).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-28dc64e3-2c7c-4483-afe5-4ad360e15604