Article title

A survey of big data classification strategies

Publication languages
EN
Abstracts
EN
Big data now plays a major role in finance, industry, medicine, and various other fields. This survey reviews 50 research papers with regard to the big data classification techniques presented or used in the respective studies. The techniques are categorized into machine learning, evolutionary intelligence, fuzzy-based approaches, deep learning, and so on. The research gaps and challenges of big data classification faced by the existing techniques are also listed and described, which should help researchers enhance the effectiveness of their future work. The papers are analyzed with respect to software tools, datasets used, publication year, classification techniques, and performance metrics. The survey concludes that the most frequently used big data classification methods are based on machine learning techniques, and that the most commonly used dataset appears to be the UCI repository dataset. The most frequently used performance metrics are accuracy and execution time.
Year
Pages
447--469
Physical description
Bibliography: 59 items, figures, tables.
Authors
  • Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP, India
author
  • Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP, India
Bibliography
  • Abawajy, J.H., Kelarev, A. and Chowdhury, M. (2014) Large Iterative Multitier Ensemble Classifiers for Security of Big Data. IEEE Transactions on Emerging Topics in Computing, 2(3), 352–363.
  • Ahlawat, K. and Singh, A. P. (2017) A Novel Hybrid Technique for Big Data Classification Using Decision Tree Learning. In: Proceedings of the International Conference on Computational Intelligence, Communications, and Business Analytics. Springer, 118-128.
  • Arnaiz-González, Á., González-Rogel, A., Díez-Pastor, J-F. and López-Nozal, C. (2017) MR-DIS: democratic instance selection for big data by MapReduce. Progress in Artificial Intelligence, 6(3), 211–219.
  • Bakry, M.E., Safwat, S. and Hegazy, O. (2016) A Mapreduce Fuzzy Techniques of Big Data Classification. In: Proceedings of SAI Computing Conference, London, UK. Springer, 13-15.
  • Banchhor, C. and Srinivasu, N. (2018) FCNB: Fuzzy Correlative Naive Bayes Classifier with MapReduce Framework for Big Data Classification. Journal of Intelligent Systems, 29(1).
  • Bechini, A., Marcelloni, F. and Segatori, A. (2016) A MapReduce Solution for Associative Classification of Big Data. Information Sciences, 332, 33-55.
  • Beno, M. M., Valarmathi, I. R., Swamy, S. M. and Rajakumar, B. R. (2014) Threshold prediction for segmenting tumour from brain MRI scans. International Journal of Imaging Systems and Technology, 24(2), 129-137.
  • Bhagat, R.C. and Patil, S.S. (2015) Enhanced SMOTE Algorithm for Classification of Imbalanced Big-Data using Random Forest. In: Proceedings of IEEE International Advance Computing Conference (IACC). IEEE.
  • Bhukya, R. and Gyani, B.J. (2015) Fuzzy Associative Classification Algorithm Based on MapReduce Framework. In: Proceedings of the International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT). IEEE.
  • Cao, J., Cui, H., Shi, H. and Jiao, L. (2016) Big Data: A Parallel Particle Swarm Optimization-Back-Propagation Neural Network Algorithm Based on MapReduce. PloS One, 11(6).
  • Cavallaro, G., Riedel, M., Richerzhagen, M., Benediktsson, J.A. and Plaza, A. (2015) On Understanding Big Data Impacts in Remotely Sensed Image Classification Using Support Vector Machine Methods. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8(10), 4634-4646.
  • Chen, J., Chen, H., Wan, X. and Zheng, G. (2016) MR-ELM: a MapReduce-based framework for large-scale ELM training in big data era. Neural Computing and Applications, 27(1), 101–110.
  • Dagdia, Z.C. (2019) A scalable and distributed dendritic cell algorithm for big data classification. Journal of Swarm and Evolutionary Computation, 50.
  • Demidova, L., Nikulchev, E. and Sokolova, Y. (2016) Big Data Classification Using the SVM Classifiers with the Modified Particle Swarm Optimization and the SVM Ensembles. International Journal of Advanced Computer Science and Applications, 7(5).
  • Dessí, D., Fenu, G., Marras, M. and Recupero, D.R. (2019) Bridging learning analytics and Cognitive Computing for Big Data classification in micro-learning video collections. Computers in Human Behavior, 92, 468-477.
  • Duan, M., Li, K., Liao, X. and Li, K. (2018) A Parallel Multiclassification Algorithm for Big Data Using an Extreme Learning Machine. IEEE Transactions on Neural Networks and Learning Systems, 29(6), 2337–2351.
  • Elkano, M., Galar, M., Sanz, J. and Bustince, H. (2018) CHI-BD: A Fuzzy Rule-Based Classification System for Big Data classification problems. Fuzzy Sets and Systems, 348, 75-101.
  • Fernández, A., Río, S., Bawakid, A. and Herrera, F. (2017) Fuzzy rule based classification systems for big data with MapReduce: granularity analysis. Advances in Data Analysis and Classification, 11(4), 711–730.
  • Fong, S., Wong, R. and Vasilakos, A.V. (2016) Accelerated PSO Swarm Search Feature Selection for Data Stream Mining Big Data. IEEE Transactions on Services Computing, 9(1), 33–45.
  • Gao, Sh. and Gao, K. (2014) Modelling on Classification and Retrieval Strategy in Map-Reduce Based IR System. In: Proceedings of 2014 International Conference on Modelling, Identification & Control, Melbourne, Australia. IEEE, 322-325.
  • García-Gil, D., Luengo, J., García, S. and Herrera, F. (2019) Enabling Smart Data: Noise filtering in Big Data classification. Information Sciences, 479, 135-152.
  • Hababeh, I., Gharaibeh, A., Nofal, S. and Khalil, I. (2018) An Integrated Methodology for Big Data Classification and Security for Improving Cloud Systems Data Mobility. IEEE Access, 7, 9153–9163.
  • Haque, A., Parker, B., Khan, L. and Thuraisingham, B. (2014) Evolving Big Data Stream Classification with MapReduce. In: Proceedings of IEEE 7th International Conference on Cloud Computing. IEEE, 570-577.
  • Jin, S., Peng, J. and Xie, D. (2017) Towards MapReduce Approach with Dynamic Fuzzy Inference/Interpolation for Big Data Classification Problems. In: Proceedings of the IEEE 16th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC). IEEE.
  • Kamal, S., Parvin, S., Ashour, A.S., Shi, F. and Dey, N. (2017) De-Bruijn graph with MapReduce framework towards metagenomic data classification. International Journal of Information Technology, 9(1), 59–75.
  • Koliopoulos, A-K., Yiapanis, P., Tekiner, F., Nenadic, G. and Keane, J. (2015) A Parallel Distributed Weka Framework for Big Data Mining using Spark. In: Proceedings of IEEE International Congress on Big Data. IEEE.
  • Lakshmanaprabu, S.K., Shankar, K., Khanna, A., Gupta, D., Rodrigues, D.J.J. and Albuquerque, V.H.C.D. (2018) Effective Features to Classify Big Data Using Social Internet of Things. IEEE Access, 6, 24196-24204.
  • Lin, K-C., Zhang, K-Y., Huang, Y-H., Hung, J.C. and Yen, N. (2016) Feature selection based on an improved cat swarm optimization algorithm for big data classification. The Journal of Supercomputing, 72(8), 3210–3221.
  • Lin, W., Wu, Z., Lin, L., Wen, A. and Li, J. (2017) An Ensemble Random Forest Algorithm for Insurance Big Data Analysis. IEEE Access, 5, 16568–16575.
  • Liu, B., Blasch, E., Chen, Y., Shen, D. and Chen, G. (2013) Scalable Sentiment Classification for Big Data Analysis Using Naïve Bayes Classifier. In: Proceedings of the IEEE International Conference on Big Data. IEEE.
  • López, V., Río, S., Benítez, J.M. and Herrera, F. (2014) On the use of MapReduce to build Linguistic Fuzzy Rule Based Classification Systems for Big Data. In: Proceedings of IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Beijing, China. IEEE.
  • López, V., Río, S., Benítez, J.M. and Herrera, F. (2015) Cost sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data. Fuzzy Sets and Systems, 258, 5–38.
  • Ludwig, S.A. (2015) MapReduce-based fuzzy c-means clustering algorithm: implementation and scalability. International Journal of Machine Learning and Cybernetics, 6(6), 923–934.
  • Maillo, J., Ramírez, S., Triguero, I. and Herrera, F. (2017) kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors Classifier for Big Data. Knowledge-Based Systems, 117, 3-15.
  • Maillo, J., Triguero, I. and Herrera, F. (2015) A MapReduce-based k-Nearest Neighbor Approach for Big Data Classification. IEEE Trustcom / BigDataSE / ISPA, Helsinki, Finland. IEEE.
  • Marrón, D., Read, J., Bifet, A.T. and Navarro, N. (2017) Data stream classification using random feature functions and novel method combinations. The Journal of Systems and Software, 127, 195-204.
  • Menaga, D. and Revathi, S. (2020) Deep Learning: A Recent Computing Platform for Multimedia Information Retrieval. In: Deep Learning Techniques and Optimization Strategies in Big Data Analytics, 124-141.
  • Patil, S.S. and Sonavane, S.P. (2017) Enriched Over-Sampling Techniques for Improving Classification of Imbalanced Big Data. In: Proceedings of IEEE Third International Conference on Big Data Computing Service and Applications. IEEE.
  • Qian, J., Lv, P., Yue, X., Liu, C. and Jing, Z. (2015) Hierarchical attribute reduction algorithms for big data using MapReduce. Journal of Knowledge-Based Systems, 73, 18-31.
  • Read, J. and Bifet, A. (2015) Data Stream Classification using Random Feature Functions and Novel Method Combinations. IEEE Trustcom / BigDataSE / ISPA, Helsinki, Finland. IEEE.
  • Río, S.D., López, V., Benítez, J.M. and Herrera, F. (2015) A MapReduce Approach to Address Big Data Classification Problems Based on the Fusion of Linguistic Fuzzy Rules. International Journal of Computational Intelligence Systems, 8(3), 422-437.
  • Satish, K.V.R and Kavya, N. P. (2014) Big Data Processing with harnessing Hadoop - MapReduce for Optimizing Analytical Workloads. In: Proceedings of the International Conference on Contemporary Computing and Informatics (IC3I). IEEE.
  • Scardapane, S., Wang, D. and Panella, M. (2016) A decentralized training algorithm for Echo State Networks in distributed big data applications. Neural Networks, 78, 5–74.
  • Segatori, A., Marcelloni, F. and Pedrycz, W. (2018) On Distributed Fuzzy Decision Trees for Big Data. IEEE Transactions on Fuzzy Systems, 26(1), 174-192.
  • Shafiq, M.O. and Torunski, E. (2017) Towards Map Reduce based Bayesian Deep Learning Network for Monitoring Big Data Applications. In: Proceedings of the IEEE International Conference on Big Data (BIG-DATA). IEEE.
  • Singh, K., Guntuku, S.C., Thakur, A. and Hota, C. (2014) Big Data Analytics framework for Peer-to-Peer Botnet detection using Random Forests. Information Sciences, 278, 488-497.
  • Subramaniyaswamy, V., Vijayakumar, V., Logesh, R. and Indragandhi, V. (2015) Unstructured Data Analysis on Big Data using Map Reduce. In: Proceedings of the 2nd International Symposium on Big Data and Cloud Computing (ISBCC’15). Procedia Computer Science, 50, 456-465.
  • Suthaharan, S. (2014) Big data classification: Problems and challenges in network intrusion prediction with machine learning. ACM SIGMETRICS Performance Evaluation Review, 41(4), 70-73.
  • Thomas, R. and Rangachar, M.J.S. (2019) Fractional Rider and Multi-Kernel-Based Spherical SVM for Low Resolution Face Recognition. Multimedia Research, 2(2), 35-43.
  • Triguero, I., Galar, M., Vluymans, S., Cornelis, C., Bustince, H., Herrera, F. and Saeys, Y. (2015) Evolutionary Undersampling for Imbalanced Big Data Classification. In: Proceedings of IEEE Congress on Evolutionary Computation (CEC). IEEE.
  • Triguero, I., Galar, M., Merino, D., Maillo, J., Bustince, H. and Herrera, F. (2016) Evolutionary Undersampling for Extremely Imbalanced Big Data Classification under Apache Spark. In: Proceedings of IEEE Congress on Evolutionary Computation (CEC), Vancouver, BC, Canada. IEEE.
  • Tsai, C-F., Lin, W-C. and Ke, S-W. (2016) Big data mining with parallel computing: A comparison of distributed and MapReduce methodologies. Journal of Systems and Software, 122, 83–92.
  • Ulfarsson, M.O., Palsson, F., Sigurdsson, J. and Sveinsson, J.R. (2016) Classification of Big Data With Application to Imaging Genetics. Proceedings of the IEEE, 104(11), 2137-2154.
  • Varatharajan, R., Manogaran, G. and Priyan, M. K. (2018) A big data classification approach using LDA with an enhanced SVM method for ECG signals in cloud computing. Multimedia Tools and Applications, 77(8), 10195–10215.
  • Xin, J., Wang, Z., Qu, L. and Wang, G. (2015) Elastic extreme learning machine for big data classification. Neurocomputing, 149, 464–471.
  • Xu, K., Wen, C., Yuan, Q., He, X. and Tie, J. (2014) A MapReduce based Parallel SVM for Email Classification. Journal of Networks, 9(6), 1640-1647.
  • Zhai, J., Zhang, S. and Wang, C. (2017) The classification of imbalanced large data sets based on MapReduce and ensemble of ELM classifiers. International Journal of Machine Learning and Cybernetics, 8(3), 1009–1017.
  • Zhang, S., Deng, Z., Cheng, D., Zong, M. and Zhu, X. (2016) Efficient kNN Classification Algorithm for Big Data. Neurocomputing, 195, 143-148.
  • Zhou, L., Wang, H. and Wang, W. (2012) Parallel Implementation of Classification Algorithms Based on Cloud Computing Environment. TELKOMNIKA Indonesian Journal of Electrical Engineering, 10(5), 1087-1092.
Notes
Record prepared with funds from MNiSW, agreement No. 461252, under the programme "Social Responsibility of Science" - module: Popularisation of science and promotion of sport (2021).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-1f770b23-8669-4e45-b8d0-af6a8e4a3b23