High Frequency Rule Synthesis in a Large Scale Multiple Database with MapReduce

Bisoyi, Sudhanshu Shekhar; Mishra, Pragnyaban; Mishra, Saroja Nanda

doi:10.24425/ijet.2022.139865

Article details

Content

Full text:

177_High Frequency Rule 3373-11421-1-PB.pdf

Abstract

Rapid development in information and communication technology leads to the generation of large amounts of data from various sources. Data collected from multiple sources grow exponentially and may not be structurally uniform; in general, they are heterogeneous and distributed across multiple databases. Because of the large volume, high velocity, and variety of the data, mining knowledge in this environment becomes a big data challenge. Under these circumstances, Distributed Association Rule Mining (DARM) becomes a tedious task for an effective global Decision Support System (DSS). DARM algorithms generate a large number of association rules and frequent itemsets in a big data environment, which makes synthesizing high-frequency rules from the big database even more challenging. Many algorithms for synthesizing association rules have been proposed for multiple-database mining environments. They face considerable challenges in terms of high availability, scalability, efficiency, the high cost of storing and processing large intermediate results, and multiple redundant rules. In this paper, we first propose a model that collects data from multiple sources into a big data storage framework based on HDFS. Secondly, we propose a weighted multi-partitioned method for synthesizing high-frequency rules using the MapReduce programming paradigm. Experiments have been conducted in a parallel and distributed environment using commodity hardware. The experiments confirm the efficiency, scalability, high availability, and cost-effectiveness of the proposed method.
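The rule-synthesis step can be illustrated with a small sketch. The abstract does not specify the details of the authors' weighted multi-partitioned method, so the Python below instead assumes the classic rule-weighting scheme of Wu and Zhang [17] (as we understand it): a rule's weight is proportional to the number of databases reporting it, a database's weight is proportional to the weighted votes of its rules, and the synthesized support and confidence are weighted sums of the local values. The map, shuffle, and reduce phases are plain Python functions, not the Hadoop API, and the names `map_phase`, `shuffle`, `synthesize`, and `min_freq` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Iterator, List, Tuple

# Hypothetical types: a rule is (antecedent, consequent); each database
# reports its locally mined rules as rule -> (support, confidence).
Rule = Tuple[FrozenSet[str], FrozenSet[str]]
LocalRules = Dict[Rule, Tuple[float, float]]
Emitted = Tuple[str, float, float]


def map_phase(db_id: str, rules: LocalRules) -> Iterator[Tuple[Rule, Emitted]]:
    """Mapper sketch: key every local rule by the rule itself."""
    for rule, (supp, conf) in rules.items():
        yield rule, (db_id, supp, conf)


def shuffle(pairs: Iterable[Tuple[Rule, Emitted]]) -> Dict[Rule, List[Emitted]]:
    """Group mapper output by key, as the framework's shuffle would."""
    groups: Dict[Rule, List[Emitted]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def synthesize(databases: Dict[str, LocalRules], min_freq: int) -> Dict[Rule, Tuple[float, float]]:
    """Weighted synthesis of high-frequency rules (Wu-Zhang-style weights, assumed)."""
    groups = shuffle(p for db, rules in databases.items() for p in map_phase(db, rules))

    # Num(R): number of databases that report rule R.
    num = {rule: len(vals) for rule, vals in groups.items()}
    total = sum(num.values())
    # Rule weight: w_R = Num(R) / sum of Num over all rules.
    rule_w = {rule: n / total for rule, n in num.items()}

    # Database weight: sum of Num(R) * w_R over the database's rules, normalised to 1.
    value = {db: sum(num[r] * rule_w[r] for r in rules) for db, rules in databases.items()}
    value_total = sum(value.values())
    if value_total == 0:
        return {}
    db_w = {db: v / value_total for db, v in value.items()}

    # Reducer sketch: keep rules reported by at least min_freq databases and
    # combine local support/confidence as weighted sums (absent databases add 0).
    result = {}
    for rule, vals in groups.items():
        if num[rule] < min_freq:
            continue
        supp = sum(db_w[db] * s for db, s, _ in vals)
        conf = sum(db_w[db] * c for db, _, c in vals)
        result[rule] = (supp, conf)
    return result


if __name__ == "__main__":
    r1 = (frozenset({"a"}), frozenset({"b"}))
    r2 = (frozenset({"c"}), frozenset({"d"}))
    dbs = {
        "D1": {r1: (0.4, 0.8), r2: (0.2, 0.5)},
        "D2": {r1: (0.6, 0.9)},
        "D3": {r1: (0.5, 0.7)},
    }
    for rule, (supp, conf) in synthesize(dbs, min_freq=2).items():
        print(sorted(rule[0]), "->", sorted(rule[1]), round(supp, 3), round(conf, 3))
```

Keying mapper output by the rule means one reducer sees every database's support and confidence for that rule, so rules reported by fewer than `min_freq` databases can be discarded there as low-frequency. On a real cluster the database weights depend on every rule's vote count, so they would likely need a separate pass (or a job that first counts votes) before the weighted reduce.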

Keywords

multiple database, frequent itemset, association rule, rule synthesis, MapReduce, HDFS

Publisher

Polish Academy of Sciences, Committee of Electronics and Telecommunication

Journal

International Journal of Electronics and Telecommunications

Year

2022

Volume

Vol. 68, No. 2

Pages

177–186

Physical description

Bibliography: 26 items, diagrams, tables, charts

Contributors

author

Bisoyi Sudhanshu Shekhar

sudhanshu.bisoyi@gmail.com

Department of Computer Science and Information Technology, Siksha ’O’ Anusandhan Deemed to be University (SOA), Institute of Technical Education and Research (ITER), Bhubaneswar, Odisha, India

author

Mishra Pragnyaban

pragnyaban@gmail.com

Dept. of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, AP, India

author

Mishra Saroja Nanda

sarose.mishra@gmail.com

Dept. of CSE&A, IGIT, Sarang, Dhenkanal, Odisha

References

[1] R. Agrawal and J. C. Shafer, “Parallel mining of association rules,” IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 962–969, 1996. [Online]. Available: https://dx.doi.org/10.1109/69.553164
[2] D. W. Cheung, V. T. Ng, A. W. Fu, and Y. Fu, “Efficient mining of association rules in distributed databases,” IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. 911–922, 1996. [Online]. Available: https://dx.doi.org/10.1109/69.553158
[3] W. Fan and A. Bifet, “Mining big data: Current status, and forecast to the future,” ACM SIGKDD Explor. Newslett., vol. 14, no. 2, pp. 1–5, 2012. [Online]. Available: https://dx.doi.org/10.1145/2481244.2481246
[4] R. Didugu and D. A. Devarakonda, “A framework for exploring algorithms for big data mining,” Indian Journal of Science and Technology, vol. 9, May 2016. [Online]. Available: https://dx.doi.org/10.17485/ijst/2016/v9i17/93017
[5] R. Didugu and D. A. Devarakonda, Adding Big Value to Big Businesses: A Present State of the Art of Big Data, Frameworks and Algorithms, Jan. 2018, pp. 171–184. [Online]. Available: https://dx.doi.org/10.1007/978-981-10-6602-3_17
[6] M. J. Zaki, Parallel and Distributed Data Mining: An Introduction. Berlin, Heidelberg: Springer Berlin Heidelberg, 2000, pp. 1–23. [Online]. Available: https://dx.doi.org/10.1007/3-540-46502-2_1
[7] T. White, Hadoop: The Definitive Guide, 4th ed. O’Reilly, 2015.
[8] V. Vasantham and D. Haritha, “A survey on cost minimization techniques for big data processing,” Journal of Advanced Research in Dynamical and Control Systems, vol. 10, pp. 547–551, Jan. 2018.
[9] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop distributed file system,” in 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010, pp. 1–10. [Online]. Available: https://dx.doi.org/10.1109/MSST.2010.5496972
[10] K. Rao, S. Subramani, M. Prasad, and A. Saravanan, “Technical challenges and perspectives in batch and stream big data machine learning,” International Journal of Engineering and Technology, vol. 7, p. 48, Dec. 2017. [Online]. Available: https://dx.doi.org/10.14419/ijet.v7i1.3.9225
[11] E.-H. Han, G. Karypis, and V. Kumar, “Scalable parallel data mining for association rules,” SIGMOD Rec., vol. 26, no. 2, pp. 277–288, 1997. [Online]. Available: https://dx.doi.org/10.1145/253262.253330
[12] R. Grossman, S. Bailey, A. Ramu, B. Malhi, and A. Turinsky, “The preliminary design of Papyrus: A system for high performance, distributed data mining over clusters,” Advances in Distributed and Parallel Knowledge Discovery, pp. 259–275, 2000.
[13] H. Liu, H. Lu, and J. Yao, “Toward multidatabase mining: identifying relevant databases,” IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 4, pp. 541–553, 2001. [Online]. Available: https://dx.doi.org/10.1109/69.940731
[14] X. Wu, C. Zhang, and S. Zhang, “Database classification for multidatabase mining,” Inf. Syst., vol. 30, no. 1, pp. 71–88, 2005. [Online]. Available: https://dx.doi.org/10.1016/j.is.2003.10.001
[15] S. Zhang, X. Wu, and C. Zhang, “Multi-database mining,” IEEE Computational Intelligence Bulletin, vol. 2, no. 1, pp. 5–13, 2003.
[16] C. Zhang, M. Liu, W. Nie, and S. Zhang, “Identifying global exceptional patterns in multi-database mining,” IEEE Intelligent Informatics Bulletin, vol. 3, no. 1, pp. 19–24, 2004.
[17] X. Wu and S. Zhang, “Synthesizing high-frequency rules from different data sources,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 2, pp. 353–367, 2003. [Online]. Available: https://dx.doi.org/10.1109/TKDE.2003.1185839
[18] T. Ramkumar and R. Srinivasan, “Modified algorithms for synthesizing high-frequency rules from different data sources,” Knowl. Inf. Syst., vol. 17, no. 3, pp. 313–334, 2008. [Online]. Available: https://dx.doi.org/10.1007/s10115-008-0126-6
[19] R. Thirunavukkarasu, H. Shanmugasundaram, and S. Shanmugam, “Synthesizing global negative association rules in multi-database mining,” International Arab Journal of Information Technology, vol. 11, no. 6, pp. 526–531, 2014.
[20] A. Adhikari, L. C. Jain, and S. Ramanna, “Analysing effect of database grouping on multi-database mining,” IEEE Intelligent Informatics Bulletin, vol. 12, no. 1, pp. 25–32, 2011.
[21] N. Li, L. Zeng, Q. He, and Z. Shi, “Parallel implementation of Apriori algorithm based on MapReduce,” International Journal of Networked and Distributed Computing, vol. 1, no. 2, pp. 89–96, 2013. [Online]. Available: https://dx.doi.org/10.1109/SNPD.2012.31
[22] X. Yang, Z. Liu, and Y. Fu, “MapReduce as a programming model for association rules algorithm on Hadoop,” in Proceedings of the 3rd International Conference on Information Sciences and Interaction Sciences, ser. ICIS ’10. IEEE, 2010, pp. 99–102. [Online]. Available: https://dx.doi.org/10.1109/ICICIS.2010.5534718
[23] Z. Farzanyar and N. Cercone, “Efficient mining of frequent itemsets in social network data based on MapReduce framework,” in Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ser. ASONAM ’13. ACM, 2013, pp. 1183–1188. [Online]. Available: https://dx.doi.org/10.1145/2492517.2500301
[24] J. Dean and S. Ghemawat, “MapReduce: A flexible data processing tool,” Commun. ACM, vol. 53, pp. 72–77, 2010. [Online]. Available: https://dx.doi.org/10.1145/1629175.1629198
[25] V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth, B. Saha, C. Curino, O. O’Malley, S. Radia, B. Reed, and E. Baldeschwieler, “Apache Hadoop YARN: Yet another resource negotiator,” in Proceedings of the 4th Annual Symposium on Cloud Computing, ser. SOCC ’13, 2013, pp. 1–16. [Online]. Available: https://dx.doi.org/10.1145/2523616.2523633
[26] A. Vishwanath and R. Murugan, “Parallel processing on big data in the context of machine learning and Hadoop ecosystem: A survey,” International Journal of Engineering and Technology, vol. 7, p. 577, Mar. 2018. [Online]. Available: https://dx.doi.org/10.14419/ijet.v7i2.7.10885

Notes

Record prepared with funds from MEiN (Polish Ministry of Education and Science), agreement no. SONP/SP/546092/2022, under the program "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularization of science and promotion of sport (2022–2023).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-acbf0b87-ab66-4d79-9dbb-08eda581f7e0