Article title

High Frequency Rule Synthesis in a Large Scale Multiple Database with MapReduce

Increasing development in information and communication technology leads to the generation of large amounts of data from various sources. Data collected from multiple sources grows exponentially and may not be structurally uniform; in general, it is heterogeneous and distributed across multiple databases. Because of the large volume, high velocity, and variety of the data, mining knowledge in this environment becomes a big data challenge. Distributed Association Rule Mining (DARM) in these circumstances becomes a tedious task for an effective global Decision Support System (DSS). DARM algorithms generate a large number of association rules and frequent itemsets in the big data environment, which makes synthesizing high-frequency rules from the big database more challenging. Many algorithms for synthesizing association rules have been proposed for multiple-database mining environments. They face enormous challenges in terms of high availability, scalability, efficiency, the high cost of storing and processing large intermediate results, and multiple redundant rules. In this paper, we first propose a model to collect data from multiple sources into a big data storage framework based on HDFS. Second, we propose a weighted multi-partitioned method for synthesizing high-frequency rules using the MapReduce programming paradigm. Experiments have been conducted in a parallel and distributed environment using commodity hardware. We ensure the efficiency, scalability, high availability, and cost-effectiveness of our proposed method.
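The abstract does not spell out the weighting scheme, so the following is only a minimal sketch of how weighted rule synthesis could be phrased as map, shuffle, and reduce steps. Everything in it is an assumption made for illustration: the function names, the in-memory "shuffle", and in particular the weighting (a source's weight is proportional to how many sources agree with the rules it reports). It is not the authors' algorithm and it does not use Hadoop.

```python
from collections import defaultdict


def map_phase(source_id, rules):
    """Emit (rule, (source_id, local_support)) for one source partition."""
    for rule, support in rules.items():
        yield rule, (source_id, support)


def shuffle(pairs):
    """Group emitted values by rule, as a MapReduce shuffle would."""
    grouped = defaultdict(list)
    for rule, value in pairs:
        grouped[rule].append(value)
    return grouped


def source_weights(grouped):
    """Hypothetical weighting: a source scores one point per reporting
    source for every rule it reports; scores are normalised to sum to 1."""
    score = defaultdict(int)
    for reports in grouped.values():
        for source_id, _ in reports:
            score[source_id] += len(reports)
    total = sum(score.values())
    return {source_id: value / total for source_id, value in score.items()}


def reduce_phase(rule, reports, weights):
    """Combine local supports into one weighted global support."""
    return rule, sum(weights[source_id] * supp for source_id, supp in reports)


def synthesize(sources, min_support):
    """Return rules whose weighted global support reaches min_support."""
    pairs = [kv for sid, rules in sources.items() for kv in map_phase(sid, rules)]
    grouped = shuffle(pairs)
    weights = source_weights(grouped)
    combined = dict(reduce_phase(r, reps, weights) for r, reps in grouped.items())
    return {r: s for r, s in combined.items() if s >= min_support}
```

Calling `synthesize` with a dictionary that maps each source name to its locally mined rules and supports, plus a global threshold, returns only the rules that remain frequent after weighting. In a real deployment the map and reduce functions would run as distributed tasks over data stored in HDFS rather than over Python dictionaries.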
Physical description
Bibliography: 26 items, diagrams, tables, charts.
  • Department of Computer Science and Information Technology, Siksha ’O’ Anusandhan Deemed to be University (SOA), Institute of Technical Education and Research (ITER), Bhubaneswar, Odisha, India
  • Dept. of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, AP, India
  • Dept. of CSE&A, IGIT, Sarang, Dhenkanal, Odisha
Record prepared with funds from MEiN, agreement no. SONP/SP/546092/2022, under the programme "Social Responsibility of Science" - module: Popularisation of science and promotion of sport (2022-2023).