Set representation for rule-generation algorithms

Kharkongor, Carynthia; Nath, Bhabesh

doi:10.7494/csci.2022.23.2.4071

Article details

Abstracts

Mining association rules has become one of the most widely used pattern-discovery methods in knowledge discovery in databases (KDD). One subtask of this process is representing item sets in memory. How an item set is represented depends largely on the data structure used to store it, and this choice determines the memory and time requirements of mining association rules. As the dimensionality and size of data sets keep growing, mining them becomes difficult because not all of the item sets can be held in main memory. Because the representation of item sets strongly affects the efficiency of association rule mining, a compact, compressed representation of item sets is needed. This paper introduces a set representation that is more efficient in both memory and cost. Whereas a bitmap representation takes 1 byte per element, the set representation uses 1 bit. The set representation is incorporated into the Apriori algorithm and is also tested with different rule-generation algorithms. The complexities of these rule-generation algorithms, when they use the set representation, are compared in terms of memory and execution time.
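The abstract's central idea is to store each item set as a bit vector with one bit per item, instead of one byte per item, and to use that representation inside Apriori. The minimal Python sketch below illustrates this idea. It is not the authors' implementation. It assumes that items are numbered 0 to n-1 and uses Python integers as bit vectors. All function names (`to_bits`, `from_bits`, `is_subset`, `support`, `storage_bytes`, `apriori`) are hypothetical.

```python
from itertools import combinations


def to_bits(itemset):
    """Encode item indices (0..n-1) as an int bit vector, one bit per item."""
    bits = 0
    for item in itemset:
        bits |= 1 << item
    return bits


def from_bits(bits):
    """Decode a bit vector back into a sorted list of item indices."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


def is_subset(a, b):
    """True if every bit set in a is also set in b (a is a subset of b)."""
    return (a & b) == a


def support(candidate, transactions):
    """Number of transactions (bit vectors) containing the candidate item set."""
    return sum(1 for t in transactions if is_subset(candidate, t))


def storage_bytes(n_items):
    """Bytes for one item set over n_items: (one byte per item, one bit per item)."""
    return n_items, (n_items + 7) // 8


def apriori(transactions, min_support):
    """Level-wise Apriori on bit-vector item sets; returns {bit_vector: support}."""
    universe = 0
    for t in transactions:
        universe |= t
    # Level 1: frequent single items.
    current = {}
    for i in from_bits(universe):
        count = support(1 << i, transactions)
        if count >= min_support:
            current[1 << i] = count
    frequent = dict(current)
    k = 2
    while current:
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            # Join step: two frequent (k-1)-item sets that share k-2 items.
            if bin(union).count("1") != k:
                continue
            # Prune step: every (k-1)-item subset must itself be frequent.
            if all((union & ~(1 << i)) in current for i in from_bits(union)):
                candidates.add(union)
        current = {}
        for cand in candidates:
            count = support(cand, transactions)
            if count >= min_support:
                current[cand] = count
        frequent.update(current)
        k += 1
    return frequent


if __name__ == "__main__":
    data = [[0, 1, 2], [0, 1], [0, 2], [1, 2], [0, 1, 2]]
    transactions = [to_bits(t) for t in data]
    result = apriori(transactions, min_support=3)
    print(sorted((from_bits(b), c) for b, c in result.items()))
    # [([0], 4), ([0, 1], 3), ([0, 2], 3), ([1], 4), ([1, 2], 3), ([2], 4)]
    print(storage_bytes(1000))  # (1000, 125)
```

With one bit per item, a subset test is a single AND followed by a comparison. Merging two item sets is a single OR. `storage_bytes` shows the 8x size ratio between a byte-per-item layout and a packed bit-per-item layout. Python integers carry per-object overhead, so a memory-focused implementation would more likely use packed arrays of machine words.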

Keywords

item set; item set representation; apriori algorithm; rule-generation algorithm; data set; set representation; bitmap; memory; time

Publisher

Wydawnictwa AGH

Journal

Computer Science

Year

2022

Volume

Vol. 23 (2)

Pages

205–225

Physical description

Bibliography: 43 items, figures, tables

Contributors

Author

Kharkongor Carynthia

caryn@tezu.ernet.in

Tezpur University, Assam, India

Author

Nath Bhabesh

bnath@tezu.ernet.in

Tezpur University, Assam, India

Notes

Record prepared with funding from MEiN (Ministry of Education and Science), agreement no. SONP/SP/546092/2022, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularization of science and promotion of sport (2022–2023).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-72d3576b-162c-4d15-ba26-20b2af9bc0e5