Metadata based Text Mining for Generation of Side Information

Bhanuse, Shraddha S.; Kamble, Shailesh D.; Thakur, Nileshsingh V.; Patharkar, Akshay S.

doi:10.15439/2017R86

Article - details

Article title

Metadata based Text Mining for Generation of Side Information

Authors

Bhanuse Shraddha S., Kamble Shailesh D., Thakur Nileshsingh V., Patharkar Akshay S.

Selected full texts from this journal

http://annals-csis.org

Identifiers

DOI

10.15439/2017R86

Title variants

Conference

The Second International Conference on Research in Intelligent and Computing in Engineering

Publication languages

Abstracts

Text mining is a knowledge-analysis technique for finding patterns in text. In most metadata-based text mining applications, the side information is also called metadata. Side information consists of large volumes of data such as weblogs, metadata, and non-textual data (e.g., images and video). This data exists in unprocessed form and cannot be used directly for further text mining. Therefore, metadata-based text mining algorithms are used to mine the useful information. The approach proposed in this paper uses several pre-processing steps: splitting, tokenizing, stemming, parsing, and chunking. Natural language processing (NLP) is used to generate the side information, i.e., title, name, affiliation, email address, place, etc. To achieve effective clustering, the proposed approach uses a classical partitioning method with a probabilistic model. The proposed approach is evaluated in terms of the time required for mining words, accuracy, and efficiency. The presented results show that the proposed approach performs better in terms of accuracy and running time. As future work, security for metadata-based side information generation is to be provided using an Intrusion Detection System (IDS).
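
The first pre-processing steps named in the abstract (splitting, tokenizing, stemming) and the extraction of one kind of side information (email addresses) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which this record does not show: every function name, the regular expressions, and the crude suffix list are assumptions chosen for illustration, and parsing and chunking are left out.

```python
import re

# All names and patterns below are hypothetical, for illustration only.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")   # whitespace that follows ., ! or ?
TOKEN = re.compile(r"[A-Za-z]+")              # alphabetic runs only
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SUFFIXES = ("ing", "ed", "es", "s")           # deliberately crude, not a real stemmer


def split_sentences(text):
    """Splitting: break raw text into sentences at sentence-final punctuation."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence):
    """Tokenizing: lower-cased alphabetic tokens of one sentence."""
    return [t.lower() for t in TOKEN.findall(sentence)]


def stem(token):
    """Stemming: strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run splitting, tokenizing and stemming; one list of stems per sentence."""
    return [[stem(t) for t in tokenize(s)] for s in split_sentences(text)]


def extract_emails(text):
    """Side information: collect email-like strings from the raw text."""
    return EMAIL.findall(text)


if __name__ == "__main__":
    sample = "Clustering documents. Write to author@example.org today."
    print(preprocess(sample))
    print(extract_emails(sample))
```

A production system would more likely use an established stemmer (for example a Porter-style algorithm) and a trained parser and chunker; the suffix stripping here only shows where stemming sits in the pipeline.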

Keywords

text mining; metadata; side information; natural language processing; classical partitioning; clustering

Publisher

Polskie Towarzystwo Informatyczne

Journal

Annals of Computer Science and Information Systems

Year

2017

Volume

Vol. 10

Pages

135--141

Physical description

Bibliography: 23 items, figures, charts.

Contributors

author

Bhanuse Shraddha S.

shradha.bhanuse@gmail.com

Mobisoft Infotech, India

author

Kamble Shailesh D.

shailesh_2kin@rediffmail.com

Computer Science & Engineering, Yeshwantrao Chavan College of Engineering, India

author

Thakur Nileshsingh V.

thakurnisvis@rediffmail.com

Computer Science & Engineering, Prof Ram Meghe College of Engineering & Management, India

author

Patharkar Akshay S.

akshay.patharkar7@gmail.com

Computer Technology, K.D.K. College of Engineering, India

Bibliography

1. S. Bhanuse, S. Kamble, and S. Kakde, “Text Mining using Metadata for Generation of Side information”, Procedia Computer Science, vol. 78, pp. 807-814, 2016.
2. C. Aggarwal and P. Yu, “A framework for clustering massive text and categorical data streams”, International Conference on Data Mining, pp. 477-481, 2006.
3. S. Guha, R. Rastogi, and K. Shim, “CURE: An efficient clustering algorithm for large databases”, in Proc. ACM SIGMOD Conf., New York, NY, USA, pp. 73-84, 1998.
4. S. Guha, R. Rastogi, and K. Shim, “ROCK: A robust clustering algorithm for categorical attributes”, Inf. Syst., vol. 25, no. 5, pp. 345-366, 2000.
5. T. Liu, S. Liu, Z. Chen, and W.-Y. Ma, “An evaluation of feature selection for text clustering”, in Proc. ICML Conf., Washington, DC, USA, pp. 488-495, 2003.
6. C. Aggarwal, Y. Zhao, and P. S. Yu, “On the Use of Side Information for Mining Text Data”, IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 6, pp. 1415-1429, 2014.
7. C. Aggarwal and H. Wang, “Managing and Mining Graph Data”, New York, NY, USA: Springer, 2010.
8. C. Aggarwal and C. Zhai, “A survey of text classification algorithms”, in Mining Text Data. New York, NY, USA: Springer, 2012.
9. T. Yang, R. Jin, Y. Chi, and S. Zhu, “Combining link and content for community detection: A discriminative approach”, in Proc. 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 927-936, 2009.
10. C. Aggarwal, Y. Zhao, and P. Yu, “On the Use of Side Information for Mining Text Data”, IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 6, pp. 1415-1429, 2014.
11. T. Zhang, R. Ramakrishnan, and M. Livny, “BIRCH: An efficient data clustering method for very large databases”, in Proc. ACM SIGMOD Conf., New York, NY, USA, pp. 103-114, 1996.
12. M. Khatri, S. Dhande, “Implementation with text mining using classification”, International Journal for Technological Research In Engineering, vol. 2, Issue 10, June-20.
13. M. Steinbach, G. Karypis, and V. Kumar, “A comparison of document clustering techniques”, in Proc. Text Mining Workshop KDD, pp. 109-115, 2000.
14. R. Feldman, J. Sanger, “The Text Mining Handbook”, Cambridge University Press, 2007.
15. H. Mahgoub and D. Rösner, “Mining association rules from unstructured documents”, in Proc. 3rd Int. Conf. on Knowledge Mining, ICKM, Prague, Czech Republic, Aug. 25-27, pp. 167-172, 2006.
16. A. McCallum, “Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering”, 1996, http://www.cs.cmu.edu/~mccallum/bow/
17. R. Angelova and S. Siersdorfer, “A neighborhood-based approach for clustering of linked document collections”, in Proc. CIKM Conf., New York, NY, USA, pp. 778-779, 2006.
18. A. Jain and R. Dubes, “Algorithms for Clustering Data”, Englewood Cliffs, NJ, USA: Prentice-Hall, Inc., 1988.
19. I. Dhillon, S. Mallela, and D. Modha, “Information-theoretic Co-clustering”, in Proc. ACM KDD Conf., New York, NY, USA, pp. 89-98, 2003.
20. A. Banerjee and S. Basu, “Topic models over text streams: A study of batch and online unsupervised learning”, in Proc. SDM Conf., pp. 437-442, 2007.
21. Y. Sun, J. Han, J. Gao, and Y. Yu, “Topic Model: Information network integrated topic modeling”, in Proc. ICDM Conf., Miami, FL, USA, pp. 493-502, 2009.
22. N. Zhong, Y. Li, and S.-T. Wu, “Effective Pattern Discovery for Text Mining”, IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 1, 2012.
23. M. Franz, T. Ward, J. S. McCarley, and W. J. Zhu, “Unsupervised and supervised clustering for topic tracking”, in Proc. ACM SIGIR Conf., New York, NY, USA, pp. 310-317, 2001.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-ae168d68-d46e-40e0-98f2-db9d2849662f