Article title

Metadata based Text Mining for Generation of Side Information

Conference
The Second International Conference on Research in Intelligent and Computing in Engineering
Publication languages
EN
Abstracts
EN
Text mining is a knowledge analysis technique for finding patterns. In most metadata-based text mining applications, side information is also called metadata. Side information comprises large volumes of data such as weblogs, metadata, and non-textual data (e.g., images and video). This data is present in unprocessed form and cannot be used directly for further text mining. Therefore, metadata-based text mining algorithms are used to mine the useful information. In this paper, the proposed approach uses several pre-processing steps: splitting, tokenization, stemming, parsing, and chunking. Natural language processing (NLP) is used to generate the side information, i.e., title, name, affiliation, email address, place, etc. To achieve effective clustering, the proposed approach uses a classical partitioning method with a probabilistic model. The proposed approach is evaluated in terms of the time required for mining words, accuracy, and efficiency. The presented results show that the proposed approach performs better in terms of accuracy and running time. In future work, security will be provided for metadata-based side information generation using an Intrusion Detection System (IDS).
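As a rough illustration of the kind of pipeline the abstract describes, the following minimal Python sketch covers sentence splitting, tokenization, a toy form of stemming, and extraction of one side-information field (email addresses). It is not the authors' implementation: every function name, the regular expressions, and the suffix-stripping rule are assumptions made here for illustration, and parsing, chunking, and clustering are left out.

```python
import re

# Assumed pattern for email addresses; real-world address syntax is broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Return lowercase alphanumeric tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def naive_stem(token):
    """Toy suffix stripper (hypothetical stand-in, not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters of the stem.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def extract_emails(text):
    """Pull email addresses out of raw text as one side-information field."""
    return EMAIL_RE.findall(text)


def preprocess(text):
    """Split, tokenize, and stem: one list of stems per sentence."""
    return [[naive_stem(t) for t in tokenize(s)] for s in split_sentences(text)]
```

Calling `extract_emails` on the raw text before `preprocess` matters in this sketch, because tokenization breaks an address into separate tokens. A production system would more plausibly rely on an established stemmer and a trained NLP model for fields such as names and affiliations.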
Pages
135--141
Physical description
Bibliography: 23 items, figures, charts.
Authors
  • Mobisoft Infotech, India
  • Computer Science & Engineering, Yeshwantrao Chavan College of Engineering, India
  • Computer Science & Engineering, Prof Ram Meghe College of Engineering & Management, India
  • Computer Technology, K.D.K. College of Engineering, India
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-ae168d68-d46e-40e0-98f2-db9d2849662f