Article title

Importance of Text Data Preprocessing & Implementation in RapidMiner

Authors
Identifiers
Title variants
Conference
International Conference on Information Technology and Knowledge Management (1 ; 22-23.12.2017 ; New Delhi, India)
Publication languages
EN
Abstracts
EN
Data preparation is an important phase before applying any machine learning algorithm. The same holds for text data: before a machine learning algorithm can be applied to it, the text must be prepared, and this preparation is carried out through preprocessing. Preprocessing text means removing noise such as stop words, punctuation, and terms that carry little weight in the context of the text. In this paper, we describe in detail how to prepare data for machine learning algorithms using the RapidMiner tool. The preprocessing is followed by conversion of the bag of words into a term vector model, and we describe the various algorithms that can be applied in RapidMiner for data analysis and predictive modeling. We also discuss the challenges and recent applications of text mining.
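The preprocessing steps named in the abstract (removing punctuation and stop words, then converting a bag of words into a term vector model) can be sketched in plain Python. This is only an illustration under stated assumptions, not the RapidMiner process described in the paper: the stop-word list, the function names `preprocess` and `tf_idf_vectors`, and the choice of TF-IDF weighting with an unsmoothed logarithmic IDF are all hypothetical choices made for this example.

```python
import math
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "and", "to"}


def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    table = str.maketrans("", "", string.punctuation)
    tokens = text.lower().translate(table).split()
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf_vectors(documents):
    """Turn raw documents into TF-IDF term vectors over a shared vocabulary."""
    # Bag of words: one term-count mapping per document.
    bags = [Counter(preprocess(doc)) for doc in documents]
    vocabulary = sorted(set().union(*bags))
    n_docs = len(bags)
    # Document frequency: number of documents containing each term.
    doc_freq = {term: sum(1 for bag in bags if term in bag) for term in vocabulary}
    vectors = []
    for bag in bags:
        total = sum(bag.values()) or 1  # avoid division by zero for empty documents
        vectors.append([
            (bag[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors
```

Calling `tf_idf_vectors(["The cat sat on the mat.", "The dog sat!"])` returns the sorted vocabulary together with one weight vector per document. A term that occurs in every document receives weight zero under this weighting, which is one way of expressing that it carries little weight in context.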
Year
Volume
Pages
71-75
Physical description
Bibliography: 12 items, figures.
Contributors
author
  • Manav Rachna International Institute of Research & Studies, Faridabad
author
  • Manav Rachna International Institute of Research & Studies, Faridabad
Bibliography
  • 1. Charu C. Aggarwal and ChengXiang Zhai: A Survey of Text Classification Algorithms, chapter in the book “Mining Text Data”, http://dx.doi.org/10.1007/978-1-4614-3223-4_6, pp. 163-222, Springer US, 2012.
  • 2. S. B. Kotsiantis: Decision Trees: A Recent Overview, Artificial Intelligence Review, Volume 39, Issue 4, April 2013, pp. 261–283, Springer.
  • 3. D. M. Farid, L. Zhang, C. M. Rahman, M. A. Hossain: Hybrid decision tree and naïve Bayes classifiers for multi-class classification tasks, Expert Systems with Applications, Volume 41, Issue 4, Part 2, March 2014, pp. 1937–1946, Elsevier.
  • 4. R. Moraes, J. O. F. Valiati, W. P. G. O. Neto: Document-level sentiment classification: An empirical comparison between SVM and ANN, Expert Systems with Applications, Volume 40, 2013, pp. 621–633, Elsevier.
  • 5. Thorsten Joachims: Text categorization with Support Vector Machines: Learning with many relevant features, Machine Learning: ECML-98, Lecture Notes in Computer Science, Volume 1398, pp. 137-142.
  • 6. V. Bijalwan, V. Kumar, P. Kumari, J. Pascual: KNN based Machine Learning Approach for Text and Document Mining, International Journal of Database Theory and Application, Vol. 7, No. 1 (2014), pp. 61-70.
  • 7. A. Ittoo, L. M. Nguyen, A. van den Bosch: Text analytics in industry: challenges, desiderata and trends, Computers in Industry, Volume 78, May 2016, pp. 96-107, Elsevier.
  • 8. Gary Miner, John Elder: Practical text mining and statistical analysis of text mining, IV, Thomas Hill, 1st edition, ISBN-978-0-386979-1, 2012, books.google.com.
  • 9. Li-Ping Jing, Hou-kuan, Hong Boshi: Improved feature selection approach using TF-IDF in Text Mining, in Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, pp. 944-946, 4-5 November 2002, IEEE.
  • 10. Charu C. Aggarwal and ChengXiang Zhai: A Survey of Text Clustering Algorithms, chapter in the book “Mining Text Data”, http://dx.doi.org/10.1007/978-1-4614-3223-4_6, pp. 77-128, Springer US, 2012.
  • 11. Rashmi Agrawal: K-Nearest Neighbor for Uncertain Data, International Journal of Computer Applications (0975 – 8887), Volume 105, No. 11, November 2014.
  • 12. Rashmi Agrawal, Mridula Batra: A Detailed Study on Text Mining Techniques, International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Volume 2, Issue 6, January 2013.
Notes
1. Preface
2. Technical Session: First International Conference on Information Technology and Knowledge Management
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-34ae23e1-deba-4c49-b19b-b8bc50a9e4f2