Multiaspect Text Categorization Problem Solving: a Nearest Neighbours Classifier Based Approaches and Beyond

Zadrożny, S.; Kacprzyk, J.; Gajewski, M.

doi:10.14313/JAMRIS_4-2015/34

Article details

Abstract

We address the problem of multiaspect text categorization, which calls for classifying documents with respect to two, in a sense orthogonal, sets of categories. We briefly define the problem, mainly referring to our previous work, and study the application of the k-nearest neighbours algorithm. We propose a new technique meant to enhance the effectiveness of this algorithm when applied to the problem in question. We present experimental results confirming the usefulness of the proposed approach.

Keywords

text categorization; intelligent system; nearest neighbours classifiers; topic tracking and detection; fuzzy majority

Publisher

Łukasiewicz Industrial Research Institute for Automation and Measurements PIAP

Journal

Journal of Automation, Mobile Robotics and Intelligent Systems

Year

2015

Volume

Vol. 9, No. 4

Pages

58–70

Physical description

Bibliography: 25 items, figures.

Contributors

author

Zadrożny S.

Slawomir.Zadrozny@ibspan.waw.pl

Systems Research Institute, Polish Academy of Sciences, 01-447 Warszawa, ul. Newelska 6, Poland

author

Kacprzyk J.

Janusz.Kacprzyk@ibspan.waw.pl

Systems Research Institute, Polish Academy of Sciences, 01-447 Warszawa, ul. Newelska 6, Poland

author

Gajewski M.

gajewskm@ibspan.waw.pl

Systems Research Institute, Polish Academy of Sciences, 01-447 Warszawa, ul. Newelska 6, Poland

Bibliography

[1] J. Allan, ed., Topic Detection and Tracking: Event-based Information Organization, Kluwer Academic Publishers, 2002.
[2] R. Baeza-Yates and B. Ribeiro-Neto, Modern information retrieval, ACM Press and Addison Wesley, 1999.
[3] A. Beygelzimer, S. Kakadet, J. Langford, S. Arya, D. Mount, and S. Li. FNN: Fast Nearest Neighbor Search Algorithms and Applications, 2013. R package version 1.1.
[4] S. Bird, R. Dale, B. Dorr, B. Gibson, M. Joseph, M.-Y. Kan, D. Lee, B. Powley, D. Radev, and Y. Tan, “The ACL anthology reference corpus: A reference dataset for bibliographic research in computational linguistics”. In: Proc. of Language Resources and Evaluation Conference (LREC 08), Marrakesh, Morocco, 1755–1759.
[5] M. Delgado, M. D. Ruiz, D. Sánchez, and M. A. Vila, “Fuzzy quantification: a state of the art”, Fuzzy Sets and Systems, vol. 242, 2014, 1–30, http://dx.doi.org/10.1016/j.fss.2013.10.012.
[6] S. A. Dudani, “The distance-weighted k-nearest-neighbor rule”, IEEE Transactions on Systems, Man, and Cybernetics, vol. 6, no. 4, 1976, 325–327, http://dx.doi.org/10.1109/TSMC.1976.5408784.
[7] I. Feinerer, K. Hornik, and D. Meyer, “Text mining infrastructure in R”, Journal of Statistical Software, vol. 25, no. 5, 2008, 1–54, http://dx.doi.org/10.18637/jss.v025.i05.
[8] A. Feng and J. Allan, “Hierarchical topic detection in TDT-2004”.
[9] M. Gajewski, J. Kacprzyk, and S. Zadrożny, “Topic detection and tracking: a focused survey and a new variant”, Informatyka Stosowana, to appear.
[10] E. Han, G. Karypis, and V. Kumar, “Text categorization using weight adjusted k-nearest neighbor classification”. In: D. W. Cheung, G. J. Williams, and Q. Li, eds., Knowledge Discovery and Data Mining - PAKDD 2001, 5th Pacific-Asia Conference, Hong Kong, China, April 16-18, 2001, Proceedings, vol. 2035, 2001, 53–65.
[11] J. Kacprzyk, J. W. Owsiński, and D. A. Viattchenin, “A new heuristic possibilistic clustering algorithm for feature selection”, Journal of Automation, Mobile Robotics & Intelligent Systems, vol. 8, no. 2, 2014, http://dx.doi.org/10.14313/JAMRIS_2-2014/18.
[12] J. Kacprzyk and S. Zadrożny. “Power of linguistic data summaries and their protoforms”. In: C. Kahraman, ed., Computational Intelligence Systems in Industrial Engineering, volume 6 of Atlantis Computational Intelligence Systems, 71–90. Atlantis Press, 2012. http://dx.doi.org/10.2991/978-94-91216-77-0_4.
[13] D. Olszewski, J. Kacprzyk, and S. Zadrożny. “Time series visualization using asymmetric self-organizing map”. In: M. Tomassini, A. Antonioni, F. Daolio, and P. Buesser, eds., Adaptive and Natural Computing Algorithms, volume 7824 of Lecture Notes in Computer Science, 40–49. Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-37213-1_5.
[14] D. Olszewski, J. Kacprzyk, and S. Zadrożny. “Asymmetric k-means clustering of the asymmetric self-organizing map”. In: L. Rutkowski, M. Korytkowski, R. Scherer, R. Tadeusiewicz, L. Zadeh, and J. Zurada, eds., Artificial Intelligence and Soft Computing, volume 8468 of Lecture Notes in Computer Science, 772–783. Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-07176-3_67.
[15] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2014.
[16] F. Sebastiani, “Machine learning in automated text categorization”, ACM Computing Surveys, vol. 34, no. 1, 2002, 1–47, http://dx.doi.org/10.1145/505282.505283.
[17] M. Szymczak, S. Zadrożny, A. Bronselaer, and G. D. Tré, “Coreference detection in an XML schema”, Information Sciences, vol. 296, 2015, 237–262, http://dx.doi.org/10.1016/j.ins.2014.11.002.
[18] R. Yager, “Quantifier guided aggregation using OWA operators”, International Journal of Intelligent Systems, vol. 11, 1996, 49–73, http://dx.doi.org/10.1002/(SICI)1098-111X(199601)11:1%3C49::AID-INT3%3E3.0.CO;2-Z.
[19] Y. Yang, “An evaluation of statistical approaches to text categorization”, Information Retrieval, vol. 1, no. 1-2, 1999, 69–90, http://dx.doi.org/10.1023/A:1009982220290.
[20] Y. Yang, T. Ault, T. Pierce, and C. W. Lattimer, “Improving text categorization methods for event tracking”. In: SIGIR, 2000, 65–72, http://dx.doi.org/10.1145/345508.345550.
[21] L. Zadeh, “A computational approach to fuzzy quantifiers in natural languages”, Computers and Mathematics with Applications, vol. 9, 1983, 149–184, http://dx.doi.org/10.1016/0898-1221(83)90013-5.
[22] S. Zadrożny, J. Kacprzyk, M. Gajewski, and M. Wysocki, “A novel text classification problem and its solution”, Technical Transaction. Automatic Control, vol. 4-AC, 2013, 7–16.
[23] S. Zadrożny, J. Kacprzyk, and M. Gajewski, “A novel approach to sequence-of-documents focused text categorization using the concept of a degree of fuzzy set subsethood”. In: Proceedings of the Annual Conference of the North American Fuzzy Information Processing Society NAFIPS’2015 and 5th World Conference on Soft Computing 2015, Redmond, WA, USA, August 17-19, 2015, 2015.
[24] S. Zadrożny, J. Kacprzyk, and M. Gajewski. “A new approach to the multiaspect text categorization by using the support vector machines”. In: G. De Tré, P. Grzegorzewski, J. Kacprzyk, J. W. Owsiński, W. Penczek, and S. Zadrożny, eds., Challenging problems and solutions in intelligent systems, to appear. Springer, Heidelberg New York, 2016.
[25] S. Zadrożny, J. Kacprzyk, and M. Gajewski, “A new two-stage approach to the multiaspect text categorization”. In: 2015 IEEE Symposium on Computational Intelligence for Human-like Intelligence, CIHLI 2015, Cape Town, South Africa, December 8-10, 2015, to appear, 2015.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-80c3755d-936b-4188-b288-81c4a01bce20