Article title

Identification of keywords for legal documents categories using SOM

Publication languages
EN
Abstracts
EN
This study aims to use the decision-making process to categorize legal documents by identifying keywords characterizing each legal domain class. The study utilizes the Kohonen Self-Organizing Map method and the Global Vectors for Word Representation (GloVe) model to create an efficient document classification system. As a result, a satisfactory classification accuracy of 71.69% was achieved. The article also discusses alternative approaches implemented to improve classification accuracy, such as the use of Named Entity Recognizer (NER) tools and the RoBERTa model, along with a comparison of these approaches’ effectiveness. Challenges related to the uneven distribution of categories in the dataset are also mentioned, and potential directions for further research to enhance the classification results of legal documents are presented.
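As a rough illustration of the approach summarized in the abstract, the sketch below trains a tiny Kohonen map on word vectors and lists which words land on which map unit. It is not the authors' implementation: the random toy vectors stand in for GloVe embeddings, and the word lists, category names, grid size, and training schedule are all hypothetical.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) of the unit whose weight vector is closest to x."""
    dist2 = np.sum((weights - x) ** 2, axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dist2), dist2.shape))


def train_som(data, rows=4, cols=4, epochs=40, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(rows, cols, data.shape[1]))
    # Grid coordinates of every unit, used for neighbourhood distances.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            frac = step / total
            lr = lr0 * (1.0 - frac)                # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3   # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, data[i])
            grid_dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            influence = np.exp(-grid_dist2 / (2.0 * sigma**2))
            # Pull the winning unit and its grid neighbours toward the sample.
            weights += lr * influence[..., None] * (data[i] - weights)
            step += 1
    return weights


# Hypothetical toy data: random vectors around two centres stand in for
# GloVe embeddings of words drawn from two legal categories.
rng = np.random.default_rng(1)
words = {
    "tax": ["vat", "invoice", "deduction", "levy"],
    "family": ["custody", "alimony", "divorce", "adoption"],
}
dim = 8
centres = {"tax": np.full(dim, 3.0), "family": np.full(dim, -3.0)}
vocab, labels, vectors = [], [], []
for category, tokens in words.items():
    for token in tokens:
        vocab.append(token)
        labels.append(category)
        vectors.append(centres[category] + 0.3 * rng.normal(size=dim))
vectors = np.array(vectors)

som = train_som(vectors)

# Group words by the map unit they activate; words sharing a unit (or
# neighbouring units) are candidate keywords for the same category.
units = {}
for token, category, vec in zip(vocab, labels, vectors):
    units.setdefault(best_matching_unit(som, vec), []).append((token, category))

for unit, members in sorted(units.items()):
    print(unit, members)
```

With real embeddings, the same grouping step could be used to inspect which map regions are dominated by a single category, although the class imbalance mentioned in the abstract would likely need separate handling.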
Keywords
Authors
  • Gdańsk University of Technology, Gabriela Narutowicza 11/12, 80-233 Gdańsk, Poland
  • Gdańsk University of Technology, Gabriela Narutowicza 11/12, 80-233 Gdańsk, Poland
  • Gdańsk University of Technology, Gabriela Narutowicza 11/12, 80-233 Gdańsk, Poland
  • Częstochowa University of Technology, Generała Jana Henryka Dąbrowskiego 69, 42‐201 Częstochowa, Poland, www: https://kisi.pcz.pl/rscherer
  • University of Warmia and Mazury in Olsztyn, Michała Oczapowskiego 2, 10-718 Olsztyn, Poland, www: wmii.uwm.edu.pl/kadra/drozda-pawel
  • Lex Secure, Niepodległości 723/6, 81-853 Sopot, Poland, https://lexsecure.pl/
  • Lex Secure, Niepodległości 723/6, 81-853 Sopot, Poland, https://lexsecure.pl/
  • Gdańsk University of Technology, Gabriela Narutowicza 11/12, 80-233 Gdańsk, Poland, https://pg.edu.pl/p/andrzej-sobecki-64426
  • Gdańsk University of Technology, Gabriela Narutowicza 11/12, 80‐233 Gdańsk, Poland
  • Gdańsk University of Technology, Gabriela Narutowicza 11/12, 80‐233 Gdańsk, Poland, https://julian.eti.pg.edu.pl/
Bibliography
  • [1] K. Bennani-Smires, C. Musat, A. Hossmann, M. Baeriswyl, and M. Jaggi, “Simple Unsupervised Keyphrase Extraction using Sentence Embeddings”, arXiv e-prints, vol. 1, 2018, 1–9, doi: 10.48550/arXiv.1801.04470.
  • [2] T. Y. Christyawan and W. Firdaus Mahmudy, “Text Classification and Visualization on News Title Using Self Organizing Map”, 2018 International Conference on Sustainable Information Engineering and Technology (SIET), 2018, doi: 10.1109/SIET.2018.8693189.
  • [3] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, CoRR, 2018, doi: 10.48550/arXiv.1810.04805.
  • [4] R. Dzisevič and D. Šešok, “Text Classification using Different Feature Extraction Approaches”, 2019 Open Conference of Electrical, Electronic and Information Sciences (eStream), 2019, doi: 10.1109/eStream.2019.8732167.
  • [5] M. R. Faisal, I. Budiman, F. Abadi, D. T. Nugrahadi, M. Haekal, and I. Sutedja, “Applying Features Based on Word Embedding Techniques to 1D CNN for Natural Disaster Messages Classification”, 2022 5th International Conference of Computer and Informatics Engineering (IC2IE), 2022, doi: 10.1109/IC2IE56416.2022.9970188.
  • [6] E. Frank and R. R. Bouckaert, “Naive Bayes for Text Classification with Unbalanced Classes”. In: Knowledge Discovery in Databases: PKDD 2006: 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, Berlin, Germany, September 18-22, 2006, Proceedings 10, vol. 1, 2006, 503–510.
  • [7] Z. Gao, A. Feng, X. Song, and X. Wu, “Target-Dependent Sentiment Classification With BERT”, IEEE Access, 2019, doi: 10.1109/ACCESS.2019.2946594.
  • [8] F. Heimerl, S. Lohmann, S. Lange, and T. Ertl, “Word Cloud Explorer: Text Analytics Based on Word Clouds”, 2014 47th Hawaii International Conference on System Sciences, 2014, doi: 10.1109/HICSS.2014.231.
  • [9] R. Jing, “A Self-attention Based LSTM Network for Text Classification”, IOP Publishing, 2019, doi: 10.1088/1742-6596/1207/1/012008.
  • [10] S.-B. Kim, K.-S. Han, H.-C. Rim, and S. H. Myaeng, “Some Effective Techniques for Naive Bayes Text Classification”, IEEE transactions on knowledge and data engineering, vol. 18, no. 11, 2006, 1457–1466.
  • [11] K. Kowsari, K. Jafari Meimandi, M. Heidarysafa, S. Mendu, L. Barnes, and D. Brown, “Text Classification Algorithms: A Survey”, Information (Switzerland), 2019, doi: 10.3390/info10040150.
  • [12] A. Navada, A. N. Ansari, S. Patil, and B. A. Sonkamble, “Overview of use of decision tree algorithms in machine learning”, 2011 IEEE Control and System Graduate Research Colloquium, 2011, doi: 10.1109/ICSGRC.2011.5991826.
  • [13] M. Osowski, K. Lorenc, P. Drozda, R. Scherer, K. Szałapak, K. Komar-Komarowski, J. Szymański, and A. Sobecki, “Previous Opinions is All You Need—Legal Information Retrieval System”. In: International Conference on Computational Collective Intelligence, 2023, 57–67.
  • [14] S. Rose, D. Engel, N. Cramer, and W. Cowley, “Automatic Keyword Extraction from Individual Documents”, Text Mining: Applications and Theory, 2010, doi: 10.1002/9780470689646.ch1.
  • [15] F. P. Shah and V. Patel, “A Review on Feature Selection and Feature Extraction for Text Classification”, 2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), 2016, doi: 10.1109/WiSPNET.2016.7566545.
  • [16] A. Sun, E.-P. Lim, and Y. Liu, “On Strategies for Imbalanced Text Classification Using SVM: A Comparative study”, Decision Support Systems, vol. 48, no. 1, 2009, 191–201.
  • [17] A. Talun, P. Drozda, L. Bukowski, and R. Scherer, “FastText and XGBoost Content-Based Classification for Employment Web Scraping”. In: International Conference on Artificial Intelligence and Soft Computing, 2020, 435–444.
  • [18] X. Deng, Y. Li, J. Weng, and J. Zhang, “Feature Selection for Text Classification: A Review”, Multimedia Tools and Applications, 2019, doi: 10.1007/s11042-018-6083-5.
  • [19] L. Yang, “A Brief Introduction of the Text Classification Methods”, 2022 IEEE International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), 2022, doi: 10.1109/EEBDA53927.2022.9744845.
  • [20] G. Yenduri, M. Ramalingam, G. ChemmalarSelvi, Y. Supriya, G. Srivastava, P. K. R. Maddikunta, G. DeeptiRaj, R. H. Jhaveri, B. Prabadevi, W. Wang, A. V. Vasilakos, and T. R. Gadekallu, “Generative Pre-trained Transformer: A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions”, ArXiv, 2023, doi: 10.48550/arXiv.2305.10435.
  • [21] Y. Chen, P. Liu, and C. P. Teo, “Regularised Text Logistic Regression: Key Word Detection and Sentiment Classification for Online Reviews”, arXiv e-prints, 2020, doi: 10.48550/arXiv.2009.04591.
  • [22] Y. Zhang, “Research on Text Classification Method Based on LSTM Neural Network Model”, 2021 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC), 2021, doi: 10.1109/IPEC51340.2021.9421225.
Notes
Record prepared with funds from the Ministry of Science and Higher Education (MNiSW), agreement no. POPUL/SP/0154/2024/02, under the programme "Social Responsibility of Science II", module: Popularisation of Science (2025).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-1d41e8ab-01c3-45f8-b31a-638e549078f6