Article title

Classifying Industrial Sectors from German Textual Data with a Domain Adapted Transformer

Publication languages
EN
Abstracts
EN
In economic and sociological research, lists of industries and their branches are widely used to categorize data and to gain an overview of different types of industries. However, many different taxonomies and classification schemes exist, owing to differing research foci as well as differing national contexts and interests. In this paper, we focus, without loss of generality, on regional data from Germany. Manual annotation of textual data is time-consuming and tedious, which naturally gives rise to our initial research question, strongly inspired by questions from computational social science: How can we automatically categorize textual data, e.g., job advertisements or business profiles, by industrial sector? We present a classification approach using a pre-trained, domain-adapted Transformer model. We find that domain-adapted models generalize better and outperform state-of-the-art non-domain-adapted Transformer models on out-of-distribution data. Additionally, we open-source two novel datasets mapping textual data to WZ2008 sections and divisions, enabling further research.
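The abstract distinguishes two label granularities, WZ 2008 sections and divisions. WZ 2008 mirrors NACE Rev. 2 at these two levels, so every two-digit division belongs to exactly one lettered section. The sketch below assumes that structure; the table `WZ2008_SECTION_RANGES` and the function `division_to_section` are hypothetical names, not taken from the paper or its released datasets. It shows how division-level labels could be collapsed to section level, for example to score a division classifier at the coarser granularity, and is an illustration of the classification scheme rather than the authors' method.

```python
# Hypothetical lookup table: WZ 2008 section letter -> inclusive range of
# two-digit division codes. Assumes WZ 2008 follows NACE Rev. 2 at the
# section and division levels.
WZ2008_SECTION_RANGES: list[tuple[str, int, int]] = [
    ("A", 1, 3),    # Agriculture, forestry and fishing
    ("B", 5, 9),    # Mining and quarrying
    ("C", 10, 33),  # Manufacturing
    ("D", 35, 35),  # Electricity, gas, steam and air conditioning supply
    ("E", 36, 39),  # Water supply; sewerage, waste management and remediation
    ("F", 41, 43),  # Construction
    ("G", 45, 47),  # Wholesale and retail trade; repair of motor vehicles
    ("H", 49, 53),  # Transportation and storage
    ("I", 55, 56),  # Accommodation and food service activities
    ("J", 58, 63),  # Information and communication
    ("K", 64, 66),  # Financial and insurance activities
    ("L", 68, 68),  # Real estate activities
    ("M", 69, 75),  # Professional, scientific and technical activities
    ("N", 77, 82),  # Administrative and support service activities
    ("O", 84, 84),  # Public administration and defence; compulsory social security
    ("P", 85, 85),  # Education
    ("Q", 86, 88),  # Human health and social work activities
    ("R", 90, 93),  # Arts, entertainment and recreation
    ("S", 94, 96),  # Other service activities
    ("T", 97, 98),  # Activities of households as employers
    ("U", 99, 99),  # Activities of extraterritorial organisations and bodies
]


def division_to_section(division: str) -> str:
    """Return the section letter for a two-digit division code such as "62"."""
    if len(division) != 2 or not division.isascii() or not division.isdigit():
        raise ValueError(f"expected a two-digit division code, got {division!r}")
    number = int(division)
    for section, first, last in WZ2008_SECTION_RANGES:
        if first <= number <= last:
            return section
    # Codes in the gaps between ranges (e.g. "04", "34") are not divisions.
    raise ValueError(f"{division!r} is not a WZ 2008 division")
```

For instance, `[division_to_section(d) for d in ["62", "10", "86"]]` yields `["J", "C", "Q"]`. Unused codes such as `"04"` or `"34"` raise a `ValueError` instead of being silently assigned to a neighbouring section, which keeps malformed labels from leaking into a section-level evaluation.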
Pages
463–470
Physical description
Bibliography: 25 items; illustrations, charts, tables.
Authors' affiliations
  • Federal Institute for Vocational Education and Training (BIBB), Bonn, Germany
  • University of Tübingen, Germany
  • Federal Institute for Vocational Education and Training (BIBB), Bonn, Germany
  • University Koblenz, Koblenz, Germany
  • Federal Institute for Vocational Education and Training (BIBB), Bonn, Germany
References
  • 1. R. Fechner, D. J. Dörpinghaus, and A. Firll, “FedCSIS 2023 Classifying Industrial Sectors with a Domain Adapted Transformer - Datasets and Configuration files,” Jul. 2023. [Online]. Available: https://doi.org/10.5281/zenodo.8192546
  • 2. M. Pejic-Bach, T. Bertoncel, M. Meško, and Ž. Krstić, “Text mining of industry 4.0 job advertisements,” International Journal of Information Management, vol. 50, pp. 416–431, 2020.
  • 3. R. Chaisricharoen, W. Srimaharaj, S. Chaising, and K. Pamanee, “Classification approach for industry standards categorization,” in 2022 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (ECTI DAMT & NCON). IEEE, 2022, pp. 308–313.
  • 4. A. McCallum, K. Nigam et al., “A comparison of event models for naive Bayes text classification,” in AAAI-98 Workshop on Learning for Text Categorization, vol. 752, no. 1. Madison, WI, 1998, pp. 41–48.
  • 5. A. M. Kibriya, E. Frank, B. Pfahringer, and G. Holmes, “Multinomial naive Bayes for text categorization revisited,” in AI 2004: Advances in Artificial Intelligence: 17th Australian Joint Conference on Artificial Intelligence, Cairns, Australia, December 4-6, 2004. Proceedings 17. Springer, 2005, pp. 488–499.
  • 6. H. Hayashi and Q. Zhao, “Quick induction of NNTrees for text categorization based on discriminative multiple centroid approach,” in 2010 IEEE International Conference on Systems, Man and Cybernetics. IEEE, 2010, pp. 705–712.
  • 7. K. Kowsari, K. Jafari Meimandi, M. Heidarysafa, S. Mendu, L. Barnes, and D. Brown, “Text classification algorithms: A survey,” Information, vol. 10, no. 4, p. 150, 2019.
  • 8. C. Ospino, “Occupations: Labor market classifications, taxonomies, and ontologies in the 21st century,” Inter-American Development Bank, 2018.
  • 9. M. Rodrigues, E. Fernández-Macías, and M. Sostero, “A unified conceptual framework of tasks, skills and competences,” Seville, 2021. [Online]. Available: https://joint-research-centre.ec.europa.eu/publications/unified-conceptual-framework-tasks-skills-and-competences_en
  • 10. A.-S. Gnehm, E. Bühlmann, and S. Clematide, “Evaluation of transfer learning and domain adaptation for analyzing German-speaking job advertisements,” in Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022, pp. 3892–3901.
  • 11. A.-S. Gnehm, E. Bühlmann, H. Buchs, and S. Clematide, “Fine-grained extraction and classification of skill requirements in German-speaking job ads.” Association for Computational Linguistics, 2022.
  • 12. J. Büchel, J. Engler, and A. Mertens, “The demand for data skills in German companies: Evidence from online job advertisements,” How to Reconstruct Ukraine? Challenges, Plans and the Role of the EU, p. 56, 2023.
  • 13. B. Gehrke, H. Legler, M. Leidmann, and K. Hippe, “Forschungs- und wissensintensive Wirtschaftszweige: Produktion, Wertschöpfung und Beschäftigung in Deutschland sowie Qualifikationserfordernisse im europäischen Vergleich,” Studien zum deutschen Innovationssystem, Tech. Rep., 2009.
  • 14. N. Gillmann and V. Hassler, “Coronabetroffenheit der Wirtschaftszweige in Gesamt- und Ostdeutschland,” ifo Dresden berichtet, vol. 27, no. 4, pp. 3–5, 2020.
  • 15. U. Kies, D. Klein, and A. Schulte, “Cluster Wald und Holz Deutschland: Makroökonomische Bedeutung, regionale Zentren und Strukturwandel der Beschäftigung in holzbasierten Wirtschaftszweigen,” Cluster in Mitteldeutschland – Strukturen, Potenziale, Förderung, p. 103, 2012.
  • 16. V.-P. Niitamo, “Berufs- und Qualifikationsanforderungen im IKT-Bereich in Europa erkennen und messen,” S. L. Schmidt, O. Strietska-Ilina, and B. Dworschak, pp. 194–201, 2005.
  • 17. J. Hartmann and G. Schütz, “Die Klassifizierung der Berufe und der Wirtschaftszweige im Sozio-oekonomischen Panel – Neuvercodung der Daten 1984–2001,” SOEP Survey Papers, Tech. Rep., 2017.
  • 18. M. Titze, M. Brachert, and A. Kubis, “The identification of regional industrial clusters using qualitative input–output analysis (QIOA),” Regional Studies, vol. 45, no. 1, pp. 89–102, 2011.
  • 19. U. Kies, T. Mrosek, and A. Schulte, “Spatial analysis of regional industrial clusters in the German forest sector,” International Forestry Review, vol. 11, no. 1, pp. 38–51, 2009.
  • 20. Statistisches Bundesamt, “Klassifikation der Wirtschaftszweige,” Wiesbaden, 2008. [Online]. Available: https://www.destatis.de/static/DE/dokumente/klassifikation-wz-2008-3100100089004.pdf
  • 21. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
  • 22. B. Chan, T. Möller, M. Pietsch, and T. Soni, “bert-base-german-cased Transformer model,” 2019. [Online]. Available: https://huggingface.co/bert-base-german-cased
  • 23. A.-S. Gnehm, E. Bühlmann, and S. Clematide, “Evaluation of transfer learning and domain adaptation for analyzing german-speaking job advertisements,” in Proceedings of the 13th Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association, 2022.
  • 24. J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou et al., “Chain-of-thought prompting elicits reasoning in large language models,” Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837, 2022.
  • 25. S. Yao, D. Yu, J. Zhao, I. Shafran, T. L. Griffiths, Y. Cao, and K. Narasimhan, “Tree of thoughts: Deliberate problem solving with large language models,” arXiv preprint arXiv:2305.10601, 2023. [Online]. Available: https://arxiv.org/abs/2305.10601
Notes
1. Thematic Tracks Regular Papers
2. Record prepared with funds from MEiN (Polish Ministry of Education and Science), agreement no. SONP/SP/546092/2022, under the programme "Social Responsibility of Science", module: Popularization of Science and Promotion of Sport (2024).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-7403c310-2087-4332-9b9b-9e4d097655d9