Improving Logical Structure Analysis of Visually Structured Documents with Textual Features

Le, Huu-Loi; Trong, Nghia Luu; Thanh, Huyen Ngo

doi:10.15439/2022R26

Article details

Article title

Improving Logical Structure Analysis of Visually Structured Documents with Textual Features

Authors

Le Huu-Loi, Trong Nghia Luu, Thanh Huyen Ngo

Identifiers

DOI

10.15439/2022R26

Abstract

This paper introduces a new model that improves logical structure analysis of visually structured documents (VSDs). The model extends that of Koreeda and Manning [1]. To enhance the textual features, we define a new feature that uses the font size of the text as an indicator. We observe that font size is an important signal of a document's structure. The new font-size feature is combined with visual, textual, and semantic features to train an analyzer. Experimental results on four legal datasets show that the font-size feature contributes to the model and improves the F-scores. An ablation study shows the contribution of each feature in the model.
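The abstract does not say how the font-size feature is computed, so the following is only a minimal sketch of one plausible encoding, not the authors' implementation. It assumes each text line comes with a font size in points, and it derives a size relative to the document median plus a comparison with the previous line. The function name `font_size_features`, the feature names, and the sample sizes are all hypothetical.

```python
from statistics import median


def font_size_features(sizes):
    """Compute simple font-size features for a sequence of text lines.

    `sizes` holds one positive font size (in points) per line, in reading
    order. This encoding is an illustrative assumption, not the feature
    definition used in the paper.
    """
    if not sizes:
        return []
    doc_median = median(sizes)
    features = []
    for i, size in enumerate(sizes):
        # The first line has no predecessor, so compare it with itself.
        prev = sizes[i - 1] if i > 0 else size
        features.append({
            "rel_to_median": size / doc_median,
            "larger_than_prev": size > prev,
            "smaller_than_prev": size < prev,
        })
    return features


# Hypothetical input: a heading line followed by two body lines.
example = font_size_features([14.0, 10.0, 10.0])
```

Features of this kind could then be concatenated with the visual, textual, and semantic features that the abstract mentions before training the analyzer. A drop in size after a larger line, for instance, may hint that a heading has ended and body text has begun.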

Keywords

logical structure analysis; VSDs; feature engineering; information extraction; visually structured documents

Publisher

Polskie Towarzystwo Informatyczne

Journal

Annals of Computer Science and Information Systems

Year

2022

Volume

Vol. 33

Pages

151-156

Physical description

Bibliography: 14 items, charts.

Contributors

author

Le Huu-Loi

lehuuloi.cs@gmail.com

Hung Yen University of Technology and Education, Hung Yen, Vietnam

author

Trong Nghia Luu

nghia.lt204888@sis.hust.edu.vn

Hanoi University of Science and Technology, Hanoi, Vietnam

author

Thanh Huyen Ngo

nthuyen@utehy.edu.vn

Hung Yen University of Technology and Education, Hung Yen, Vietnam

Bibliography

[1] Y. Koreeda and C. Manning, “Capturing logical structure of visually structured documents with multimodal transition parser,” in Proceedings of the Natural Legal Language Processing Workshop 2021. Punta Cana, Dominican Republic: Association for Computational Linguistics, Nov. 2021, pp. 144–154. [Online]. Available: https://aclanthology.org/2021.nllp-1.15
[2] F. Obermaier, B. Obermayer, V. Wormer, and W. Jaschensky, “About the Panama Papers,” in Süddeutsche Zeitung, 2016.
[3] M.-T. Nguyen, D. T. Le, and L. Le, “Transformers-based information extraction with limited data for domain-specific business documents,” Engineering Applications of Artificial Intelligence, vol. 97, p. 104100, 2021.
[4] Y. Hatsutori, K. Yoshikawa, and H. Imai, “Estimating legal document structure by considering style information and table of contents,” in New Frontiers in Artificial Intelligence, S. Kurahashi, Y. Ohta, S. Arai, K. Satoh, and D. Bekki, Eds. Cham: Springer International Publishing, 2017, pp. 270–283.
[5] C. G. Stahl, S. R. Young, D. Herrmannova, R. M. Patton, and J. C. Wells, “DeepPDF: A deep learning approach to extracting text from PDFs,” Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States), Tech. Rep., 2018.
[6] C. Soto and S. Yoo, “Visual detection with context for document layout analysis,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3464–3470.
[7] Y. Xu, M. Li, L. Cui, S. Huang, F. Wei, and M. Zhou, “LayoutLM: Pre-training of text and layout for document image understanding,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 1192–1200.
[8] Y. Xu, Y. Xu, T. Lv, L. Cui, F. Wei, G. Wang, Y. Lu, D. A. F. Florêncio, C. Zhang, W. Che, M. Zhang, and L. Zhou, “LayoutLMv2: Multi-modal pre-training for visually-rich document understanding,” CoRR, vol. abs/2012.14740, 2020. [Online]. Available: https://arxiv.org/abs/2012.14740
[9] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
[10] C. Sporleder and M. Lapata, “Automatic paragraph identification: A study across languages and domains,” in Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 2004, pp. 72–79.
[11] C. Abreu, H. Cardoso, and E. Oliveira, “FinDSE@FinTOC-2019 shared task,” in Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019). Turku, Finland: Linköping University Electronic Press, Sep. 2019, pp. 69–73. [Online]. Available: https://aclanthology.org/W19-6410
[12] D. Ferrés, H. Saggion, F. Ronzano, and À. Bravo, “PDFdigest: an adaptable layout-aware PDF-to-XML textual content extractor for scientific articles,” in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018.
[13] M. Ostendorf, M. Collins, S. Narayanan, D. W. Oard, and L. Vanderwende, “Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics,” in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009.
[14] S. Zhang, X. Ma, K. Duh, and B. V. Durme, “AMR parsing as sequence-to-graph transduction,” CoRR, vol. abs/1905.08704, 2019. [Online]. Available: http://arxiv.org/abs/1905.08704

Notes

Record prepared with funds from MNiSW, agreement no. SONP/SP/546092/2022, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) - module: Popularisation of science and promotion of sport (2024).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-58ca1fb9-fc04-440c-90b5-31f95bb10239