Article title

Exploiting bert for malformed segmentation detection to improve scientific writings

Publication languages
EN
Abstracts
EN
Writing well-structured scientific documents, such as articles and theses, is vital for readers to follow the document's argumentation and grasp its messages. Structure also affects how efficiently, and how quickly, the document can be studied. Proper document segmentation further improves the results of automated Natural Language Processing (NLP) algorithms, including summarization and other information retrieval and analysis functions. Unfortunately, inexperienced writers, such as young researchers and graduate students, often struggle to produce well-structured professional documents. Their writing frequently exhibits improper segmentation or lacks semantically coherent segments, a phenomenon referred to as "mal-segmentation." Examples of mal-segmentation include improper paragraph or section divisions and unsmooth transitions between sentences and paragraphs. This research addresses mal-segmentation in scientific writing by introducing an automated method for detecting it that uses Sentence Bidirectional Encoder Representations from Transformers (sBERT) as the encoding mechanism. The experimental results show that the sBERT-based approach is promising for detecting mal-segmentation.
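The abstract does not describe the detection procedure itself. The following is a minimal sketch of one common way sentence embeddings can be used for this kind of check, under stated assumptions; it is not the authors' method. It assumes every sentence has already been encoded as a vector (for instance with the `sentence-transformers` library, via `SentenceTransformer(model_name).encode(sentences)`), scores each sentence boundary by the cosine similarity of the two neighboring embeddings, and applies two hypothetical thresholds, `low` and `high`: a low-similarity boundary inside a paragraph is flagged as a possible abrupt transition, and a high-similarity boundary at a paragraph break as a possibly unnecessary split. The function names, threshold values, and toy 3-dimensional vectors are illustrative assumptions.

```python
import math
from typing import List, Sequence, Set, Tuple

# Hypothetical sketch, NOT the authors' published method.
# Each sentence is assumed to be encoded already as a dense vector
# (e.g. by an sBERT model); toy 3-d vectors stand in for real embeddings.


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_segmentation_issues(
    embeddings: List[Sequence[float]],
    paragraph_breaks: Set[int],
    low: float = 0.5,   # assumed threshold for an abrupt transition
    high: float = 0.8,  # assumed threshold for an unnecessary break
) -> List[Tuple[int, str]]:
    """Check every boundary i, i.e. the gap between sentence i and i + 1.

    paragraph_breaks holds the boundary indices where the writer
    started a new paragraph.
    """
    issues = []
    for i in range(len(embeddings) - 1):
        sim = cosine_similarity(embeddings[i], embeddings[i + 1])
        if i in paragraph_breaks:
            # Very similar sentences split across paragraphs.
            if sim > high:
                issues.append((i, "unnecessary break?"))
        elif sim < low:
            # Dissimilar sentences joined inside one paragraph.
            issues.append((i, "abrupt transition?"))
    return issues


if __name__ == "__main__":
    # Toy "embeddings" for five sentences; the topic shifts after sentence 1.
    sentences = [
        [1.0, 0.1, 0.0],
        [0.9, 0.2, 0.0],
        [0.0, 0.1, 1.0],
        [0.1, 0.0, 0.9],
        [0.1, 0.1, 0.95],
    ]
    # The writer started a new paragraph after sentence 3 (boundary 3).
    print(flag_segmentation_issues(sentences, paragraph_breaks={3}))
    # -> [(1, 'abrupt transition?'), (3, 'unnecessary break?')]
```

Comparing only adjacent sentences keeps the sketch short but makes it sensitive to a single off-topic sentence; comparing small blocks of text on each side of a boundary, as TextTiling [7] does, is a common way to smooth this.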
Pages
126-141
Physical description
Bibliography: 27 items, figures, tables
Authors
  • Al-Azhar University, Faculty of Engineering, Systems and Computer, Egypt
  • Al-Azhar University, Faculty of Engineering, Systems and Computer, Egypt
  • Al-Azhar University, Faculty of Engineering, Systems and Computer, Egypt
Bibliography
  • [1] Almuhareb, A. a.-T. (2019). Arabic word segmentation with long short-term memory neural networks and word embedding. IEEE Access, 7, 12879-12887. https://doi.org/10.1109/ACCESS.2019.2893460
  • [2] Barrow, J., Jain, R., Morariu, V., & Manjunatha, V. (2020). A joint model for document segmentation and segment labeling. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 313-322). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.29
  • [3] Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., & Specia, L. (2017). SemEval-2017 task 1: Semantic textual similarity - multilingual and cross-lingual focused evaluation. arXiv. https://doi.org/10.48550/arXiv.1708.00055
  • [4] Cer, D., Yang, Y., Kong, S., Hua, N., Limtiaco, N., John, R. S., Constant, N., Guajardo-Cespedes, M., Yuan, S., Tar, Ch., Sung, Y.-H., Strope, B., & Kurzweil, R. (2018). Universal sentence encoder. arXiv. https://doi.org/10.48550/arXiv.1803.11175
  • [5] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv. https://doi.org/10.48550/arXiv.1810.04805
  • [6] Galanopoulos, D., & Mezaris, V. (2019). Temporal lecture video fragmentation using word embeddings. In Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, W.-H., & Vrochidis, S. (Eds.), MultiMedia Modeling: 25th International Conference, MMM 2019, Thessaloniki, Greece, January 8-11, 2019, Proceedings, Part II (vol. 25, pp. 254-265). Springer. https://doi.org/10.1007/978-3-030-05716-9_21
  • [7] Hearst, M. A. (1997). TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1), 33-64.
  • [8] Hinkel, E. (2001). Matters of cohesion in L2 academic texts. Applied Language Learning, 12(2), 111-132.
  • [9] IELTS Mentor. (2022). Retrieved from https://www.ielts-mentor.com/reading-sample/gt-reading/3162-employment-in-japan
  • [10] Levy, C. M., & Ransdell, S. (1996). The science of writing: Theories, methods, individual differences and applications. Routledge. https://doi.org/10.4324/9780203811122
  • [11] Lin, M., Nunamaker, J. F., Chau, M., & Chen, H. (2004). Segmentation of lecture videos based on text: A method combining multiple linguistic features. 37th Annual Hawaii International Conference on System Sciences (pp. 9-9). IEEE. https://doi.org/10.1109/HICSS.2004.1265045
  • [12] Lin, M., Chau, M., Cao, J., & Nunamaker, J. F. (2005). Automated video segmentation for lecture videos: A linguistics-based approach. International Journal of Technology and Human Interaction (IJTHI), 1(2), 27-45. https://doi.org/10.4018/jthi.2005040102
  • [13] Lo, K., Jin, Y., Tan, W., Liu, M., Du, L., & Buntine, W. (2021). Transformer over Pre-trained Transformer for Neural Text Segmentation with Enhanced Topic Coherence. arXiv. https://doi.org/10.48550/arXiv.2110.07160
  • [14] Luckert, M., & Schaefer-Kehnert, M. (2016). Using machine learning methods for evaluating the quality of technical documents.
  • [15] Maraj, A., Martin, M. V., & Makrehchi, M. (2021). A More Effective Sentence-Wise Text Segmentation Approach Using BERT. In Lladós, J., Lopresti, D., & Uchida, S. (Eds.), Document Analysis and Recognition - ICDAR 2021 (pp. 236-250). Springer. https://doi.org/10.1007/978-3-030-86337-1_16
  • [16] Ponceleon, D., & Srinivasan, S. (2001). Automatic discovery of salient segments in imperfect speech transcripts. Proceedings of the Tenth International Conference on Information and Knowledge Management (pp. 490-497). The ACM Digital Library. https://doi.org/10.1145/502585.502668
  • [17] Precision and recall. (2022). Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Precision_and_recall?oldformat=true
  • [18] Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using siamese BERT-networks. arXiv. https://doi.org/10.48550/arXiv.1908.10084
  • [19] Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 815-823). IEEE. https://doi.org/10.1109/CVPR.2015.7298682
  • [20] Shah, R. R., Yu, Y., Shaikh, A. D., & Zimmermann, R. (2015). TRACE: Linguistic-based approach for automatic lecture video segmentation leveraging Wikipedia texts. 2015 IEEE International Symposium on Multimedia (ISM) (pp. 217-220). IEEE. https://doi.org/10.1109/ISM.2015.18
  • [21] Soares, E. R., & Barrére, E. (2019). An optimization model for temporal video lecture segmentation using word2vec and acoustic features. Proceedings of the 25th Brazilian Symposium on Multimedia and the Web (pp. 513-520). The ACM Digital Library. https://doi.org/10.1145/3323503.3349548
  • [22] Solbiati, A., Heffernan, K., Damaskinos, G., Poddar, S., Modi, S., & Cali, J. (2021). Unsupervised topic segmentation of meetings with BERT embeddings. arXiv. https://doi.org/10.48550/arXiv.2106.12978
  • [23] Glavaš, G., & Somasundaran, S. (2020). Two-level transformer and auxiliary coherence modeling for improved text segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 7797-7804. https://doi.org/10.1609/aaai.v34i05.6284
  • [24] Text segmentation. (2011). Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Text_segmentation
  • [25] Ugur Akinci, G. K. (2012). Writing Transition Phrases and Sentences: 12 Types of Sentence and Paragraph Transitions with 112 Examples.
  • [26] University of Alabama in Huntsville (UAH). (n.d.). Writing effective transitions. Retrieved from https://www.uah.edu/images/administrative/student-success-center/resources/handouts/handouts_2019/writing_effective_transitions.pdf
  • [27] Wang, Y., Li, S., & Yang, J. (2018). Toward fast and accurate neural discourse segmentation. arXiv. https://doi.org/10.48550/arXiv.1808.09147
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-b1428454-1e68-4a22-86c0-d208fb66a632