Automated Creation of Parallel Bible Corpora with Cross-Lingual Semantic Concordance

Dörpinghaus, Jens; Düing, Carsten

doi:10.15439/2021F30

Article - details

Article title

Automated Creation of Parallel Bible Corpora with Cross-Lingual Semantic Concordance

Authors

Dörpinghaus Jens, Düing Carsten

Selected full texts from this journal

http://annals-csis.org

Identifiers

DOI

10.15439/2021F30

Conference

Federated Conference on Computer Science and Information Systems (16th; 2–5 September 2021; online)

Abstracts

We present a novel approach to the automated creation of parallel New Testament corpora with cross-lingual semantic concordance based on Strong's numbers. Digital Biblical resources available to scholars are scarce. We present two approaches to the problem, a dictionary-based approach and a CRF model, together with a detailed evaluation on annotated and non-annotated translations. We discuss a proof of concept based on English and German New Testament translations. The results presented in this paper are novel and, to our knowledge, unique. They show promising performance, although further research is necessary.

Keywords

natural languages, religion, text analysis

Publisher

Polskie Towarzystwo Informatyczne

Journal

Annals of Computer Science and Information Systems

Year

2021

Volume

Vol. 25

Pages

111–114

Physical description

Bibliography: 23 items, figures, tables.

Contributors

author

Dörpinghaus Jens

u21829927@tuks.co.za

University of Pretoria, Faculty of Theology and Religion, Hatfield, Pretoria, South Africa

author

Düing Carsten

Faculty for Mathematics and Informatics, Fernuniversität Hagen, Germany

Bibliography

1. S. Landes, C. Leacock, and R. I. Tengi, “Building semantic concordances,” WordNet: An electronic lexical database, vol. 199, no. 216, pp. 199–216, 1998.
2. B. Metzger, The Bible in Translation: Ancient and English Versions, ser. Biblical studies. Baker Publishing Group, 2001.
3. C. Clivaz, “Die Bibel im digitalen Zeitalter: Multimodale Schriften in Gemeinschaften,” Zeitschrift für Neues Testament, vol. 20, no. 39/40, pp. 35–57, 2017.
4. C. Anderson, “Digital humanities and the future of theology,” 2018.
5. C. Clivaz, A. Gregory, and D. Hamidović, Digital Humanities in Biblical, Early Jewish and Early Christian Studies. Brill, 2013.
6. M. Cysouw, C. Biemann, and M. Ongyerth, “Using strong’s numbers in the bible to test an automatic alignment of parallel texts,” STUF-language typology and universals, vol. 60, no. 2, pp. 158–171, 2007.
7. B. Wälchli, “Similarity semantics and building probabilistic semantic maps from parallel texts,” Linguistic Discovery, vol. 8, no. 1, pp. 331–371, 2010.
8. M. Simard, “Building and using parallel text for translation,” The Routledge Handbook of Translation and Technology, pp. 78–90, 2020.
9. A. Yli-Jyrä, J. Purhonen, M. Liljeqvist, A. Antturi, P. Nieminen, K. M. Räntilä, and V. Luoto, “Helfi: a hebrew-greek-finnish parallel bible corpus with cross-lingual morpheme alignment,” arXiv preprint https://arxiv.org/abs/2003.07456, 2020.
10. N. Rees and J. Riding, “Automatic concordance creation for texts in any language,” Proceedings of Translation and the Computer, vol. 31, 2009.
11. M. Diab and S. Finch, “A statistical word-level translation model for comparable corpora,” University of Maryland, College Park, Institute for Advanced Computer Studies, Tech. Rep., 2000.
12. P. Resnik, M. B. Olsen, and M. Diab, “The bible as a parallel corpus: Annotating the ‘book of 2000 tongues’,” Computers and the Humanities, vol. 33, no. 1, pp. 129–153, 1999.
13. C. Christodoulopoulos and M. Steedman, “A massively parallel corpus: the bible in 100 languages,” Language resources and evaluation, vol. 49, no. 2, pp. 375–395, 2015.
14. J. D. Riding, “Statistical glossing, language independent analysis in bible translation,” Translating and the Computer, vol. 30, 2008.
15. J. Renkema and C. van Wijk, “Converting the words of god: An experimental evaluation of stylistic choices in the new dutch bible translation,” Linguistica Antverpiensia, New Series–Themes in Translation Studies, no. 1, 2002.
16. L. De Vries, “Bible translation and primary orality,” The Bible Translator, vol. 51, no. 1, pp. 101–114, 2000.
17. G. G. Scorgie, M. L. Strauss, S. M. Voth et al., The challenge of Bible translation: Communicating God’s Word to the world. Zondervan Academic, 2009.
18. A. McMillan-Major, “Automating gloss generation in interlinear glossed text,” Proceedings of the Society for Computation in Linguistics, vol. 3, no. 1, pp. 338–349, 2020.
19. X. Zhao, S. Ozaki, A. Anastasopoulos, G. Neubig, and L. Levin, “Automatic interlinear glossing for under-resourced languages leveraging translations,” in Proceedings of the 28th International Conference on Computational Linguistics, 2020, pp. 5397–5408.
20. A. B. Muhammad, Annotation of conceptual co-reference and text mining the Qur’an. University of Leeds, 2012.
21. E. Biagetti, C. Zanchi, and W. M. Short, “Toward the creation of wordnets for ancient indo-european languages,” in Proceedings of the 11th Global Wordnet Conference, 2021, pp. 258–266.
22. V. Perrone, M. Palma, S. Hengchen, A. Vatri, J. Q. Smith, and B. McGillivray, “GASC: Genre-aware semantic change for Ancient Greek,” in Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change. Florence, Italy: Association for Computational Linguistics, Aug. 2019, pp. 56–66. [Online]. Available: https://www.aclweb.org/anthology/W19-4707
23. J. Lafferty, A. McCallum, and F. C. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” Proceedings of the Eighteenth International Conference on Machine Learning, 2001.

Notes

1. Track 1: Artificial Intelligence in Applications

2. Session: 15th International Symposium Advances in Artificial Intelligence and Applications

3. Short Paper

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-b10b3263-9df5-4de5-b07b-77d57f31e518