Comparison and adaptation of automatic evaluation metrics for quality assessment of re-speaking

Wołk, K.; Korzinek, D.

doi:10.7494/csci.2017.18.2.129

Artykuł - szczegóły

Tytuł artykułu

Comparison and adaptation of automatic evaluation metrics for quality assessment of re-speaking

Autorzy

Wołk K. , Korzinek D.

Treść / Zawartość

Pełne teksty:

Pobierz

Identyfikatory

DOI

10.7494/csci.2017.18.2.129

Warianty tytułu

Języki publikacji

Abstrakty

Re-speaking is a mechanism for obtaining high-quality subtitles for use in live broadcasts and other public events. Because it relies on humans to perform the actual re-speaking, the task of estimating the quality of the results is non- trivial. Most organizations rely on human effort to perform the actual quality assessment, but purely automatic methods have been developed for other similar problems (like Machine Translation). This paper will try to compare several of these methods: BLEU, EBLEU, NIST, METEOR, METEOR-PL, TER, and RIBES. These will then be matched to the human-derived NER metric, commonly used in re-speaking. The purpose of this paper is to assess whether the above automatic metrics normally used for MT system evaluation can be used in lieu of the manual NER metric to evaluate re-speaking transcripts.

Słowa kluczowe

speech re-speaking machine translation evaluation

Wydawca

Wydawnictwa AGH

Czasopismo

Computer Science

Rocznik

2017

Tom

Vol. 18 (2)

Strony

129--144

Opis fizyczny

Bibliogr. 24 poz., rys., wykr., tab.

Twórcy

autor

Wołk K.

kwolkw@pja . edu . pl

Polish-Japanese Academy of Information Technology, Faculty of Information Technology, Department of Multimedia, Warsaw, Poland

autor

Korzinek D.

danijel@pja . edu . pl

Polish-Japanese Academy of Information Technology, Faculty of Information Technology, Department of Multimedia, Warsaw, Poland

Bibliografia

[1] Axelrod A.: Factored language model for statistical machine translation , Master of Science by Research, Institute for Communicating and Collaborative System, Division of Informatics, University of Edinburgh, 2006.
[2] Banerjee S., Lavie A.: METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization , vol. 29, pp. 65–72, 2005.
[3] Bestgen Y., Granger S.: Quantifying the development of phraseological competence in L2 English writing: An automated approach, Journal of Second Language Writing , vol. 26, pp. 28–41, 2014.
[4] Doddington G.: Automatic evaluation of machine translation quality using n -gram co-occurrence statistics. In: Proceedings of the second international conference on Human Language Technology Research , pp. 138–145. Morgan Kaufmann Publishers, 2002.
[5] Dutka L., Szarkowska A., Chmiel A., Lijewska A., Krejtz K., Marasek K., Brocki L.: Are interpreters better respeakers? An exploratory study on respeaking competences, Respeaking, live subtitling and accessibility , Rome, 12 June 2015.
[6] European Federation of Hard of Hearing People: State of subtitling access in EU. 2011 Report. http://ec . europa . eu/internal market/consultations/ 2011/audiovisual/non-registered-organisations/european-federation- of-hard-of-hearing-people-efhoh- en . pdf , 2011. [Online; accessed 30 Jan. 2016].
[7] Frost J.: Multiple Regression Analysis: Use Adjusted R-Squared and Predicted R-Squared to Include the Correct Number of Variables, The Minitab Blog , 2013. http://blog . minitab . com/blog/adventures-in-statistics-2/multiple- regession-analysis-use-adjusted-r-squared-and-predicted-r- squared-to-include-the-correct-number-of-variables .
[8] Han A.L.F., Wong D.F., Chao L.S.: LEPOR: A robust evaluation metric for machine translation with augmented factors. In: Proceedings of COLING 2012: Posters , pp. 441–450, 2012.
[9] Hovy E.: Toward finely differentiated evaluation metrics for machine translation. In: Proceedings of the EAGLES Workshop on Standards and Evaluation , Pisa, Italy, 1999.
[10] Isozaki H., Hirao T., Duh K., Sudoh K., Tsukada H.: Automatic evaluation of translation quality for distant language pairs. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing , pp. 944–952. Association for Computational Linguistics, 2010.
[11] Kim J.O., Mueller C.W.: Standardized and unstandardized coefficients in causal analysis. An expository note, Sociological Methods & Research , vol. 4(4), pp. 423– 438, 1976.
[12] Koehn P., Hoang H., Birch A., Callison-Burch C., Federico M., Bertoldi N., Cowan B., Shen W., Moran C., Zens R., et al.: Moses: Open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions , pp. 177–180, Association for Computational Linguistics, 2007.
[13] Lo C.-k., Wu D.: MEANT: an inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility via semantic frames. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies , vol. 1, pp. 220–229, Association for Computational Lin- guistics, 2011.
[14] Maziarz M., Piasecki M., Szpakowicz S.: Approaching plWordNet 2.0. In: Proceedings of the 6th Global Wordnet Conference , Matsue, Japan, pp. 50–62, 2012.
[15] Miller G.A.: WordNet: a lexical database for English, Communications of the ACM , vol. 38(11), pp. 39–41, 1995.
[16] Papineni K., Roukos S., Ward T., Zhu W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics , pp. 311–318, Association for Computational Linguistics, 2002.
[17] Porter M.F.: Snowball: A language for stemming algorithms, 2001. http://snowball . tartarus . org/texts/introduction . html .
[18] Reeder F.: Additional mt-eval references. In: International Standards for Language Engineering, Evaluation Working Group , 2001.
[19] Romero-Fresco P., Mart ́ınez J.: Accuracy rate in live subtitling. The NER model. In: Audiovisual Translation in a Global Context , pp. 28–50, 2011.
[20] Seber G.A., Lee A.J.: Linear regression analysis, John Wiley & Sons, 2012.
[21] Woliński M., Miłkowski M., Ogrodniczuk M., Przepiórkowski A., Szałkiewicz L.: PoliMorf: a (not so) new open morphological dictionary for Polish. In: Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12) , European Language Resources Association (ELRA), 23–25 May, Istanbul, Turkey, pp. 860–864.
[22] Wołk K., Marasek K.: Polish-English Speech Statistical Machine Translation Systems for the IWSLT 2013. In: Proceedings of the 10th International Workshop on Spoken Language Translation , Heidelberg, Germany, pp. 113–119, 2013.
[23] Wo lk K., Marasek K.: Enhanced Bilingual Evaluation Understudy, Lecture Notes on Information Theory , vol. 2(2), 2014.
[24] Zimmerman D.W.: Teachers corner: A note on interpretation of the paired- samples t test, Journal of Educational and Behavioral Statistics , vol. 22(3), pp. 349–360, 1997.

Uwagi

Opracowanie ze środków MNiSW w ramach umowy 812/P-DUN/2016 na działalność upowszechniającą naukę (zadania 2017).

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-99cbd1c2-0006-440c-b1b6-9437fcd02897