PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Tytuł artykułu

Correcting Spelling Errors by Modelling Their Causes

Treść / Zawartość
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
This paper accounts for a new technique of correcting isolated words in typed texts. A language-dependent set of string substitutions reflects the surface form of errors that result from vocabulary incompetence, misspellings, or mistypings. Candidate corrections are formed by applying the substitutions to text words absent from the computer lexicon. A minimal acyclic deterministic finite automaton storing the lexicon allows quick rejection of nonsense corrections, while costs associated with the substitutions serve to rank the remaining ones. A comparison of the correction lists generated by several spellcheckers for two corpora of English spelling errors shows that our technique suggests the right words more accurately than the others.
Rocznik
Strony
275--285
Opis fizyczny
Bibliogr. 44 poz., rys., tab., wykr.
Twórcy
autor
  • Silesian University of Technology, Institute of Computer Science, ul. Akademicka 16, 44–100 Gliwice, Poland
autor
  • Silesian University of Technology, Institute of Computer Science, ul. Akademicka 16, 44–100 Gliwice, Poland
Bibliografia
  • [1] Atkinson K. (2002): Spell checker test kernel results. — Available at http://aspell.net/test/.
  • [2] Atkinson K. (2003): GNU Aspell. — Available at http://aspell.sourceforge.net/.
  • [3] Baeza-Yates R. and Navarro G. (1998): Fast approximate string matching in a dictionary. — Proc. 5th Int. Symp. String Processing and Information Retrieval, SPIRE’98, Santa Crus de la Sierra, Bolivia, pp. 14–22.
  • [4] Berkel B. van and Smedt K.D. (1988): Triphone analysis: A combined method for the correction of orthographical and typographical errors. — Proc. 2nd Conf. Applied Natural Language Processing, Austin, pp. 77–83.
  • [5] Brill E. and Moore R.C. (2000): An improved error model for noisy channel spelling correction. — Proc. 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong, pp. 286–293.
  • [6] Carrasco R. and Forcada M. (2002): Incremental construction and maintenance of minimal finite-state automata. — Comput. Linguistics, Vol. 28, No. 2, pp. 207–216.
  • [7] Church K.W. and Gale W.A. (1991): Probability scoring for spelling correction. — Statist. Comput., Vol. 1, No. 1, pp. 93–103.
  • [8] Ciura M.G. and Deorowicz S. (2001): How to squeeze a lexicon. — Soft. Pract. Exper., Vol. 31, No. 11, pp. 1077–1090.
  • [9] Czech Z.J., Havas G. and Majewski B.H. (1997): Perfect hashing. — Theoret. Comput. Sci., Vol. 182, No. 1–2, pp. 1–143.
  • [10] Daciuk J., Mihov S., Watson B.W. and Watson R.E. (2000): Incremental construction of minimal acyclic finite-state automata. — Comput. Linguistics, Vol. 26, No. 1, pp. 3–16.
  • [11] Damerau F.J. (1964): A technique for computer detection and correction of spelling errors. — Comm. ACM, Vol. 7, No. 3, pp. 171–176.
  • [12] Damerau F.J. (1990): Evaluating computer-generated domainvocabularies. — Inf. Process. Manag., Vol. 26, No. 6, pp. 791–801.
  • [13] Damerau F.J. and Mays E. (1989): An examination of undetected typing errors. — Inf. Process. Manag., Vol. 25, No. 6, pp. 659–664.
  • [14] Darragh J.J., Cleary J.G. and Witten I.H. (1993): Bonsai: A compact representation of trees. — Softw. Pract. Exper., Vol. 23, No. 3, pp. 277–291.
  • [15] Gadd T.N. (1990): PHONIX: The algorithm. — Program: Automat. Library Inf. Syst., Vol. 24, No. 4, pp. 363–366.
  • [16] Grudin J. (1983): Error patterns in skilled and novice transcription typing, In: Cognitive Aspects of Skilled Typewriting, (W.E. Copper, Ed.). — New York: Springer-Verlag, pp. 121–143.
  • [17] Hodge V.J. and Austin J. (2003): A comparison of standard spell checking algorithms and a novel binary neural approach. — IEEE Trans. Knowl. Data Eng., Vol. 15, No. 5, pp. 1073–1081.
  • [18] Jassem W. (1983): The Phonology of Modern English. —Warsaw: Polish Scientific Publishers.
  • [19] Knuth D.E. (1973): The Art of Computer Programming (Vol. 3. Sorting and Searching Algorithms). — Reading, MA: Addison–Wesley.
  • [20] Kuenning G.H. (2003): International ispell. — Available at http://www.cs.hmc.edu/~geoff/ispell.html.
  • [21] Kukich K. (1988): Variations on a back-propagation name recognition net. — Proc. Advanced Technology Conference, U.S. Postal Service, Wash. D.C., USA, pp. 722–735.
  • [22] Kukich K. (1992): Techniques for automatically correcting words in text. — ACM Comput. Surveys, Vol. 24, No. 4, pp. 377–439.
  • [23] Levenshtein V.I. (1966): Binary codes capable of correcting deletions, insertions and reversals.—Soviet Physics Doklady, Vol. 10, pp. 707–710.
  • [24] Maly K. (1976): Compressed tries. — Comm. ACM, Vol. 19, No. 7, pp. 409–415.
  • [25] Mateescu D. (2003): English phonetics and phonological theory. — University of Bucharest, Romania. Available at http://www.unibuc.ro/eBooks/filologie/mateescu.
  • [26] Merriam-Webster (2002): A dictionary of prefixes, suffixes, and combining forms from Webster’s third new international dictionary, unabridged. — Merriam–Webster. Available at http://www.spellingbee.com/pre_suf_comb.pdf.
  • [27] Mihov S. and Schulz K.U. (2004): Fast approximate search in large dictionaries. —Comput. Linguistics, Vol. 30, No. 4, pp. 451–477.
  • [28] Mitton R. (1987): Spelling checkers, spelling correctors, and the misspellings of poor spellers. — Inf. Process. Manag., Vol. 23, No. 5, pp. 495–505.
  • [29] Mitton R. (1996): Spellchecking by computer. — J. Simplif. Spell. Soc., Vol. 20. No. 1, pp. 4–11.
  • [30] Morrison D.R. (1968): PATRICIA—Practical Algorithm to Retrieve Information Coded in Alphanumeric. — J. ACM, Vol. 15, No. 4, pp. 514–534.
  • [31] Odell M.K. and Russell R.C. (1918): U.S. Patent Numbers, 1,261,167 (1918) and 1,435,663 (1922).—U.S. Patent Office, Washington, D.C.
  • [32] Oflazer K. (1996): Error-tolerant finite-state recognition with applications to morphological analysis and spelling correction.— Comput. Linguistics, Vol. 22, No. 1, pp. 73–89.
  • [33] Peterson J.L. (1986): A note on undetected typing errors. — Comm. ACM, Vol. 29, No. 7, pp. 633–637.
  • [34] Philips L. (1990): Hanging on the metaphone.—Comput. Lang. Mag., Vol. 7, No. 12, pp. 38–44.
  • [35] Philips L. (2000): The double metaphone search algorithm. — C/C++ Users Journal, Vol. 18, No. 6.
  • [36] Pollock J.J. and Zamora A. (1983): Collection and characterization of spelling errors in scientific and scholary text. — J. Amer. Soc. Inf. Sci., Vol. 34, No. 1, pp. 51–58.
  • [37] Pollock J.J. and Zamora A. (1984): Automatic spelling correction in scientific and scholarly text. — Comm. ACM, Vol. 27, No. 4, pp. 358–368.
  • [38] Savary A. (2001): Typographical nearest-neighbor search in a finite-state lexicon and its application to spelling correction.— Lect. Notes Comput. Sci., Vol. 2494, pp. 251–260.
  • [39] Schulz K.U. and Mihov S. (2002): Fast string correction with Levenshtein-automata. — Int. J. Docum. Anal. Recognit., Vol. 5, No. 1, pp. 67–85.
  • [40] Toutanova K. and Moore R.C. (2002): Pronunciation modelling for improved spelling correction. — Proc. 40th Annual Meeting of the Association for Computational Linguistics, Hong Kong, pp. 144–151.
  • [41] Wagner R.A. (1974): Order-n correction for regular languages. — Comm. ACM, Vol. 17, No. 5, pp. 265–268.
  • [42] Wikipedia (2003): Wikipedia: List of common misspellings. — Available at http://en2.wikipedia.org/wiki/Wikipedia:List_of_common_misspellings.
  • [43] Yannakoudakis E.J. and Fawthrop D. (1983a): An intelligent spelling corrector. — Inf. Process. Manag., Vol. 19, No. 12, pp. 101–108.
  • [44] Yannakoudakis E.J. and Fawthrop D. (1983b): The rules of spelling errors. — Inf. Process. Manag., Vol. 19, No. 2, pp. 87–99.
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-article-BPZ1-0016-0016
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.