Effective similarity measures in electronic testing at programming languages

Akinwale, A.; Niewiadomski, A.

Selected full texts from this journal

https://eczasopisma.p.lodz.pl/JACS/issue/archive

Abstract

The purpose of this study is to explore the grammatical properties and features of the generalized n-gram matching technique in electronic testing of programming languages. The n-gram matching technique has been successfully employed in information handling and decision support systems dealing with texts, but its side effect is the size n, which tends to be rather large. Two new methods, odd gram and sumsquare gram, are proposed to improve generalized n-gram matching, together with modifications of existing methods. While generalized n-grams are easy to generate and manage, they require quadratic time and space and are therefore ill-suited to the proposed and modified methods, which are quadratic in nature. Experiments were conducted with the two new methods and the modified ones, using real-life programming code assignments as pattern and text matches, and the results were compared with existing methods that are among the best in practice. The experimental results are very positive and suggest that the proposed methods can be successfully applied in electronic testing of programming languages.
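
For orientation only, the following is a minimal sketch of one common formulation of generalized n-gram matching between two strings. It is an assumption made for illustration and is not taken from this record or confirmed as the authors' exact definition. The function name `generalized_ngram_similarity` is hypothetical, and the odd gram and sumsquare gram variants named in the abstract are not reproduced here because the record does not define them. The sketch counts every substring of the first string (all lengths, all start positions) that occurs in the second, then normalizes by N(N+1)/2, where N is the longer string's length. The nested loops show where the quadratic number of substrings comes from.

```python
def generalized_ngram_similarity(s1: str, s2: str) -> float:
    """Illustrative generalized n-gram similarity (assumed formulation).

    Counts the substrings of s1, over every length and start position,
    that occur at least once in s2, and divides by N(N+1)/2, where N is
    the length of the longer string.
    """
    n = max(len(s1), len(s2))
    if n == 0:
        return 1.0  # assumption: two empty strings are treated as identical

    hits = 0
    # Outer loop: substring length; inner loop: start position in s1.
    for length in range(1, len(s1) + 1):
        for start in range(len(s1) - length + 1):
            if s1[start:start + length] in s2:
                hits += 1

    return 2.0 * hits / (n * n + n)


if __name__ == "__main__":
    # Two hypothetical one-line student answers that differ in one identifier.
    print(generalized_ngram_similarity("for i in range(n):", "for j in range(n):"))
```

Note that this formulation is not symmetric in general, since only substrings of the first argument are enumerated; a grading tool would need to fix which argument is the pattern and which is the text.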

Keywords

similarity and distance measures; fuzzy relations; n-gram; programming codes

Publisher

Wydawnictwo Politechniki Łódzkiej

Journal

Journal of Applied Computer Science

Year

2012

Volume

Vol. 20, No. 2

Pages

7–26

Physical description

Bibliography: 24 items.

Authors

author

Akinwale A.

author

Niewiadomski A.

Lodz University of Technology, Institute of Information Technology, ul. Wólczańska 215, 90-924 Łódź, Poland, adio.taofiki.akinwale@guest.p.lodz.pl

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-LODD-0002-0001