Article title

TF-IDF inspired detection for cross-language source code plagiarism and collusion

Languages of publication
EN
Abstracts
EN
Several computing courses allow students to choose which programming language to use for a programming task. This can lead to cross-language code plagiarism and collusion, in which a copied code file is rewritten in another programming language. In response, this paper proposes a detection technique that accurately compares code files written in different programming languages while requiring only limited effort to accommodate each language at the development stage. The only language-dependent component of the technique is the source code tokeniser, and no code conversion is applied. The impact of coincidental similarity is reduced by applying a TF-IDF inspired weighting, in which rare matches are prioritised. Our evaluation shows that the technique outperforms common techniques in academia at handling language conversion disguises. Furthermore, it is comparable to those techniques when dealing with conventional disguises.
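To make the weighting idea concrete, the following is a minimal sketch of how rare token matches can be given more weight than common ones. It is not the paper's implementation: the regex tokeniser, the bigram size, the smoothed IDF formula, and the cosine comparison are all assumptions chosen for illustration, and the function names (`tokenise`, `ngrams`, `idf_weights`, `weighted_cosine`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenise(code):
    # Hypothetical stand-in for a per-language tokeniser:
    # identifiers, integer literals, and single non-space symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def ngrams(tokens, n=2):
    # Overlapping token n-grams (bigrams by default; the size is an assumption).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def idf_weights(docs):
    # docs: one n-gram list per code file.
    # Smoothed inverse document frequency: n-grams found in few files
    # receive larger weights than n-grams found in most files.
    total = len(docs)
    df = Counter()
    for grams in docs:
        df.update(set(grams))
    return {g: math.log((1 + total) / (1 + count)) + 1.0
            for g, count in df.items()}


def weighted_cosine(a, b, idf):
    # Cosine similarity between two files over TF-IDF weighted n-gram counts.
    va = {g: c * idf[g] for g, c in Counter(a).items()}
    vb = {g: c * idf[g] for g, c in Counter(b).items()}
    dot = sum(w * vb[g] for g, w in va.items() if g in vb)
    na = math.sqrt(sum(w * w for w in va.values()))
    nb = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


# Illustrative inputs only: a Java-like file, a Python rewrite, and an unrelated file.
sources = [
    "int total = price * qty ; print ( total ) ;",
    "total = price * qty\nprint(total)",
    "for i in range(10):\n    print(i)",
]
docs = [ngrams(tokenise(s)) for s in sources]
idf = idf_weights(docs)
print(weighted_cosine(docs[0], docs[1], idf))  # rewritten pair
print(weighted_cosine(docs[0], docs[2], idf))  # unrelated pair
```

In this sketch the bigram `("print", "(")` occurs in every file and therefore contributes little, while bigrams shared only by the first two files dominate their score. A real detector would replace `tokenise` with a proper lexer for each supported language.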
Pages
113–134
Physical description
Bibliography: 56 items, figures.
Contributors
  • Maranatha Christian University, Surya Sumantri Street No. 65, Bandung, 40164, Indonesia
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-ab4021c9-9a90-4af8-93e2-f7ba2a6d3064