Surya Agustian


Plagiarism detection is a complex task. At the text level, a detector should find the fragments of a document that are suspected of being plagiarized from other sources. Aligning the plagiarized passages of a suspicious document with the corresponding passages of its source document is a widely discussed problem, and the alignment makes it possible to measure the percentage of plagiarized text. This research proposes a semantic approach to aligning text fragments between source and suspicious documents using the Jaccard similarity method. In experiments on data from the PAN plagiarism detection competition, the approach yields an average detection score of 66.9%, more than twice the 28.4% obtained by the baseline method provided by the organizers. The approach is a promising starting point for finding the offset and length of plagiarized text in a plagiarism detection system.
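To make the idea of fragment alignment with Jaccard similarity concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes sentence-like fragments, plain lowercase word sets instead of a semantic representation, and a hypothetical threshold of 0.5. The function names (`tokens`, `jaccard`, `fragments`, `align`, `plagiarized_percentage`) are illustrative only.

```python
import re


def tokens(text):
    """Lowercase word tokens as a set (a simplifying assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def fragments(text):
    """Yield (offset, length, fragment) for each sentence-like fragment."""
    for m in re.finditer(r"[^.!?]+[.!?]?", text):
        raw = m.group()
        frag = raw.strip()
        if frag:
            # Skip leading whitespace so the offset points at the first character.
            start = m.start() + (len(raw) - len(raw.lstrip()))
            yield start, len(frag), frag


def align(source, suspicious, threshold=0.5):
    """Pair each suspicious fragment with its most similar source fragment.

    Returns tuples (susp_offset, susp_length, src_offset, src_length, score)
    for pairs whose score reaches the (hypothetical) threshold.
    """
    src = [(off, n, tokens(frag)) for off, n, frag in fragments(source)]
    matches = []
    for s_off, s_len, frag in fragments(suspicious):
        t = tokens(frag)
        best = max(src, key=lambda x: jaccard(t, x[2]), default=None)
        if best is None:
            continue
        score = jaccard(t, best[2])
        if score >= threshold:
            matches.append((s_off, s_len, best[0], best[1], score))
    return matches


def plagiarized_percentage(suspicious, matches):
    """Share of the suspicious text (in characters) covered by matches."""
    if not suspicious:
        return 0.0
    return 100.0 * sum(m[1] for m in matches) / len(suspicious)


source = "The cat sat on the mat. Dogs bark loudly at night."
suspicious = "Birds sing in spring. The cat sat on a mat."
matches = align(source, suspicious)
for s_off, s_len, r_off, r_len, score in matches:
    print(s_off, s_len, r_off, r_len, round(score, 3))
print(round(plagiarized_percentage(suspicious, matches), 1))
```

Each match carries the offset and length of the suspected passage in both documents, which is the kind of output a text alignment step passes on to the rest of a detection system. A semantic variant would replace `tokens` with a representation that also maps synonyms or related words to shared elements.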


plagiarism detection, text alignment, semantic similarity






Copyright (c) 2022 Surya Agustian

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Jurnal Teknik Informatika by Prodi Teknik Informatika Universitas Islam Negeri Syarif Hidayatullah Jakarta is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.