Lexical Semantic in Arabic Plagiarism Detection Using Winnowing Algorithm (Word Level)

  • Zahraa Jasim Jabir, Ahmed H. Aliwy

Abstract

Plagiarism is an electronic crime that violates the rights of authors and publishers. In the indexing process, fingerprinting algorithms are used: we extracted a fingerprint for every five words using a hash function. The winnowing algorithm was applied to words instead of letters to select the fingerprints and to reduce the size of the index. Winnowing was run both with and without synonym replacement to compute the percentage of grams in plagiarized texts, in two ways: (i) the percentage of the suspicious document that is plagiarized relative to a file in the dataset, and (ii) the percentage of a file in the dataset that is plagiarized relative to the suspicious file. Precision, recall, F-measure and error rate were then estimated. With synonym replacement, the winnowing algorithm achieved precision, recall, F-measure and error rate of 0.84861, 1, 0.889206 and 0.034461, respectively, under the first methodology, and 0.852961, 1, 0.892954 and 0.030137 under the second. Without synonym replacement, the corresponding values were 0.849207, 1, 0.889764 and 0.034452 under the first methodology, and 0.852554692, 1, 0.892920502 and 0.030153138 under the second.
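
The word-level pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the window size `W`, the CRC32 hash, whitespace tokenization, and the `synonyms` lookup table are all assumptions made for illustration. Only the five-word gram length comes from the abstract.

```python
import zlib

K = 5  # words per gram, as stated in the abstract
W = 4  # winnowing window size: an assumption, not given in the source


def normalize(words, synonyms=None):
    """Map each word to a canonical synonym using a hypothetical lookup table."""
    synonyms = synonyms or {}
    return [synonyms.get(word, word) for word in words]


def gram_hashes(words, k=K):
    """Hash every run of k consecutive words (CRC32 is an assumed hash choice)."""
    return [
        zlib.crc32(" ".join(words[i:i + k]).encode("utf-8"))
        for i in range(len(words) - k + 1)
    ]


def winnow(hashes, w=W):
    """Return the set of minimum hash values over all sliding windows of size w."""
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def fingerprints(text, synonyms=None):
    """Tokenize on whitespace, optionally replace synonyms, then winnow."""
    words = normalize(text.split(), synonyms)
    return winnow(gram_hashes(words))


def overlap_percent(source, target):
    """Percentage of the fingerprints in `source` that also occur in `target`."""
    if not source:
        return 0.0
    return 100.0 * len(source & target) / len(source)
```

Under these assumptions, the two measurements in the abstract correspond to calling `overlap_percent(suspicious, dataset_file)` and `overlap_percent(dataset_file, suspicious)`, where both arguments are fingerprint sets. The two values differ whenever the documents yield different numbers of fingerprints. Arabic-specific preprocessing (orthographic normalization, stop-word handling) is outside the scope of this sketch.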

 Keywords: Plagiarism detection, Arabic text, Winnowing, Fingerprinting

Published
2019-10-29
How to Cite
Jabir, Z. J., & Aliwy, A. H. (2019). Lexical Semantic in Arabic Plagiarism Detection Using Winnowing Algorithm (Word Level). International Journal of Advanced Science and Technology, 28(12), 16-24. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/1181
Section
Articles