Cyberbullying Detection in Roman Urdu Language Using Lexicon Based Approach

  • Kazim Raza Talpur, Siti Sophiayati Yuhaniz, Nilam Nur binti Amir Sjarif, Bandeh Ali

Abstract

Online social networks (OSNs) have become an integral part of daily life, and the number of
social media users is growing rapidly. This growing use of OSNs generates a large amount of
user communication data. This study focuses on OSN users who communicate in Roman Urdu (the
Urdu language written in the English alphabet). Pakistan alone has over 44 million OSN users
who communicate in Roman Urdu. In this paper, we address cyberbullying behavior on the
Twitter platform among users who communicate in Roman Urdu. To the best of our knowledge,
this is the first study to address cyberbullying behavior in Roman Urdu. To address this
issue, we developed a supervised machine learning method and proposed a lexicon-based model
with a set of features derived from Twitter. Evaluation shows that the developed model
attained an area under the receiver operating characteristic curve (AUC) of 0.986 and an
F-measure of 0.984. These results indicate that the proposed lexicon-based method offers a
feasible solution for detecting cyberbullying behavior in Roman Urdu on OSNs. Finally, we
compared the results of our proposed lexicon-based method with those obtained from other
well-known models. The comparison shows the significance of our proposed model.
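
As a rough illustration of the two ideas the abstract names, the sketch below counts lexicon matches in a tweet and computes an F-measure from confusion counts. It is not the authors' implementation: the lexicon entries (`term_a`, `term_b`, `term_c`) are hypothetical placeholders, and the function names `lexicon_features` and `f_measure` and the chosen features are assumptions made for illustration only.

```python
import re

# Hypothetical placeholder lexicon; the paper's actual Roman Urdu
# lexicon is not reproduced here.
LEXICON = {"term_a", "term_b", "term_c"}


def lexicon_features(tweet, lexicon=LEXICON):
    """Return simple lexicon-match features for one tweet (illustrative only)."""
    # Lowercase and keep runs of letters/underscores as tokens.
    tokens = re.findall(r"[a-z_]+", tweet.lower())
    hits = sum(1 for token in tokens if token in lexicon)
    ratio = hits / len(tokens) if tokens else 0.0
    return {
        "token_count": len(tokens),
        "lexicon_hits": hits,
        "lexicon_ratio": ratio,
    }


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(lexicon_features("tum term_a ho term_b"))
    print(f_measure(tp=8, fp=2, fn=2))
```

In a supervised setup such as the one the abstract describes, features like these would typically be combined with other tweet-derived features and passed to a classifier; which features and classifier the authors used is not stated on this page.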

Published
2020-05-10
How to Cite
Kazim Raza Talpur, Siti Sophiayati Yuhaniz, Nilam Nur binti Amir Sjarif, Bandeh Ali. (2020). Cyberbullying Detection in Roman Urdu Language Using Lexicon Based Approach. International Journal of Advanced Science and Technology, 29(10s), 786-800. Retrieved from http://sersc.org/journals/index.php/IJAST/article/view/14509
Section
Articles