Automatic Hate Speech Detection on Gujarati Language Using Machine Learning

Abhilasha Vadesara, Purna Tanna

Abstract

Hate speech usually refers to any communication that disparages a target group of people on the basis of a trait such as race, color, gender, sexual orientation, ethnicity, nationality, or another characteristic. Hate speech has grown steadily with the immense rise in user-generated content on social media. As its social effect has grown, so has interest over the past several years in online hate speech detection and, in particular, in automating this task. Identifying and monitoring hate speech is becoming an increasingly difficult problem for individuals and society. The objective of this paper is to detect hate speech in the Gujarati language using natural language processing and machine learning classifiers. The paper compares four classifiers, SVM, Naïve Bayes, decision tree, and logistic regression, each combined with Bag of Words and TF-IDF feature extraction. The proposed system preprocesses twelve thousand tweets, extracts the important features with these feature extraction techniques, and classifies each tweet as hate or non-hate using a machine learning classifier. Among all combinations, the Naïve Bayes classifier with Bag of Words features achieved the best results: an F1-score of 91% for the hate category and an accuracy of 87.54% on the whole Gujarati corpus, covering both hate and non-hate tweets.
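
The best-performing combination reported above, Bag of Words features with a Naïve Bayes classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the Laplace smoothing value, the class and function names, and the four toy English placeholder sentences are all assumptions made for illustration, and a real pipeline would use the Gujarati tweet corpus with proper preprocessing.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. A real Gujarati pipeline
    # would also remove URLs, mentions, punctuation, and stop words.
    return text.lower().split()


class BagOfWordsNaiveBayes:
    """Multinomial Naive Bayes over word counts (hypothetical helper class)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.class_counts
        }
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                num = self.word_counts[c][tok] + self.alpha
                den = self.total_words[c] + self.alpha * vocab_size
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def f1_for_class(y_true, y_pred, positive):
    # F1 for a single class, e.g. the "hate" category.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy placeholder data, not taken from the paper's corpus.
train_texts = [
    "insult slur attack",
    "slur threat insult",
    "hello friend welcome",
    "nice weather friend",
]
train_labels = ["hate", "hate", "non-hate", "non-hate"]

model = BagOfWordsNaiveBayes().fit(train_texts, train_labels)
```

Calling model.predict on a new string returns one of the two labels, and f1_for_class can then score predictions for the hate category against held-out labels. Swapping raw counts for TF-IDF weights, or Naïve Bayes for one of the other three classifiers, would give the remaining combinations the paper compares.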
