Mandatory Fields

Authors

Alakrot A.;Murray L.;Nikolov N.

Year

2018

Month

January

Journal

Procedia Computer Science

Title

Dataset construction for the detection of anti-social behaviour in online communication in Arabic

Status

Published

Optional Fields

Search Keyword

Anti-social behaviour online; Arabic dataset; harassment detection; offensive language detection; SVM for offensive language detection in Arabic; text mining

Volume

142

Issue

Start Page

315

End Page

320

Abstract

© 2018 The Authors. Published by Elsevier B.V. We present the results of predictive modelling for the detection of anti-social behaviour in online communication in Arabic, such as comments which contain obscene or offensive words and phrases. We collected and labelled a large dataset of YouTube comments in Arabic which contains a broad range of both offensive and inoffensive comments. We used this dataset to train a Support Vector Machine classifier and experimented with combinations of word-level features, N-gram features and a variety of pre-processing techniques. We summarise the pre-processing steps and features that allow training a classifier which is more precise, with 90.05% accuracy, than classifiers reported by previous studies on Arabic text.

Publisher Location

ISBN / ISSN

1877-0509

Edition

URL

DOI Link

10.1016/j.procs.2018.10.491

Grant Details

Funding Body

Grant Details