The University of Massachusetts Amherst

Conversational Toxic Comment Classification on Wikipedia Talk Pages

Wednesday, May 4, 2022


Amir Alvandi, a Statistics Ph.D. candidate at UMass Amherst, completed a study on conversational toxic comment classification for DACSS 697 D Text as Data this spring.

With this project, Amir aimed to detect harmful content on online platforms and provide practical tools to help improve the quality of conversations. He analyzed a Wikipedia comment dataset from Kaggle’s Toxic Comment Classification challenge, containing 159,563 user comments collected from Wikipedia talk pages and annotated by human raters.

“Conversational toxicity is a growing issue that can lead people to stop genuinely expressing themselves and give up on seeking others’ opinions out of fear of abuse or harassment. Due to the massive volume of comments, it has become more critical to find a practical solution to efficiently identify and classify toxic comments,” Amir said of his motivation for the study.

Click here to access the poster.