Studying cooperation and conflict between authors with history flow visualizations
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
A content-driven reputation system for the Wikipedia
Proceedings of the 16th international conference on World Wide Web
Creating, destroying, and restoring value in Wikipedia
Proceedings of the 2007 international ACM conference on Supporting group work
A jury of your peers: quality, experience and ownership in Wikipedia
Proceedings of the 5th International Symposium on Wikis and Open Collaboration
Assessing the quality of Wikipedia articles with lifecycle based metrics
Proceedings of the 5th International Symposium on Wikis and Open Collaboration
Detecting Wikipedia vandalism via spatio-temporal analysis of revision metadata
Proceedings of the Third European Workshop on System Security
Detecting Wikipedia vandalism with active learning and statistical language models
Proceedings of the 4th workshop on Information credibility
Automatic vandalism detection in Wikipedia
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Detecting Wikipedia vandalism with a contributing efficiency-based approach
WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
WHAD: Wikipedia historical attributes data
Language Resources and Evaluation
The open, collaborative nature of wikis encourages participation from all users, but at the same time exposes their content to vandalism. Current vandalism-detection techniques, while effective against relatively obvious vandalism, prove inadequate for detecting increasingly prevalent sophisticated (or elusive) vandal edits. We identify a number of vandal edits that can take hours, even days, to correct, and propose a text-stability-based approach for detecting them. Our approach focuses on the likelihood that a given part of an article will be modified by a regular edit. In addition to text stability, our machine-learning-based technique also takes edit patterns into account. We evaluate our approach on a corpus of 15,000 manually labeled edits from the Wikipedia Vandalism PAN corpus. The experimental results show that text stability significantly improves the performance of the selected machine-learning algorithms.
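The abstract does not spell out how text stability is computed, but one plausible reading is a per-token measure of how long text has survived across revisions: an edit that removes long-lived text is less likely to be a regular edit. The sketch below is a hypothetical illustration under that assumption, not the authors' actual feature definition; the function names (`token_stability`, `edit_suspicion`) and the whitespace tokenization are invented for the example.

```python
from collections import Counter

def token_stability(revisions, window=5):
    """Hypothetical stability feature: for each token in the newest
    revision, the fraction of the last `window` revisions in which it
    appears. Tokens that persist across many revisions are 'stable'."""
    if not revisions:
        return {}
    history = revisions[-window:]
    latest_tokens = set(history[-1].split())
    counts = Counter()
    for rev in history:
        # Count each token at most once per revision.
        counts.update(set(rev.split()))
    return {tok: counts[tok] / len(history) for tok in latest_tokens}

def edit_suspicion(revisions, removed_tokens, window=5):
    """Score an edit by the mean stability of the tokens it removed:
    deleting highly stable text raises the suspicion score."""
    stability = token_stability(revisions, window)
    scores = [stability.get(tok, 0.0) for tok in removed_tokens]
    return sum(scores) / len(scores) if scores else 0.0
```

In a full pipeline, a score like this would be one feature alongside edit-pattern features (e.g. editor history, edit size) fed to the machine-learning classifier.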