Detecting Wikipedia vandalism via spatio-temporal analysis of revision metadata

Authors:
Andrew G. West; Sampath Kannan; Insup Lee
Affiliations:
University of Pennsylvania, Philadelphia, PA; University of Pennsylvania, Philadelphia, PA; University of Pennsylvania, Philadelphia, PA
Venue:
Proceedings of the Third European Workshop on System Security
Year:
2010

Abstract

Blatantly unproductive edits undermine the quality of Wikipedia, the collaboratively edited encyclopedia. They not only disseminate dishonest and offensive content, but also force editors to waste time undoing such acts of vandalism. Language processing has been applied to combat these malicious edits but, as with email spam, such filters are evadable and computationally complex. Meanwhile, recent research has shown spatial and temporal features to be effective in mitigating email spam while remaining lightweight and robust. In this paper, we leverage the spatio-temporal properties of revision metadata to detect vandalism on Wikipedia. An administrative form of reversion called rollback enables the tagging of malicious edits, which are then contrasted with non-offending edits along numerous dimensions. Crucially, none of these features requires inspection of the article or revision text. Ultimately, a classifier is produced which flags vandalism at performance comparable to the natural-language efforts we intend to complement (85% accuracy at 50% recall). The classifier is scalable (processing more than 100 edits per second) and has been used to locate over 5,000 manually confirmed incidents of vandalism outside our labeled set.
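
To make the idea of text-free features concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not list the individual features, so the record fields and feature names below (hour of day, time since registration, time since the previous edit to the article, edit-summary length) are illustrative assumptions. The helper precision_at_recall is likewise hypothetical: it shows one possible reading of a figure such as "85% accuracy at 50% recall", namely the precision among flagged edits at the score cutoff where half of the labeled vandalism has been caught.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass
class RevisionMeta:
    """Hypothetical metadata record; the field names are assumptions."""
    timestamp: datetime                    # when the edit was made
    user_registered: Optional[datetime]    # None for anonymous (IP) editors
    prev_article_edit: Optional[datetime]  # previous edit to the same article
    comment_length: int                    # length of the edit summary


def metadata_features(rev: RevisionMeta) -> dict:
    """Build features from metadata only; no article or revision text is read."""
    def hours_since(earlier: Optional[datetime]) -> float:
        if earlier is None:
            return -1.0  # sentinel meaning "unknown"
        return (rev.timestamp - earlier).total_seconds() / 3600.0

    return {
        "hour_of_day": rev.timestamp.hour,
        "day_of_week": rev.timestamp.weekday(),  # Monday == 0
        "is_anonymous": rev.user_registered is None,
        "hours_since_registration": hours_since(rev.user_registered),
        "hours_since_article_edit": hours_since(rev.prev_article_edit),
        "comment_length": rev.comment_length,
    }


def precision_at_recall(scores: Sequence[float], labels: Sequence[bool],
                        target_recall: float) -> float:
    """Precision at the first score cutoff whose recall reaches target_recall.

    labels[i] is True when edit i is vandalism (e.g. tagged via rollback).
    """
    total_positive = sum(labels)
    if total_positive == 0:
        raise ValueError("need at least one positive label")
    # Rank edits from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    for flagged, (_, is_vandalism) in enumerate(ranked, start=1):
        true_positives += is_vandalism
        if true_positives / total_positive >= target_recall:
            return true_positives / flagged
    return true_positives / len(ranked)
```

The feature dictionary could be fed to any off-the-shelf classifier; because each feature is a constant-time computation over a handful of fields, a design of this kind is compatible with the high throughput the abstract reports.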