This paper proposes an active learning approach that uses language model statistics to detect Wikipedia vandalism. Wikipedia is a popular and influential collaborative information system. The collaborative nature of its authoring, together with the high visibility of its content, has exposed Wikipedia articles to vandalism, defined here as malicious editing intended to compromise the integrity of article content. Extensive manual effort goes into combating vandalism, and an automated approach is needed to alleviate this laborious process. This paper builds statistical language models, constructing distributions of words from the revision history of Wikipedia articles. Because vandalism often involves unexpected words chosen to draw attention, how well (or how poorly) a new edit fits the language models built from previous versions may indicate that the edit is an instance of vandalism. In addition, the paper adopts an active learning model to address the noisy and incomplete labeling of Wikipedia vandalism. The Wikipedia domain, with its revision histories, offers a novel context in which to explore the potential of language models for characterizing author intention. As the experimental results presented in the paper demonstrate, these models hold promise for vandalism detection.
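The idea of scoring an edit against a language model built from earlier revisions can be illustrated with a minimal sketch. It is not the paper's implementation: the whitespace tokenizer, the unigram model, the add-alpha smoothing, the score threshold, and every function name below are assumptions chosen for illustration. The last function shows one common pool-based active learning heuristic (uncertainty sampling), which may differ from the strategy the paper adopts.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would do more.
    return text.lower().split()


def build_unigram_model(revisions):
    # Word counts pooled over all previous revisions of one article.
    counts = Counter()
    for revision in revisions:
        counts.update(tokenize(revision))
    return counts


def inserted_words(old_text, new_text):
    # Words whose count increased in the new revision (multiset difference).
    old = Counter(tokenize(old_text))
    new = Counter(tokenize(new_text))
    return list((new - old).elements())


def surprise(model, words, alpha=1.0):
    # Average negative log-probability of the words under an add-alpha
    # smoothed unigram model. Higher means the edit fits the history worse.
    if not words:
        return 0.0
    total = sum(model.values())
    vocab = len(model) + 1  # one extra slot reserves mass for unseen words
    nll = 0.0
    for word in words:
        prob = (model[word] + alpha) / (total + alpha * vocab)
        nll -= math.log(prob)
    return nll / len(words)


def most_uncertain(scores, threshold):
    # Uncertainty sampling: index of the unlabeled edit whose score lies
    # closest to the decision threshold, i.e. the one to send to an annotator.
    return min(range(len(scores)), key=lambda i: abs(scores[i] - threshold))


# Hypothetical toy revision history and two candidate edits.
history = ["the cat sat on the mat", "the cat sat on the red mat"]
model = build_unigram_model(history)
benign = inserted_words(history[-1], "the cat sat on the red mat the cat")
suspect = inserted_words(history[-1], "the cat sat on the red mat lol stupid")
benign_score = surprise(model, benign)
suspect_score = surprise(model, suspect)
```

In this toy setting the edit that introduces words never seen in the article's history receives the higher surprise score. In an active learning loop, edits with scores near the current decision boundary would be the ones queued for manual labeling.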