Discriminating vandalism edits from non-vandalism edits in Wikipedia is a challenging task, because ill-intentioned edits can include a wide variety of content and can be expressed in many different forms and styles. Previous studies are limited to rule-based methods and to learning based on lexical features, and they lack linguistic analysis. In this paper, we propose a novel Web-based shallow syntactic-semantic modeling method, which uses Web search results as a resource and trains topic-specific n-tag and syntactic n-gram language models to detect vandalism. By combining these models with basic task-specific and lexical features, we achieve high F-measures using logistic boosting and logistic model tree classifiers, surpassing the results reported by major Wikipedia vandalism detection systems.
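To make the idea of an n-tag language model concrete, the following is a minimal sketch, not the method used in the paper. It assumes that text retrieved for a topic has already been converted to part-of-speech tag sequences by some tagger, and it uses add-one smoothing, which is an assumption of this illustration. The function names `train_ntag_model` and `avg_log_prob` are hypothetical.

```python
import math
from collections import Counter


def train_ntag_model(tag_sequences, n=3):
    """Count tag n-grams and their (n-1)-tag contexts (illustrative only)."""
    ngrams, contexts = Counter(), Counter()
    vocab = {"</s>"}  # every tag that can be predicted, plus end-of-sequence
    for tags in tag_sequences:
        vocab.update(tags)
        padded = ["<s>"] * (n - 1) + list(tags) + ["</s>"]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            ngrams[context + (padded[i],)] += 1
            contexts[context] += 1
    return {"n": n, "ngrams": ngrams, "contexts": contexts,
            "vocab_size": len(vocab)}


def avg_log_prob(model, tags):
    """Average add-one-smoothed log-probability per predicted tag."""
    n = model["n"]
    padded = ["<s>"] * (n - 1) + list(tags) + ["</s>"]
    total, count = 0.0, 0
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])
        numerator = model["ngrams"][context + (padded[i],)] + 1
        denominator = model["contexts"][context] + model["vocab_size"]
        total += math.log(numerator / denominator)
        count += 1
    return total / count


# Hypothetical tag sequences standing in for tagged, topic-related text.
topic_tags = [["DT", "NN", "VBZ"], ["DT", "JJ", "NN", "VBZ"]]
model = train_ntag_model(topic_tags, n=3)

typical_score = avg_log_prob(model, ["DT", "NN", "VBZ"])
unusual_score = avg_log_prob(model, ["VBZ", "VBZ", "DT"])
```

In a setup like the one the abstract describes, a score of this kind would be one feature among several passed to a classifier. An edit whose tag sequence scores low under the topic model is only weak evidence of vandalism on its own.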