Document revision histories are a useful and abundant source of data for natural language processing, but selecting relevant data for the task at hand is not trivial. In this paper we introduce a scalable approach for automatically distinguishing between factual and fluency edits in document revision histories. The approach is based on supervised machine learning using language model probabilities, string similarity measured over different representations of user edits, comparison of part-of-speech tags and named entities, and a set of adaptive features extracted from large amounts of unlabeled user edits. Applied to contiguous edit segments, our method achieves statistically significant improvements over a simple yet effective edit-distance baseline. It reaches high classification accuracy (88%) and is shown to generalize to additional sets of unseen data.
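To make the idea of an edit-distance baseline concrete, here is a minimal sketch. The abstract does not say how the baseline is defined, so everything here is an assumption made for illustration: the character-level Levenshtein distance, the rule that small edits are fluency edits and larger ones factual, the threshold of 4, and the names `levenshtein` and `classify_edit`.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute; unit cost)."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def classify_edit(before: str, after: str, threshold: int = 4) -> str:
    """Hypothetical baseline rule: small edits are 'fluency', others 'factual'.

    The threshold is an assumed value, not one taken from the paper.
    """
    return "fluency" if levenshtein(before, after) <= threshold else "factual"


if __name__ == "__main__":
    print(classify_edit("recieve", "receive"))
    print(classify_edit("Paris", "the capital of France"))
    print(classify_edit("born in 1952", "born in 1925"))
```

Under this assumed rule, a spelling correction falls below the threshold and a substantial rewrite falls above it. The last call shows a limitation of distance alone: swapping two digits in a year changes the facts but has a small edit distance, so the rule labels it a fluency edit. Cases like this are where richer features, such as the named-entity and part-of-speech comparisons mentioned in the abstract, could plausibly help.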