A maximum entropy approach to natural language processing
Computational Linguistics
Text Categorization with Support Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Cohesive Generation of Syntactically Simplified Newspaper Text
TDS '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
ACM SIGIR Forum
Helping aphasic people process online information
Proceedings of the 8th international ACM SIGACCESS conference on Computers and accessibility
NLTK: the natural language toolkit
ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Paraphrasing with bilingual parallel corpora
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Learning accurate, compact, and interpretable tree annotation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Unsupervised Multilingual Sentence Boundary Detection
Computational Linguistics
Online Passive-Aggressive Algorithms
The Journal of Machine Learning Research
Confidence-weighted linear classification
Proceedings of the 25th international conference on Machine learning
BART: a modular toolkit for coreference resolution
HLT-Demonstrations '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Demo Session
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Cognitively motivated features for readability assessment
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Sentence level machine translation evaluation as a ranking problem: one step aside from BLEU
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Self-training PCFG grammars with latent annotations across languages
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
A joint language model with fine-grain syntactic tags
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3
Automatic content-based categorization of Wikipedia articles
People's Web '09 Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
For the sake of simplicity: unsupervised extraction of lexical simplifications from Wikipedia
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Extracting parallel sentences from comparable corpora using document level alignment
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Putting it simply: a context-aware approach to lexical simplification
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Simple English Wikipedia: a new text simplification task
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Learning to simplify sentences using Wikipedia
MTTG '11 Proceedings of the Workshop on Monolingual Text-To-Text Generation
Towards an on-demand simple Portuguese Wikipedia
SLPAT '11 Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies
Text simplification is the process of changing vocabulary and grammatical structure to create a more accessible version of a text while preserving its underlying information and content. Automated simplification tools offer a practical way to make large text corpora accessible to a wider audience, including readers who are not highly fluent in the corpus language. In this work, we investigate the potential of Simple Wikipedia to assist automatic text simplification by building a statistical classification system that discriminates simple English from ordinary English. Most text simplification systems are based on hand-written rules (e.g., PSET (Carroll et al., 1999) and its module SYSTAR (Canning et al., 2000)) and are therefore difficult to scale and to transfer across domains. Simple Wikipedia offers considerable potential for text simplification: it contains nearly 60,000 articles, with revision histories and with aligned articles in the ordinary English Wikipedia. Using articles from Simple Wikipedia and ordinary Wikipedia, we evaluated different classifiers and feature sets to identify the features that best discriminate simple English and that can be used across domains. These findings further our understanding of what makes text simple and can be applied in tools that help writers craft simple text.
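The abstract describes a statistical classifier that separates simple English from ordinary English and a search for the most discriminative features. The sketch below shows one way such a classifier could be set up, assuming plain logistic regression trained by batch gradient descent over four hypothetical surface features (words per sentence, characters per word, type-token ratio, share of words longer than six letters). The feature names, the toy sentences, and every function name here are illustrative assumptions; the paper's actual classifiers, feature sets, and Wikipedia data are not reproduced.

```python
import math
import re
from typing import List, Sequence, Tuple

# Hypothetical surface features chosen for illustration only; they are not
# the feature sets evaluated in the paper.
FEATURE_NAMES = ["avg_sentence_len", "avg_word_len", "type_token_ratio", "long_word_frac"]


def extract_features(text: str) -> List[float]:
    """Map a text to a small vector of readability-style surface features."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return [0.0] * len(FEATURE_NAMES)
    n = len(words)
    return [
        n / len(sentences),                       # words per sentence
        sum(len(w) for w in words) / n,           # characters per word
        len({w.lower() for w in words}) / n,      # lexical variety
        sum(1 for w in words if len(w) > 6) / n,  # share of long words
    ]


def standardize(rows: Sequence[List[float]]) -> Tuple[List[List[float]], List[Tuple[float, float]]]:
    """Z-score each feature column; also return (mean, std) for reuse at prediction time."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        stats.append((mean, std))
    scaled = [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]
    return scaled, stats


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent for binary logistic regression (label 1 = simple)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(text: str, w, b, stats) -> int:
    """Return 1 if the model labels `text` as simple English, else 0."""
    x = [(v - m) / s for v, (m, s) in zip(extract_features(text), stats)]
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Toy sentences written for this sketch, not drawn from either Wikipedia.
SIMPLE = [
    "The cat sat on the mat. It was a big cat. The cat was happy.",
    "Paris is a city in France. Many people live there. It is very old.",
    "Water is wet. Fish live in water. Some fish are big.",
]
ORDINARY = [
    "The municipality, established during the eighteenth century, subsequently "
    "experienced considerable demographic expansion attributable to industrialization.",
    "Photosynthesis constitutes the biochemical mechanism whereby chlorophyll-containing "
    "organisms convert electromagnetic radiation into chemical energy.",
    "Contemporary historiography emphasizes the interdependence of economic, political, "
    "and cultural transformations throughout early modern Europe.",
]

texts = SIMPLE + ORDINARY
labels = [1] * len(SIMPLE) + [0] * len(ORDINARY)
X, stats = standardize([extract_features(t) for t in texts])
w, b = train_logreg(X, labels)

# Larger |weight| on standardized features = more discriminative in this toy model.
for name, weight in sorted(zip(FEATURE_NAMES, w), key=lambda p: abs(p[1]), reverse=True):
    print(f"{name:>18s} {weight:+.2f}")
```

Standardizing the features before training puts the learned weights on a comparable scale, so sorting them by magnitude gives a rough view of which features separate the two classes in this toy setting. A real study along the lines of the abstract would instead compare held-out accuracy across several classifiers and feature sets on the aligned Wikipedia articles.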