In this paper, we describe a new, publicly available corpus intended to stimulate research into language modeling techniques that are sensitive to overall sentence coherence. The task uses the sentence completion format of the Scholastic Aptitude Test (SAT). The test set consists of 1,040 sentences, each missing one content word, and the goal is to select the correct replacement from among five alternatives. In general, all of the options are syntactically valid and reasonable with respect to local N-gram statistics. The set was built by using an N-gram language model to produce a long list of likely words given the immediate context. These options were then hand-groomed to identify four decoys that are globally incoherent yet syntactically correct. To ensure the right to public distribution, all the data is derived from out-of-copyright materials from Project Gutenberg. The test sentences were taken from five of Conan Doyle's Sherlock Holmes novels, and we provide a large set of nineteenth- and early twentieth-century texts as training material.
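To make the task format concrete, the following is a minimal sketch of a purely local baseline, not the authors' method: it fills the blank with each of the five options and keeps the one with the highest add-one-smoothed bigram log-probability. The function names (`train_bigram`, `log_prob`, `choose`), the `___` blank marker, whitespace tokenization, and the toy corpus are all hypothetical illustrations. Because the decoys were deliberately chosen to be plausible under local N-gram statistics, a model like this is best read as a reference point rather than a solution.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical model representation: (unigram counts, bigram counts, vocabulary size).
Model = Tuple[Counter, Counter, int]


def train_bigram(sentences: Sequence[str]) -> Model:
    """Count unigrams and bigrams over lowercased, whitespace-tokenized sentences."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(unigrams)


def log_prob(sentence: str, model: Model) -> float:
    """Add-one (Laplace) smoothed bigram log-probability of a whole sentence."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


def choose(template: str, options: List[str], model: Model) -> str:
    """Substitute each option for the hypothetical '___' marker; return the best-scoring one."""
    return max(options, key=lambda w: log_prob(template.replace("___", w), model))


if __name__ == "__main__":
    # Toy training data, invented for illustration only.
    corpus = [
        "the detective examined the letter",
        "holmes examined the letter carefully",
        "the letter was on the table",
    ]
    model = train_bigram(corpus)
    options = ["letter", "table", "detective", "was", "carefully"]
    print(choose("holmes examined the ___ carefully", options, model))  # → letter
```

Scoring the full filled-in sentence, rather than only the blank's immediate neighbors, keeps the comparison fair across options, but a bigram model still only sees one word of context on each side, which is exactly the kind of local evidence the hand-groomed decoys are designed to defeat.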