Web-scale N-gram models for lexical disambiguation

Authors:
Shane Bergsma;Dekang Lin;Randy Goebel
Affiliations:
Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada;Google, Inc., Mountain View, California;Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
Venue:
IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Year:
2009

Abstract

Web-scale data has been used in a diverse range of language research. Most of this research has used web counts for only short, fixed spans of context. We present a unified view of using web counts for lexical disambiguation. Unlike previous approaches, our supervised and unsupervised systems combine information from multiple and overlapping segments of context. On the tasks of preposition selection and context-sensitive spelling correction, the supervised system reduces disambiguation error by 20-24% over the current state-of-the-art.
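
As a rough illustration of combining counts from multiple, overlapping segments of context, the sketch below scores each candidate word by summing the log-counts of every n-gram (2 to 5 tokens) that spans the ambiguous position. This is a minimal unsupervised sketch, not the systems evaluated in the paper: the function names, the add-one smoothing, and the toy count table are all assumptions made for illustration, and a real system would look counts up in a web-scale N-gram collection.

```python
import math

# Hypothetical toy counts standing in for a web-scale N-gram table.
TOY_COUNTS = {
    ("a", "piece"): 9000,
    ("piece", "of"): 12000,
    ("a", "piece", "of"): 7000,
    ("piece", "of", "cake"): 1500,
    ("a", "piece", "of", "cake"): 1200,
    ("a", "peace"): 400,
    ("peace", "of"): 2500,
    ("a", "peace", "of"): 60,
}


def context_ngrams(tokens, pos, candidate, min_n=2, max_n=5):
    """Return every n-gram of length min_n..max_n that covers position
    `pos` once the token there is replaced by `candidate`."""
    filled = tokens[:pos] + [candidate] + tokens[pos + 1:]
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide the window so that it always contains position `pos`.
        for start in range(pos - n + 1, pos + 1):
            end = start + n
            if start >= 0 and end <= len(filled):
                ngrams.append(tuple(filled[start:end]))
    return ngrams


def score(tokens, pos, candidate, counts):
    """Sum of log(count + 1) over all overlapping context n-grams.
    The +1 is an assumed smoothing choice so unseen n-grams score 0."""
    return sum(
        math.log(counts.get(ng, 0) + 1)
        for ng in context_ngrams(tokens, pos, candidate)
    )


def disambiguate(tokens, pos, candidates, counts):
    """Pick the candidate whose overlapping contexts are best supported."""
    return max(candidates, key=lambda c: score(tokens, pos, c, counts))


# Context-sensitive spelling example: choose between "peace" and "piece".
tokens = ["a", "?", "of", "cake"]
best = disambiguate(tokens, 1, ["peace", "piece"], TOY_COUNTS)
```

With these toy counts, `best` is `"piece"`, because every n-gram covering the blank is better attested for that candidate. A supervised variant would instead treat each n-gram's log-count as a feature and learn one weight per context pattern from labeled examples.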