Unsupervised word representations are very useful in NLP tasks both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is problematic because words are often polysemous and global context can also provide useful information for learning word meanings. We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. We introduce a new dataset with human judgments on pairs of words in sentential context, and evaluate our model on it, showing that our model outperforms competitive baselines and other neural language models.
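One way to realize "multiple embeddings per word" is to represent each occurrence of a word by the average embedding of its context, then cluster those context vectors so that each cluster centroid becomes one prototype of the word. The sketch below illustrates that idea only; it is not the paper's architecture, and the toy embeddings, sentences, and function names are hypothetical.

```python
# Minimal sketch (assumed setup, not the paper's pipeline): cluster the
# contexts a word appears in to derive multiple prototypes for it.
import numpy as np

rng = np.random.default_rng(0)

def context_vectors(occurrences, embed, window=2):
    """Average the embeddings of the words surrounding each occurrence
    of a target word; one vector per occurrence."""
    vecs = []
    for sent, i in occurrences:
        ctx = [w for j, w in enumerate(sent) if j != i and abs(j - i) <= window]
        vecs.append(np.mean([embed[w] for w in ctx], axis=0))
    return np.array(vecs)

def kmeans(X, k, iters=20):
    """Plain k-means; each centroid acts as one prototype of the word."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return centers, labels

# Toy embeddings (hypothetical): "river"/"water" vs. "money"/"loan"
# occupy opposite corners, so the two senses of "bank" separate.
embed = {w: np.array(v) for w, v in {
    "the": [0.5, 0.5], "river": [1.0, 0.0], "water": [0.9, 0.1],
    "money": [0.0, 1.0], "loan": [0.1, 0.9]}.items()}
occurrences = [(["the", "river", "bank", "water"], 2),
               (["the", "money", "bank", "loan"], 2)]

X = context_vectors(occurrences, embed)
centers, labels = kmeans(X, k=2)
# The two occurrences of "bank" fall into different clusters,
# i.e. the word ends up with two distinct prototypes.
```

In the toy run, the river-flavored and money-flavored contexts of "bank" land in separate clusters, which is the intuition behind keeping more than one embedding per polysemous word.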