Class-based n-gram models of natural language
Computational Linguistics
A maximum entropy approach to natural language processing
Computational Linguistics
Distributional clustering of words for text classification
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Neural Networks: Tricks of the Trade
Advances in Neural Information Processing Systems 5, [NIPS Conference]
IJCNN '00 Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN'00) - Volume 2
Distributional clustering of English words
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Sequential neural text compression
IEEE Transactions on Neural Networks
Taking on the curse of dimensionality in joint distributions using neural networks
IEEE Transactions on Neural Networks
A Neural Syntactic Language Model
Machine Learning
Training connectionist models for the structured language model
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Self-organizing n-gram model for automatic word spacing
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A hierarchical Bayesian language model based on Pitman-Yor processes
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Training neural network language models on very large corpora
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Continuous space language models for statistical machine translation
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Three new graphical models for statistical language modelling
Proceedings of the 24th international conference on Machine learning
Hand gesture recognition and tracking based on distributed locally linear embedding
Image and Vision Computing
Modeling Topic and Role Information in Meetings Using the Hierarchical Dirichlet Process
MLMI '08 Proceedings of the 5th international workshop on Machine Learning for Multimodal Interaction
ACS'08 Proceedings of the 8th conference on Applied computer science
A stochastic memoizer for sequence data
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Fast Evaluation of Connectionist Language Models
IWANN '09 Proceedings of the 10th International Work-Conference on Artificial Neural Networks: Part I: Bio-Inspired Systems: Computational and Ambient Intelligence
Discriminative learning of selectional preference from unlabeled text
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Tied-mixture language modeling in continuous space
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Building a statistical machine translation system for French using the Europarl corpus
StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
First steps towards a general purpose French/English statistical machine translation system
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Learning Deep Architectures for AI
Foundations and Trends® in Machine Learning
Distributional representations for handling sparsity in supervised sequence-labeling
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Rules and generalization capacity extraction from ANN with GP
IWANN'03 Proceedings of the 7th International Work-Conference on Artificial and Natural Neural Networks: Computational Methods in Neural Modeling - Volume 1
The adaptive web
Word representations: a simple and general method for semi-supervised learning
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
UCH-UPV English: Spanish system for WMT10
WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Exploring representation-learning approaches to domain adaptation
DANLP 2010 Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing
Training continuous space language models: some practical issues
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Hierarchical Bayesian language models for conversational speech recognition
IEEE Transactions on Audio, Speech, and Language Processing
Communications of the ACM
A neural network for text representation
ICANN'05 Proceedings of the 15th international conference on Artificial neural networks: formal models and their applications - Volume Part II
Learning word vectors for sentiment analysis
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Temporal restricted Boltzmann machines for dependency parsing
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
A scalable probabilistic classifier for language modeling
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Prosodic and temporal features for language modeling for dialog
Speech Communication
Sentiment classification based on supervised latent n-gram analysis
Proceedings of the 20th ACM international conference on Information and knowledge management
Computational Linguistics
An Artificial Intelligence-based language modeling framework
Expert Systems with Applications: An International Journal
Proceedings of the fifth ACM international conference on Web search and data mining
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
CEU-UPV English-Spanish system for WMT11
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation
A hybrid approach to statistical language modeling with multilayer perceptrons and unigrams
TSD'05 Proceedings of the 8th international conference on Text, Speech and Dialogue
Semi-supervised recursive autoencoders for predicting sentiment distributions
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Efficient subsampling for training complex language models
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
The latent words language model
Computer Speech and Language
A scalable distributed syntactic, semantic, and lexical language model
Computational Linguistics
Lexical surprisal as a general predictor of reading time
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Continuous space translation models with neural networks
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Deep unsupervised feature learning for natural language processing
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
Domain and function: a dual-space model of semantic relations and compositions
Journal of Artificial Intelligence Research
Improving word representations via global context and multiple word prototypes
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Measuring the influence of long range dependencies with neural network language models
WLM '12 Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT
Large, pruned or continuous space language models on a GPU for statistical machine translation
WLM '12 Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT
Deep neural network language models
WLM '12 Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT
A bottom-up exploration of the dimensions of dialog state in spoken interaction
SIGDIAL '12 Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Sentiment classification with supervised sequence embedding
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I
Universal schema for entity type prediction
Proceedings of the 2013 workshop on Automated knowledge base construction
Word classification for sentiment polarity estimation using neural network
HCI International'13 Proceedings of the 15th international conference on Human Interface and the Management of Information: information and interaction design - Volume Part I
Deep learning of representations: looking forward
SLSP'13 Proceedings of the First international conference on Statistical Language and Speech Processing
Neural network language models for off-line handwriting recognition
Pattern Recognition
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having nearby representations) to the words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that it makes it possible to take advantage of longer contexts.
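The architecture sketched in the abstract can be illustrated with a short forward-pass example: each word maps to a learned feature vector, the context vectors are concatenated and fed through a small network, and a softmax over the vocabulary gives the next-word distribution. This is a minimal sketch only; the vocabulary, dimensions, and random initialization below are illustrative assumptions, not the paper's actual hyperparameters, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 10   # vocabulary size (toy assumption)
m = 8    # word feature (embedding) dimension (assumption)
n = 3    # context length: predict word t from words t-n .. t-1
h = 16   # hidden units (assumption)

C = rng.normal(0, 0.1, (V, m))      # distributed word representations
H = rng.normal(0, 0.1, (h, n * m))  # input-to-hidden weights
d = np.zeros(h)                     # hidden bias
U = rng.normal(0, 0.1, (V, h))      # hidden-to-output weights
b = np.zeros(V)                     # output bias

def softmax(z):
    # subtract the max for numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

def next_word_probs(context):
    """P(w_t | w_{t-n}, ..., w_{t-1}) over the whole vocabulary."""
    x = np.concatenate([C[w] for w in context])  # concatenated context vectors
    a = np.tanh(H @ x + d)                       # hidden layer
    return softmax(U @ a + b)                    # distribution over V words

p = next_word_probs([1, 4, 7])
```

Because the word vectors in `C` are shared across all contexts, gradient updates from one training sentence move the representations of its words, which in turn changes the probabilities assigned to every other sentence containing similar words; this sharing is the source of the generalization the abstract describes.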