Similarity-Based Models of Word Cooccurrence Probabilities

  • Authors:
  • Ido Dagan; Lillian Lee; Fernando C. N. Pereira

  • Affiliations:
  • Ido Dagan: Dept. of Mathematics and Computer Science, Bar Ilan University, Ramat Gan 52900, Israel. dagan@macs.biu.ac.il
  • Lillian Lee: Department of Computer Science, Cornell University, Ithaca, NY 14853, USA. llee@cs.cornell.edu
  • Fernando C. N. Pereira: AT&T Labs—Research, 180 Park Ave., Florham Park, NJ 07932, USA. pereira@research.att.com

  • Venue:
  • Machine Learning - Special issue on natural language learning
  • Year:
  • 1999


Abstract

In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations “eat a peach” and “eat a beach” is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in any given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on “most similar” words. We describe probabilistic word association models based on distributional word similarity, and apply them to two tasks, language modeling and pseudo-word disambiguation.

In the language modeling task, a similarity-based model is used to improve probability estimates for unseen bigrams in a back-off language model. The similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error. We also compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency, to avoid giving too much weight to easy-to-disambiguate high-frequency configurations. The similarity-based methods perform up to 40% better on this particular task.
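The core idea of the abstract — estimating the probability of an unseen bigram as a similarity-weighted average over the probabilities of words most similar to the conditioning word — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual model: the toy corpus, the Jaccard similarity over observed right-contexts, and all function names are assumptions introduced here (the paper itself uses distributional similarity measures over cooccurrence distributions, combined with a back-off scheme).

```python
from collections import Counter, defaultdict

def conditional_probs(bigrams):
    """Maximum-likelihood P(w2 | w1) from observed bigram counts."""
    counts = defaultdict(Counter)
    for w1, w2 in bigrams:
        counts[w1][w2] += 1
    return {w1: {w2: n / sum(c.values()) for w2, n in c.items()}
            for w1, c in counts.items()}

def jaccard_sim(d1, d2):
    """Illustrative similarity: overlap between the sets of words
    observed to the right of each conditioning word."""
    s1, s2 = set(d1), set(d2)
    return len(s1 & s2) / len(s1 | s2) if (s1 | s2) else 0.0

def similarity_estimate(w1, w2, probs, k=3):
    """Estimate P(w2 | w1) as a similarity-weighted average of
    P(w2 | w1') over the k words w1' most similar to w1."""
    sims = sorted(((jaccard_sim(probs.get(w1, {}), d), w)
                   for w, d in probs.items() if w != w1),
                  reverse=True)[:k]
    norm = sum(s for s, _ in sims)
    if norm == 0.0:
        return 0.0
    return sum(s * probs[w].get(w2, 0.0) for s, w in sims) / norm

# Toy corpus (hypothetical): the bigram ("eat", "peach") is unseen,
# but "devour" behaves like "eat", so its statistics transfer.
bigrams = [("eat", "apple"), ("eat", "bread"),
           ("devour", "apple"), ("devour", "peach"),
           ("swim", "beach")]
probs = conditional_probs(bigrams)
```

Under this sketch, `similarity_estimate("eat", "peach", probs)` is positive because "devour" shares a right-context with "eat", while `similarity_estimate("eat", "beach", probs)` stays at zero — mirroring the peach/beach example from the abstract.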