Techniques that exploit knowledge of distributional similarity between words have been proposed in many areas of Natural Language Processing. For example, in language modeling, the sparse data problem can be alleviated by estimating the probabilities of unseen co-occurrences of events from the probabilities of seen co-occurrences of similar events. In other applications, distributional similarity is taken to be an approximation to semantic similarity. However, due to the wide range of potential applications and the lack of a strict definition of the concept of distributional similarity, many methods of calculating distributional similarity have been proposed or adopted.

In this work, a flexible, parameterized framework for calculating distributional similarity is proposed. Within this framework, the problem of finding distributionally similar words is cast as one of co-occurrence retrieval (CR) for which precision and recall can be measured by analogy with the way they are measured in document retrieval. As will be shown, a number of popular existing measures of distributional similarity are simulated with parameter settings within the CR framework. In this article, the CR framework is then used to systematically investigate three fundamental questions concerning distributional similarity. First, is the relationship of lexical similarity necessarily symmetric, or are there advantages to be gained from considering it as an asymmetric relationship? Second, are some co-occurrences inherently more salient than others in the calculation of distributional similarity? Third, is it necessary to consider the difference in the extent to which each word occurs in each co-occurrence type?

Two application-based tasks are used for evaluation: automatic thesaurus generation and pseudo-disambiguation. It is possible to achieve significantly better results on both these tasks by varying the parameters within the CR framework rather than using other existing distributional similarity measures; it will also be shown that any single unparameterized measure is unlikely to be able to do better on both tasks. This is due to an inherent asymmetry in lexical substitutability and therefore also in lexical distributional similarity.
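
The retrieval analogy described above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the framework's actual definition: each word is represented as a mapping from co-occurrence types to weights, the neighbour's co-occurrences play the role of "retrieved" items, and the target's co-occurrences play the role of "relevant" items. The function names, the toy feature sets, the unit weights, and the `beta` mixing parameter are all hypothetical and chosen only for this example.

```python
from typing import Dict, Tuple

# Hypothetical representation: co-occurrence type -> non-negative weight.
Features = Dict[str, float]


def cr_precision_recall(target: Features, neighbour: Features) -> Tuple[float, float]:
    """Score how well the neighbour's co-occurrences 'retrieve' the target's.

    Precision: share of the neighbour's total weight that falls on
    co-occurrence types also seen with the target.
    Recall: share of the target's total weight that falls on
    co-occurrence types also seen with the neighbour.
    """
    shared = target.keys() & neighbour.keys()
    retrieved_total = sum(neighbour.values())
    relevant_total = sum(target.values())
    precision = sum(neighbour[c] for c in shared) / retrieved_total if retrieved_total else 0.0
    recall = sum(target[c] for c in shared) / relevant_total if relevant_total else 0.0
    return precision, recall


def combine(precision: float, recall: float, beta: float = 0.5) -> float:
    """Assumed parameterization: a weighted mean of precision and recall.

    beta = 1.0 uses precision only, beta = 0.0 uses recall only.
    """
    return beta * precision + (1.0 - beta) * recall


# Toy data with unit weights (invented for illustration).
animal = {"eat": 1.0, "sleep": 1.0, "run": 1.0, "breathe": 1.0}
dog = {"eat": 1.0, "bark": 1.0, "run": 1.0}

p, r = cr_precision_recall(target=animal, neighbour=dog)
p_rev, r_rev = cr_precision_recall(target=dog, neighbour=animal)
print("dog as neighbour of animal:", p, r)
print("animal as neighbour of dog:", p_rev, r_rev)
```

Swapping the roles of the two words exchanges precision and recall, so any setting of `beta` other than 0.5 yields an asymmetric score. This is one simple way to see how a parameterized measure can treat lexical similarity as a directional relationship.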