A supervised method of feature weighting for measuring semantic relatedness

Authors:
Alistair Kennedy;Stan Szpakowicz
Affiliations:
SITE, University of Ottawa, Ottawa, Ontario, Canada;SITE, University of Ottawa, Ottawa, Ontario, Canada and Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
Venue:
Canadian AI'11 Proceedings of the 24th Canadian conference on Advances in artificial intelligence
Year:
2011

Abstract

The clustering of related words is crucial for a variety of Natural Language Processing applications. Many word-clustering techniques use the context of a word to determine its meaning: words which frequently appear in similar contexts are assumed to have similar meanings. Word clustering usually weights contexts by some measure of their importance. One of the most popular measures is Pointwise Mutual Information (PMI). It increases the weight of contexts where a word appears regularly but other words do not, and decreases the weight of contexts where many words may appear. Essentially, it is unsupervised feature weighting. We present a method of supervised feature weighting. It identifies contexts shared by pairs of words known to be semantically related or unrelated, and then uses Pointwise Mutual Information to weight these contexts according to how well they indicate closely related words. We use Roget's Thesaurus as a source of training and evaluation data. This work is a step towards adding new terms to Roget's Thesaurus automatically, and doing so with high confidence.
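The abstract contrasts standard, unsupervised Pointwise Mutual Information (PMI) weighting with a supervised variant. The Python sketch below illustrates both, under assumptions that are ours rather than the paper's: co-occurrence counts are given as a (word, context) -> count mapping; each word's contexts are treated as a set; and the supervised weight of a context is read as PMI(context, related), counted over labelled word pairs that share that context. The function names `pmi_weights` and `supervised_context_weights` are hypothetical, and the paper's exact counting, smoothing and feature set may differ. In the abstract's setting, the labelled pairs would come from a thesaurus such as Roget's.

```python
import math
from collections import Counter


def pmi_weights(cooc):
    """Unsupervised PMI weight for each (word, context) cell.

    cooc maps (word, context) -> co-occurrence count.
    PMI(w, c) = log( P(w, c) / (P(w) * P(c)) ), with all probabilities
    estimated from the same total count.
    """
    total = sum(cooc.values())
    word_tot = Counter()
    ctx_tot = Counter()
    for (w, c), n in cooc.items():
        word_tot[w] += n
        ctx_tot[c] += n
    # P(w,c) / (P(w) P(c)) simplifies to n * total / (word_tot * ctx_tot)
    return {
        (w, c): math.log((n * total) / (word_tot[w] * ctx_tot[c]))
        for (w, c), n in cooc.items()
    }


def supervised_context_weights(contexts, labelled_pairs):
    """Hypothetical supervised weighting (an assumption, not the paper's code).

    contexts: word -> set of contexts the word appears in
    labelled_pairs: iterable of (word1, word2, is_related) tuples
    Returns context -> PMI(context, related), counted over the
    (pair, shared context) events produced by the labelled pairs.
    """
    shared = Counter()          # context -> labelled pairs sharing it
    shared_related = Counter()  # context -> related pairs sharing it
    n_events = 0                # all (pair, shared context) events
    n_related = 0               # such events coming from related pairs
    for w1, w2, related in labelled_pairs:
        for c in contexts.get(w1, set()) & contexts.get(w2, set()):
            shared[c] += 1
            n_events += 1
            if related:
                shared_related[c] += 1
                n_related += 1
    weights = {}
    for c, n_c in shared.items():
        n_cr = shared_related[c]
        if n_cr == 0:
            # PMI would be log(0); this unsmoothed sketch gives no weight
            continue
        # P(c, rel) / (P(c) P(rel)) simplifies to n_cr * n_events / (n_c * n_related)
        weights[c] = math.log((n_cr * n_events) / (n_c * n_related))
    return weights


if __name__ == "__main__":
    ctx = {
        "car": {"drive", "park", "red"},
        "automobile": {"drive", "park", "fast"},
        "apple": {"eat", "red"},
    }
    pairs = [("car", "automobile", True), ("car", "apple", False)]
    # "red" is shared only by the unrelated pair, so it receives no weight
    print(supervised_context_weights(ctx, pairs))
```

Dropping contexts that never occur with a related pair is a design choice of this sketch; a smoothed estimate (for example, add-one counts) would instead assign them a low but finite weight.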