Mining semantic distance between corpus terms

Authors:
Ahmad El Sayed;Hakim Hacid;Djamel Zighed
Affiliations:
Université Lyon 2, Lyon, France;Université Lyon 2, Lyon, France;Université Lyon 2, Lyon, France
Venue:
Proceedings of the ACM first Ph.D. workshop in CIKM
Year:
2007



Abstract

In this paper, we address two problems in classical semantic similarity measures. The first is the context-dependency problem in knowledge-based measures: none of them takes into account the context of the target domain. To address it, a multisource context-dependent approach is presented. The second is the coverage problem: with these measures, similarities can only be calculated between concepts included in a taxonomy. Moreover, "pure" corpus-based measures are still far from achieving the performance reached by knowledge-based measures. We therefore present a more complex corpus-based approach that uses a taxonomy and data mining techniques to compute semantic distances between terms not covered by the taxonomy. Experiments clearly show the effectiveness of both proposed approaches.
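
The coverage problem mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the toy taxonomy, the edge-counting distance, and the names `PARENT`, `ancestors`, `known`, and `path_distance` are all assumptions made for illustration. It only shows that a purely taxonomy-based measure has no answer for a term that lies outside the taxonomy.

```python
from typing import List, Optional

# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}


def ancestors(term: str) -> List[str]:
    """Return the chain from `term` up to the root, `term` included."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def known(term: str) -> bool:
    """A term is covered if it appears anywhere in the taxonomy."""
    return term in PARENT or term in PARENT.values()


def path_distance(a: str, b: str) -> Optional[int]:
    """Count edges between a and b through their lowest common ancestor.

    Returns None when either term is not covered by the taxonomy,
    which is the coverage problem in its simplest form.
    """
    if not (known(a) and known(b)):
        return None
    steps_from_b = {node: i for i, node in enumerate(ancestors(b))}
    for steps_from_a, node in enumerate(ancestors(a)):
        if node in steps_from_b:
            return steps_from_a + steps_from_b[node]
    return None  # no common ancestor (cannot happen in a single-rooted tree)


print(path_distance("dog", "wolf"))        # siblings under "canine"
print(path_distance("dog", "cat"))         # meet at "mammal"
print(path_distance("dog", "smartphone"))  # "smartphone" is not covered
```

In this sketch the last call returns `None` because "smartphone" is absent from the taxonomy. A corpus-based component such as the one the abstract describes would be needed to assign such a term a distance.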