A Noun-Predicate Bigram-Based Similarity Measure for Lexical Relations

Authors:
Hyopil Shin;Insik Cho
Affiliations:
Computational Linguistics Lab., Dept. of Linguistics, Seoul National University, Seoul, Korea;Computational Linguistics Lab., Dept. of Linguistics, Seoul National University, Seoul, Korea
Venue:
GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
Year:
2008

Abstract

This paper shows that an information-theoretic similarity measure combined with noun-predicate bigrams is an effective way to create lists of semantically related words for lexical database work. Our experiments showed that bigrams and morpho-syntactic information suffice for the feature-based similarity measure, with no need for full syntactic analysis. We contend that the method would be even more useful when applied to a raw newswire corpus, where words not listed in existing dictionaries, such as recently coined words, proper nouns, and syllabic abbreviations, are prevalent.
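
To make the idea of a feature-based, information-theoretic similarity over noun-predicate bigrams more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give the exact formula, so the sketch assumes one common formulation in which each noun's features are the predicates it co-occurs with, weighted by positive pointwise mutual information, and two nouns are compared by the share of that information they have in common. The function names (`pmi_features`, `info_similarity`) and the toy English bigrams are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def pmi_features(bigrams):
    """Map each noun to {predicate: PMI}, keeping only positive PMI values.

    `bigrams` is a list of (noun, predicate) pairs. How the pairs are
    extracted from a corpus is outside the scope of this sketch.
    """
    pair_counts = Counter(bigrams)
    noun_counts = Counter(noun for noun, _ in bigrams)
    pred_counts = Counter(pred for _, pred in bigrams)
    total = len(bigrams)

    features = {}
    for (noun, pred), count in pair_counts.items():
        # PMI = log( P(noun, pred) / (P(noun) * P(pred)) )
        pmi = math.log(count * total / (noun_counts[noun] * pred_counts[pred]))
        if pmi > 0:
            features.setdefault(noun, {})[pred] = pmi
    return features


def info_similarity(features, a, b):
    """Information shared by two nouns divided by their total information."""
    fa = features.get(a, {})
    fb = features.get(b, {})
    total_info = sum(fa.values()) + sum(fb.values())
    if total_info == 0:
        return 0.0
    shared = fa.keys() & fb.keys()
    return sum(fa[p] + fb[p] for p in shared) / total_info


# Hypothetical toy data: (noun, predicate) bigrams.
toy_bigrams = [
    ("coffee", "drink"), ("tea", "drink"),
    ("coffee", "brew"), ("tea", "brew"), ("tea", "pour"),
    ("car", "drive"), ("bus", "drive"),
    ("car", "park"), ("bus", "ride"),
]

toy_features = pmi_features(toy_bigrams)
sim_coffee_tea = info_similarity(toy_features, "coffee", "tea")
sim_coffee_car = info_similarity(toy_features, "coffee", "car")
```

In this toy data, "coffee" and "tea" share the predicates "drink" and "brew" and so score well above "coffee" and "car", which share none and score zero. Ranking every other noun by its score against a target noun would give a candidate list of related words of the kind the abstract describes.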