Hypernym discovery based on distributional similarity and hierarchical structures

Authors:
Ichiro Yamada;Kentaro Torisawa;Jun'ichi Kazama;Kow Kuroda;Masaki Murata;Stijn De Saeger;Francis Bond;Asuka Sumida
Affiliations:
National Institute of Information and Communications Technology, Keihanna Science City, Japan;National Institute of Information and Communications Technology, Keihanna Science City, Japan;National Institute of Information and Communications Technology, Keihanna Science City, Japan;National Institute of Information and Communications Technology, Keihanna Science City, Japan;National Institute of Information and Communications Technology, Keihanna Science City, Japan;National Institute of Information and Communications Technology, Keihanna Science City, Japan;National Institute of Information and Communications Technology, Keihanna Science City, Japan;Japan Advanced Institute of Science and Technology, Nomi-shi, Ishikawa-ken, Japan
Venue:
EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Volume 2
Year:
2009

References cited (10):
Foundations of statistical natural language processing.
Automatic acquisition of hyponyms from large text corpora. COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2.
Measures of distributional similarity. ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics.
Inducing a semantically annotated lexicon via EM-based clustering. ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics.
Automatic construction of a hypernym-labeled noun hierarchy from text. ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics.
Semantic taxonomy induction from heterogenous evidence. ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics.
Towards terascale knowledge acquisition. COLING '04 Proceedings of the 20th international conference on Computational Linguistics.
TORISHIKI-KAI, An Autogenerated Web Search Directory. ISUC '08 Proceedings of the 2008 Second International Symposium on Universal Communication.
Deriving a large scale taxonomy from Wikipedia. AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2.
Unsupervised named-entity extraction from the Web: An experimental study. Artificial Intelligence.

Cited by (3):
Co-related verb argument selectional preferences. CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part I.
Automatic acquisition of taxonomies in different languages from multiple Wikipedia versions. i-KNOW '11 Proceedings of the 11th International Conference on Knowledge Management and Knowledge Technologies.
Supporting resource-based learning on the web using automatically extracted large-scale taxonomies from multiple wikipedia versions. ICWL'11 Proceedings of the 10th international conference on Advances in Web-Based Learning.


Abstract

This paper presents a new method for developing a large-scale hyponymy relation database by combining Wikipedia and other Web documents. We attach new words to the hyponymy database extracted from Wikipedia by using distributional similarity calculated from Web documents. For a given target word, our algorithm first finds the k words in the Wikipedia database that are most similar to it. Next, the hypernyms of these k similar words are scored by considering both the distributional similarities and the hierarchical distances in the Wikipedia database. Finally, new hyponymy relations are output in order of score. We tested two distributional similarity measures: one based on raw verb-noun dependencies (which we call "RVD"), and the other based on a large-scale clustering of verb-noun dependencies (called "CVD"). Using CVD, our method achieved an attachment accuracy of 91.0% for the top 10,000 relations and 74.5% for the top 100,000 relations, far outperforming the baseline approaches. Except in the region of very high scores, CVD was more effective than RVD. We also confirmed that most of the relations extracted by our method cannot be extracted merely by applying the well-known lexico-syntactic patterns to Web documents.
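The three-step procedure described in the abstract (find the k most similar words in the hierarchy, score their hypernyms, output relations by score) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names rank_hypernyms, ancestors, and cosine, the toy vectors and hierarchy, the use of cosine similarity over raw verb-slot counts as a stand-in for RVD, and the geometric discount parameter decay for hierarchical distance are all hypothetical. The paper's exact scoring formula and its CVD measure are not reproduced here.

```python
from collections import defaultdict, deque
import math
from typing import Dict, List, Tuple

# Hypothetical types: a sparse verb-slot vector per noun, and a hierarchy mapping
# each word to its direct hypernyms (a toy stand-in for the Wikipedia database).
Vector = Dict[str, float]
Hierarchy = Dict[str, List[str]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors (assumed stand-in for the RVD measure)."""
    dot = sum(weight * v[feat] for feat, weight in u.items() if feat in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ancestors(word: str, hierarchy: Hierarchy) -> Dict[str, int]:
    """Return every hypernym reachable from `word` with its shortest edge distance."""
    dist: Dict[str, int] = {}
    queue = deque((h, 1) for h in hierarchy.get(word, []))
    while queue:
        node, d = queue.popleft()
        if node in dist:  # breadth-first order reaches each node first at its minimum distance
            continue
        dist[node] = d
        queue.extend((h, d + 1) for h in hierarchy.get(node, []))
    return dist


def rank_hypernyms(target: str, vectors: Dict[str, Vector], hierarchy: Hierarchy,
                   k: int = 3, decay: float = 0.5) -> List[Tuple[str, float]]:
    """Score candidate hypernyms of `target` from its k nearest words in the hierarchy."""
    target_vec = vectors[target]
    # Step 1: the k words in the hierarchy most similar to the target.
    candidates = [w for w in hierarchy if w != target and w in vectors]
    neighbours = sorted(candidates, key=lambda w: cosine(target_vec, vectors[w]),
                        reverse=True)[:k]
    # Step 2: each hypernym of each neighbour gains the neighbour's similarity,
    # discounted geometrically by hierarchical distance (hypothetical weighting).
    scores: Dict[str, float] = defaultdict(float)
    for n in neighbours:
        sim = cosine(target_vec, vectors[n])
        for hypernym, d in ancestors(n, hierarchy).items():
            scores[hypernym] += sim * decay ** (d - 1)
    # Step 3: emit (hypernym, score) candidates in descending score order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (invented for illustration): verb-object counts and a tiny hierarchy.
vectors = {
    "viola": {"play:obj": 3, "tune:obj": 2, "bow:obj": 1},
    "violin": {"play:obj": 4, "tune:obj": 2, "bow:obj": 2},
    "cello": {"play:obj": 2, "bow:obj": 2},
    "trumpet": {"play:obj": 3, "blow:obj": 2},
}
hierarchy = {
    "violin": ["string instrument"],
    "cello": ["string instrument"],
    "trumpet": ["brass instrument"],
    "string instrument": ["musical instrument"],
    "brass instrument": ["musical instrument"],
}

for hypernym, score in rank_hypernyms("viola", vectors, hierarchy, k=2):
    print(f"viola -> {hypernym}: {score:.3f}")
```

In this sketch, a hypernym shared by several similar words accumulates score from each of them, which is one plausible way to combine similarity with hierarchical structure. Raising decay toward 1 tends to favour more general hypernyms higher in the hierarchy; lowering it concentrates the score on the direct parents of the neighbours.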