Extracting hyponyms of prespecified hypernyms from itemizations and headings in web documents

Authors:
Keiji Shinzato; Kentaro Torisawa
Affiliations:
Japan Advanced Institute of Science and Technology, Ishikawa, Japan; Japan Advanced Institute of Science and Technology, Ishikawa, Japan
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Abstract

This paper describes a method for acquiring hyponyms of given hypernyms from HTML documents on the WWW. We assume that the heading (or explanation) of an itemization (or listing) in an HTML document is likely to contain a hypernym of the items in that itemization, and we acquire hyponymy relations on the basis of this assumption. Our method extends that of Shinzato and Torisawa (2004), in which a common hypernym for the expressions in an itemization in an HTML document is obtained using statistical measures. Using Japanese HTML documents, we show empirically that the proposed method obtains a significant number of hyponymy relations that alternative methods would miss.
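The core assumption, that a heading tends to contain a hypernym of the items listed under it, can be illustrated with a small sketch. This is not the authors' implementation: it omits the statistical measures mentioned in the abstract, uses plain case-insensitive substring matching, and assumes well-formed HTML in which every `li` element is explicitly closed. The names `ItemizationCollector` and `candidate_hyponyms` are hypothetical, introduced only for this illustration.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
LIST_TAGS = {"ul", "ol"}


class ItemizationCollector(HTMLParser):
    """Collects (heading text, [item texts]) pairs from an HTML document.

    Assumption: each top-level list is paired with the most recent heading
    that precedes it. Text in nested lists is folded into the outer item.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []        # finished (heading, items) pairs
        self._heading = ""     # most recent heading seen so far
        self._items = None     # items of the list currently being read
        self._buf = None       # text buffer for the current heading or item
        self._list_depth = 0   # nesting depth of <ul>/<ol>

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._buf = []
        elif tag in LIST_TAGS:
            self._list_depth += 1
            if self._list_depth == 1:
                self._items = []
        elif tag == "li" and self._list_depth == 1:
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._buf is not None:
            self._heading = " ".join("".join(self._buf).split())
            self._buf = None
        elif tag == "li" and self._list_depth == 1 and self._buf is not None:
            text = " ".join("".join(self._buf).split())
            if text:
                self._items.append(text)
            self._buf = None
        elif tag in LIST_TAGS and self._list_depth > 0:
            self._list_depth -= 1
            if self._list_depth == 0:
                self.pairs.append((self._heading, self._items))
                self._items = None


def candidate_hyponyms(html, hypernyms):
    """Return (hypernym, item) pairs for lists whose heading mentions a hypernym.

    Matching is a naive case-insensitive substring test (an assumption of
    this sketch, not the paper's criterion).
    """
    collector = ItemizationCollector()
    collector.feed(html)
    collector.close()
    found = []
    for heading, items in collector.pairs:
        for hypernym in hypernyms:
            if hypernym.lower() in heading.lower():
                found.extend((hypernym, item) for item in items)
    return found


page = """
<h2>Popular programming languages</h2>
<ul><li>Python</li><li>Haskell</li></ul>
<h2>Contact</h2>
<ul><li>Email</li></ul>
"""
print(candidate_hyponyms(page, ["programming language"]))
```

Here the first list is kept because its heading contains the prespecified hypernym, while the list under "Contact" is ignored. The pairs produced this way are only candidates; a realistic system would still need some way to filter out items that are not true hyponyms.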