Feature selection for automatic taxonomy induction

Authors:
Hui Yang;Jamie Callan
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA;Carnegie Mellon University, Pittsburgh, PA, USA
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Citing 4
Cited 1

Discovering word senses from text

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Automatic acquisition of hyponyms from large text corpora

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Learning semantic constraints for the automatic discovery of part-whole relations

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
A metric-based framework for automatic taxonomy induction

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1

Topic hierarchy construction for the organization of multi-source user generated contents

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

Most existing automatic taxonomy induction systems exploit one or more features to induce a taxonomy; nevertheless there is no systematic study examining which are the best features for the task under various conditions. This paper studies the impact of using different features on taxonomy induction for different types of relations and for terms at different abstraction levels. The evaluation shows that different conditions need different technologies or different combination of the technologies. In particular, co-occurrence and lexico-syntactic patterns are good features for is-a, sibling and part-of relations; contextual, co-occurrence, patterns, and syntactic features work well for concrete terms; co-occurrence works well for abstract terms.