Generating better concept hierarchies using automatic document classification

Authors:
Razvan Stefan Bot;Yi-fang Brook Wu;Xin Chen;Quanzhi Li
Affiliations:
New Jersey Institute of Technology, Newark, NJ;New Jersey Institute of Technology, Newark, NJ;New Jersey Institute of Technology, Newark, NJ;New Jersey Institute of Technology, Newark, NJ
Venue:
Proceedings of the 14th ACM international conference on Information and knowledge management
Year:
2005

Abstract

This paper presents a hybrid technique for building concept hierarchies over web documents retrieved by a meta-search engine. The technique first separates the retrieved documents into topically oriented categories, each corresponding to a different semantic aspect of the query, before any concept hierarchy is generated. This separation is performed by 1-of-n automatic document classification on the initial set of returned documents. A separate topical concept hierarchy is then generated automatically within each resulting category. Both steps are executed on the fly at retrieval time. Because of the efficiency constraints imposed by the web retrieval context, the algorithm uses only document snippets (rather than full web pages) for both document classification and concept hierarchy generation. Experimental results show that the algorithm improves the quality of the concept hierarchy presented to the searcher while keeping its efficiency measures within reasonable bounds.
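
The two-step pipeline described in the abstract (classify snippets into exactly one category each, then build a hierarchy inside each category) can be illustrated with a minimal sketch. The abstract does not name the classifier or the hierarchy-generation method, so everything concrete here is an assumption: the keyword profiles in `CATEGORY_KEYWORDS`, the keyword-overlap classifier, and the document-frequency subsumption test (in the style of Sanderson and Croft, with an assumed threshold of 0.8) are stand-ins chosen for illustration, not the authors' method.

```python
from collections import defaultdict

# Hypothetical keyword profiles for two senses of the query "jaguar".
# These categories are illustrative assumptions, not taken from the paper.
CATEGORY_KEYWORDS = {
    "animals": {"cat", "species", "habitat", "wildlife"},
    "cars": {"car", "engine", "dealer", "sedan"},
}


def tokenize(snippet):
    """Lowercase a snippet and return its set of distinct terms."""
    return {w.strip(".,;:!?()").lower() for w in snippet.split()} - {""}


def classify(snippet):
    """1-of-n classification: pick exactly one category per snippet.

    Stand-in classifier: the category with the largest keyword overlap.
    Ties are broken alphabetically so the result is deterministic.
    """
    terms = tokenize(snippet)
    return max(sorted(CATEGORY_KEYWORDS),
               key=lambda c: len(terms & CATEGORY_KEYWORDS[c]))


def build_hierarchy(snippets, threshold=0.8):
    """Return sorted (parent, child) pairs using a subsumption test.

    Term x is taken to subsume term y when P(x|y) >= threshold and
    P(y|x) < 1, with probabilities estimated from snippet co-occurrence.
    """
    docs_with = defaultdict(set)
    for i, snippet in enumerate(snippets):
        for term in tokenize(snippet):
            docs_with[term].add(i)

    pairs = []
    for x, dx in docs_with.items():
        for y, dy in docs_with.items():
            if x == y:
                continue
            both = len(dx & dy)
            p_x_given_y = both / len(dy)
            p_y_given_x = both / len(dx)
            if p_x_given_y >= threshold and p_y_given_x < 1.0:
                pairs.append((x, y))
    return sorted(pairs)


def categorize_then_build(snippets):
    """Step 1: split snippets by category. Step 2: one hierarchy per category."""
    by_category = defaultdict(list)
    for snippet in snippets:
        by_category[classify(snippet)].append(snippet)
    return {cat: build_hierarchy(docs) for cat, docs in by_category.items()}


snippets = [
    "jaguar cat habitat",
    "jaguar cat species",
    "jaguar wildlife",
    "jaguar car dealer",
    "jaguar sedan engine",
]
hierarchies = categorize_then_build(snippets)
```

Because classification runs first, terms from the car-related snippets never enter the hierarchy built for the animal-related snippets, which is the separation of query senses that the abstract motivates. Working only on short snippets keeps the pairwise term comparison small, in line with the efficiency constraint mentioned above.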