Tailoring Taxonomies for Efficient Text Categorization and Expert Finding

Authors:
R. Wetzker;W. Umbrath;L. Hennig;C. Bauckhage;T. Alpcan;F. Metze
Venue:
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
Year:
2008

Abstract

Automatic content categorization by means of taxonomies is a powerful tool for information retrieval and search technologies as it improves the accessibility of data both for humans and machines. While research on automatic categorization has mainly focused on the problem of classifier design, hardly any effort has been spent on the optimization of the taxonomy size itself. However, taxonomy tailoring may significantly improve computational efficiency and scalability of modern retrieval systems where taxonomies often consist of tens of thousands of non-uniformly distributed categories. In this paper we demonstrate empirically that small subtrees of a taxonomy already enable reliable categorization. We compare several measures for the optimal selection of sub-taxonomies and investigate to what extent a reduction affects the classification quality. We consider applications in classical document categorization and in the upcoming area of expert finding and report corresponding results obtained from experiments with standard benchmark data.