Hierarchical classification of Web content
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Hierarchically Classifying Documents Using Very Few Words
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Improving Text Classification by Shrinkage in a Hierarchy of Classes
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
A study of smoothing methods for language models applied to information retrieval
ACM Transactions on Information Systems (TOIS)
Taxonomies by the numbers: building high-performance taxonomies
Proceedings of the 14th ACM international conference on Information and knowledge management
Acclimatizing Taxonomic Semantics for Hierarchical Content Classification
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
The "Spree" Expert Finding System
ICSC '07 Proceedings of the International Conference on Semantic Computing
An unsupervised hierarchical approach to document categorization
WI '07 Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence
Hi-index | 0.00 |
Automatic content categorization by means of taxonomies is a powerful tool for information retrieval and search technologies as it improves the accessibility of data both for humans and machines. While research on automatic categorization has mainly focused on the problem of classifier design, hardly any effort has been spent on the optimization of the taxonomy size itself. However, taxonomy tailoring may significantly improve computational efficiency and scalability of modern retrieval systems where taxonomies often consist of tens of thousands of non-uniformly distributed categories. In this paper we demonstrate empirically that small subtrees of a taxonomy already enable reliable categorization. We compare several measures for the optimal selection of sub-taxonomies and investigate to what ex-tent a reduction affects the classification quality. We consider applications in classical document categorization and in the upcoming area of expert finding and report corresponding results obtained from experiments with standard benchmark data.