Refined experts: improving classification in large taxonomies

Authors:
Paul N. Bennett; Nam Nguyen
Affiliations:
Microsoft Research, Redmond, WA, USA; Cornell University, Ithaca, NY, USA
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Abstract

While large-scale taxonomies, especially for web pages, have been in existence for some time, approaches to automatically classify documents into these taxonomies have met with limited success compared to the more general progress made in text classification. We argue that this stems from three causes: increasing sparsity of training data at deeper nodes in the taxonomy, error propagation where a mistake made high in the hierarchy cannot be recovered, and increasingly complex decision surfaces in higher nodes in the hierarchy. While prior research has focused on the first problem, we introduce methods that target the latter two problems: first by biasing the training distribution to reduce error propagation, and second by propagating up "first-guess" expert information in a bottom-up manner before making a refined top-down choice. Finally, we present an empirical study demonstrating that the suggested changes lead to 10-30% improvements in F1 scores versus an accepted competitive baseline, hierarchical SVMs.
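The error-propagation problem and the bottom-up "first-guess" idea can be illustrated with a small toy sketch. This is not the authors' implementation: the taxonomy, the keyword weights standing in for trained per-node classifiers, and the additive way child scores are passed upward are all assumptions made for illustration. The paper's baseline uses hierarchical SVMs, and the abstract does not specify how the first-guess information is combined.

```python
# Toy taxonomy (hypothetical): internal nodes map to their children.
TAXONOMY = {
    "root": ["sports", "science"],
    "sports": ["soccer", "tennis"],
    "science": ["physics", "biology"],
}

# Hypothetical keyword weights standing in for trained per-node classifiers.
KEYWORDS = {
    "sports": {"match": 1.0, "ball": 1.0},
    "science": {"energy": 1.0, "cell": 1.0, "field": 1.0},
    "soccer": {"goal": 2.0, "field": 1.0, "ball": 0.5},
    "tennis": {"racket": 2.0, "serve": 1.0},
    "physics": {"energy": 2.0, "field": 0.5},
    "biology": {"cell": 2.0},
}


def score(node, tokens):
    """Score a document against a single node using only that node's weights."""
    weights = KEYWORDS[node]
    return sum(weights.get(token, 0.0) for token in tokens)


def refined_score(node, tokens):
    """Node's own score plus the best score passed up from its children.

    Assumption: first-guess information is combined by simple addition,
    a stand-in for whatever combination a real system would learn.
    """
    own = score(node, tokens)
    children = TAXONOMY.get(node, [])
    if not children:
        return own
    return own + max(refined_score(child, tokens) for child in children)


def top_down(tokens, node_score=score):
    """Greedy top-down routing: pick the best child at each level until a leaf."""
    node = "root"
    while node in TAXONOMY:
        node = max(TAXONOMY[node], key=lambda child: node_score(child, tokens))
    return node


doc = ["goal", "field"]  # intended to be a soccer document

# Plain top-down: "field" pulls the document into "science" at the top level,
# and no later decision can undo that mistake.
plain_prediction = top_down(doc)

# Refined top-down: the strong leaf-level evidence for "soccer" is propagated
# up before the top-level choice is made.
refined_prediction = top_down(doc, node_score=refined_score)

print(plain_prediction, refined_prediction)
```

In this toy, the plain router commits to the wrong branch at the top level because the top-level scorer sees only weak, ambiguous evidence, while the refined router lets the more specific lower-level scorer inform the top-level decision. The other method named in the abstract, biasing the training distribution, is not shown here.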