We introduce a multi-stage ensemble framework, Error-Driven Generalist+Expert, or Edge, for improved classification on large-scale text categorization problems. Edge first trains a generalist, capable of classifying under all classes, to deliver a reasonably accurate initial category ranking for a given instance. Edge then computes a confusion graph for the generalist and allocates learning resources to train experts on relatively small groups of classes that the generalist tends to systematically confuse with one another. When invoked on a given instance, the experts' votes yield a reranking of the classes, thereby correcting the generalist's errors. Our evaluations demonstrate improved classification and ranking performance on several large-scale text categorization datasets. Edge is particularly efficient when the underlying learners are efficient. Our study of confusion graphs is also of independent interest.
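The pipeline above — build a confusion graph from the generalist's errors, group systematically confused classes, and let per-group experts rerank — can be sketched as follows. This is a minimal illustration with hypothetical names; the paper's actual learners, edge-weight thresholds, grouping procedure, and voting scheme may differ.

```python
# Sketch of the Edge idea: confusion graph -> confused class groups ->
# expert reranking. All function names here are illustrative, not the
# paper's API.
from collections import defaultdict


def build_confusion_graph(y_true, y_pred, threshold=1):
    """Undirected graph with an edge between two classes whenever the
    generalist confuses them at least `threshold` times."""
    counts = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t != p:
            counts[frozenset((t, p))] += 1
    adj = defaultdict(set)
    for pair, c in counts.items():
        if c >= threshold:
            a, b = tuple(pair)
            adj[a].add(b)
            adj[b].add(a)
    return adj


def confused_groups(adj, n_classes):
    """Connected components of the confusion graph (size > 1); each
    component is one group of classes that gets its own expert."""
    seen, groups = set(), []
    for c in range(n_classes):
        if c in seen or c not in adj:
            continue
        stack, comp = [c], set()
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.add(u)
            stack.extend(adj[u] - seen)
        if len(comp) > 1:
            groups.append(sorted(comp))
    return groups


def edge_rerank(gen_scores, groups, expert_scores):
    """If the generalist's top class lies in a confused group, defer to
    that group's expert scores for the final within-group decision."""
    top = max(range(len(gen_scores)), key=gen_scores.__getitem__)
    for gi, group in enumerate(groups):
        if top in group:
            sub = expert_scores[gi]  # one score per class in the group
            best = max(range(len(group)), key=sub.__getitem__)
            return group[best]
    return top
```

For example, if the generalist confuses classes 0 and 1 on held-out data, they form one group; an expert trained only on {0, 1} then makes the final call whenever the generalist's top prediction falls in that group.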