INDUCTION FROM MULTI-LABEL EXAMPLES IN INFORMATION RETRIEVAL SYSTEMS: A CASE STUDY

Authors:
Kanoksri Sarinnapakorn;Miroslav Kubat
Affiliations:
Department of Electrical and Computer Engineering, University of Miami, Coral Gables, Florida, USA;Department of Electrical and Computer Engineering, University of Miami, Coral Gables, Florida, USA
Venue:
Applied Artificial Intelligence
Year:
2008

Abstract

Information retrieval systems often use machine-learning techniques to induce classifiers capable of categorizing documents. Unfortunately, the fact that a single document may belong to two or more categories at once has so far received inadequate attention, and the induction techniques currently in use often incur prohibitive computational costs. In the case study reported in this article, we reduced these costs by running a “baseline induction algorithm” on training examples described by diverse feature subsets, thus obtaining several subclassifiers. When asked about a document's classes, a “master classifier” combines the outputs of these subclassifiers. The combination can be accomplished in several ways, but we achieved the best results with our own mechanism, inspired by the Dempster-Shafer Theory (DST). We describe the technique, compare its performance experimentally with that of more traditional voting approaches, and show that its substantial computational savings come at the cost of an acceptable loss in classification performance.
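
To make the combination step concrete, the following is a minimal sketch of how a master classifier might fuse subclassifier outputs. It is not the authors' mechanism, which the abstract describes only as "inspired by" DST. Instead it applies the textbook Dempster rule of combination independently to each category, over the two-element frame {pos, neg}, and contrasts it with simple majority voting. The mass triple `(pos, neg, either)`, the function names `combine`, `master_classify`, and `majority_vote`, the decision rule `pos > neg`, and the example numbers are all assumptions made for illustration.

```python
from functools import reduce


def combine(m1, m2):
    """Dempster's rule on the frame {pos, neg}.

    Each argument is an assumed mass triple (pos, neg, either) summing to 1,
    where `either` is the mass left on the whole frame (uncertainty).
    """
    p1, n1, u1 = m1
    p2, n2, u2 = m2
    # Mass falling on the empty set: one source says pos, the other neg.
    conflict = p1 * n2 + n1 * p2
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    pos = (p1 * p2 + p1 * u2 + u1 * p2) / norm
    neg = (n1 * n2 + n1 * u2 + u1 * n2) / norm
    either = (u1 * u2) / norm
    return (pos, neg, either)


def master_classify(evidence):
    """Return the set of labels whose combined pos mass exceeds neg mass.

    `evidence` maps each label to a list of mass triples, one per
    subclassifier (a hypothetical interface, not taken from the paper).
    """
    labels = set()
    for label, masses in evidence.items():
        pos, neg, _ = reduce(combine, masses)
        if pos > neg:
            labels.add(label)
    return labels


def majority_vote(evidence):
    """Baseline: a label is assigned if more than half the subclassifiers
    lean positive (pos mass greater than neg mass)."""
    labels = set()
    for label, masses in evidence.items():
        votes = sum(1 for p, n, _ in masses if p > n)
        if 2 * votes > len(masses):
            labels.add(label)
    return labels


# Made-up evidence: one confident subclassifier, two nearly uncommitted ones.
example = {
    "sports": [(0.9, 0.0, 0.1), (0.1, 0.2, 0.7), (0.1, 0.2, 0.7)],
}
```

With the made-up `example` evidence, `master_classify(example)` assigns the label because the single confident subclassifier outweighs two weakly negative ones, whereas `majority_vote(example)` rejects it. This is one way in which evidence-weighted combination can differ from plain voting; how the paper's mechanism actually behaves is described in the article itself.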