Combining Subclassifiers in Text Categorization: A DST-Based Solution and a Case Study

Authors:
Kanoksri Sarinnapakorn; Miroslav Kubat
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2007

Abstract

Text categorization systems often induce document classifiers from pre-classified examples using machine learning techniques. Because each example document can belong to many different classes, the computational costs are often impractically high and sometimes grow exponentially in the number of features. Looking for ways to reduce these costs, we explored the possibility of running a "baseline induction algorithm" separately for subsets of features, obtaining a set of classifiers to be combined. For the specific case of classifiers that return not only class labels but also confidences in these labels, we investigate here a few alternative fusion techniques, including our own mechanism, which was inspired by the Dempster-Shafer Theory. The paper describes the algorithm and, in our specific case study, compares its performance to that of more traditional mechanisms.
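
The abstract does not spell out the fusion mechanism, so the following is only a minimal sketch of the textbook Dempster rule of combination, not the authors' algorithm. It assumes that each subclassifier (one per feature subset) reports a label and a confidence for a single class, and that a confidence c is turned into a simple mass function that puts c on the reported label and 1 - c on the whole frame. The names POS, NEG, THETA, simple_mass, dempster_combine, and fuse are hypothetical and chosen for illustration.

```python
from functools import reduce
from itertools import product

# Hypothetical two-element frame for one class: the document either
# belongs to the class or it does not.
POS, NEG = "in_class", "not_in_class"
THETA = frozenset({POS, NEG})  # the whole frame, i.e. "don't know"


def simple_mass(label, confidence):
    """Assumed mapping: put `confidence` on the label, the rest on THETA."""
    if label not in THETA:
        raise ValueError(f"unknown label: {label!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return {frozenset({label}): confidence, THETA: 1.0 - confidence}


def dempster_combine(m1, m2):
    """Textbook Dempster rule for two mass functions keyed by frozenset."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Contradictory evidence is set aside and normalized away.
            conflict += wa * wb
    remaining = 1.0 - conflict
    if remaining <= 1e-12:
        raise ValueError("sources are in total conflict")
    return {s: w / remaining for s, w in combined.items()}


def fuse(outputs):
    """Fuse a non-empty list of (label, confidence) pairs."""
    masses = [simple_mass(label, conf) for label, conf in outputs]
    return reduce(dempster_combine, masses)


if __name__ == "__main__":
    # Three hypothetical subclassifiers, each trained on its own feature subset.
    fused = fuse([(POS, 0.6), (POS, 0.5), (NEG, 0.3)])
    for subset, mass in fused.items():
        print(sorted(subset), round(mass, 4))
```

Under these assumptions, two agreeing subclassifiers with confidences 0.6 and 0.5 yield a combined mass of 1 - 0.4 * 0.5 = 0.8 on the shared label, so agreement strengthens belief while disagreement is discounted through the conflict term.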