Hierarchical Classification of Documents with Error Control

Authors:
Chun Hung Cheng; Jian Tang; Ada Wai-Chee Fu; Irwin King
Venue:
PAKDD '01 Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining
Year:
2001

Abstract

Classification assigns a new object to one of a set of predefined classes. Document classification is characterized by the large number of attributes that describe each object (document), so the traditional approach of building a single classifier to do all the classification work incurs a high overhead. Hierarchical classification is more efficient: instead of a single classifier, we use a set of classifiers distributed over a class taxonomy, one for each internal node. However, a misclassification at a high-level class may lead to a final class that is far from the correct one. An existing approach to this problem requires the terms to be arranged hierarchically as well. In this paper, instead of overhauling the classifier itself, we propose mechanisms that detect misclassification and take appropriate action. We then discuss an alternative that masks misclassification, based on a well-known software fault tolerance technique. Our experiments show that our algorithms offer a good trade-off between speed and accuracy in most applications.
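
The top-down scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not say how misclassification is detected, so the margin test below (stop descending when the best and second-best child scores are too close) is an assumed stand-in. The `Node` class, the keyword-weight scoring, the `min_margin` parameter, and the toy taxonomy are all hypothetical names introduced for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One class in the taxonomy. Internal nodes route; leaves are final classes."""
    name: str
    keywords: dict = field(default_factory=dict)   # term -> weight (toy local model)
    children: list = field(default_factory=list)

    def score(self, tokens):
        # Weighted count of this node's keywords in the document.
        counts = Counter(tokens)
        return sum(weight * counts[term] for term, weight in self.keywords.items())


def classify(root, tokens, min_margin=1.0):
    """Descend the taxonomy, one local decision per internal node.

    Returns (class_name, confident). ASSUMPTION: a small gap between the best
    and second-best child is treated as a suspected misclassification, and the
    descent stops at the current node instead of committing to a subtree.
    """
    node = root
    while node.children:
        ranked = sorted(node.children, key=lambda c: c.score(tokens), reverse=True)
        best_score = ranked[0].score(tokens)
        runner_up = ranked[1].score(tokens) if len(ranked) > 1 else 0.0
        if best_score - runner_up < min_margin:
            return node.name, False          # suspected error: do not go deeper
        node = ranked[0]
    return node.name, True


# Hypothetical two-level taxonomy used only for this example.
taxonomy = Node("root", {}, [
    Node("science", {"gene": 1, "protein": 1, "quantum": 1, "particle": 1}, [
        Node("biology", {"gene": 1, "protein": 1}),
        Node("physics", {"quantum": 1, "particle": 1}),
    ]),
    Node("sports", {"goal": 1, "match": 1, "racket": 1}, [
        Node("football", {"goal": 1}),
        Node("tennis", {"racket": 1}),
    ]),
])

print(classify(taxonomy, "gene protein protein cell".split()))
print(classify(taxonomy, "match particle".split()))
```

Each document is scored only against the children of the current node, which is where the speed advantage over a single flat classifier comes from. The second call shows the failure mode the abstract highlights: an ambiguous decision at the top level would otherwise send the document into an entirely wrong subtree, so the sketch reports the decision as unreliable instead.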