Fast Induction of Multiple Decision Trees in Text Categorization from Large Scale, Imbalanced, and Multi-label Data

Authors:
Peerapon Vateekul; Miroslav Kubat
Venue:
ICDMW '09: Proceedings of the 2009 IEEE International Conference on Data Mining Workshops
Year:
2009

Abstract

The paper focuses on automated categorization of text documents, each labeled with one or more classes and described by tens of thousands of features. The computational cost of induction in such domains is so high that it almost rules out the use of decision trees; reducing this cost is therefore an important research issue. Our solution, FDT ("fast decision-tree induction"), uses a two-pronged strategy: (1) feature-set pre-selection, and (2) induction of several trees, each from a different data subset, whose results are then combined by a data-fusion technique tailored to domains with imbalanced classes.
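
The two-pronged strategy can be illustrated with a minimal Python sketch. It is not the authors' FDT implementation: the abstract does not specify the selection criterion, the subset scheme, or the fusion rule, so the sketch assumes information gain for pre-selection, subsets that pair all positive examples with an equally sized random sample of negatives, and a plain average of tree scores as a stand-in for the data-fusion step. All function names (select_features, build_tree, induce_trees, predict) are hypothetical, features are binary word-presence flags, and a single class is handled; in a multi-label setting one such binary problem per class is assumed.

```python
import math
import random


def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def info_gain(rows, labels, f):
    """Information gain obtained by splitting on binary feature f."""
    on = [y for x, y in zip(rows, labels) if x[f]]
    off = [y for x, y in zip(rows, labels) if not x[f]]
    n = len(labels)
    return entropy(labels) - len(on) / n * entropy(on) - len(off) / n * entropy(off)


def select_features(rows, labels, k):
    """Prong 1 (assumed criterion): keep the k features with highest gain."""
    ranked = sorted(range(len(rows[0])),
                    key=lambda f: info_gain(rows, labels, f), reverse=True)
    return ranked[:k]


def build_tree(rows, labels, features, depth=3):
    """Tiny recursive tree; a leaf stores the fraction of positive examples."""
    leaf = sum(labels) / len(labels)
    if depth == 0 or not features or leaf in (0.0, 1.0):
        return leaf
    best = max(features, key=lambda f: info_gain(rows, labels, f))
    if info_gain(rows, labels, best) <= 0:
        return leaf
    rest = [f for f in features if f != best]
    on = [(x, y) for x, y in zip(rows, labels) if x[best]]
    off = [(x, y) for x, y in zip(rows, labels) if not x[best]]
    return (best,
            build_tree([x for x, _ in off], [y for _, y in off], rest, depth - 1),
            build_tree([x for x, _ in on], [y for _, y in on], rest, depth - 1))


def tree_score(tree, x):
    """Walk from the root to a leaf and return the leaf's positive fraction."""
    while isinstance(tree, tuple):
        f, off, on = tree
        tree = on if x[f] else off
    return tree


def induce_trees(rows, labels, k=2, n_trees=3, seed=0):
    """Prong 2 (assumed scheme): one tree per balanced data subset."""
    feats = select_features(rows, labels, k)
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y]
    neg = [i for i, y in enumerate(labels) if not y]
    trees = []
    for _ in range(n_trees):
        # All positives plus an equal-sized random sample of negatives.
        idx = pos + rng.sample(neg, min(len(neg), len(pos)))
        trees.append(build_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return trees


def predict(trees, x, threshold=0.5):
    """Stand-in fusion rule: average the tree scores and threshold."""
    return sum(tree_score(t, x) for t in trees) / len(trees) >= threshold
```

Balancing each subset is one simple way to keep a rare class from being swamped by the majority; the fusion technique described in the paper may differ substantially from the averaging used here.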