Depth First Rule Generation for Text Categorization

Authors:
Jiyuan An;Yi-Ping Phoebe Chen
Affiliations:
School of Information Technology, Faculty of Science and Technology, Deakin University, Melbourne, VIC 3125, Australia;School of Information Technology, Faculty of Science and Technology, Deakin University, Melbourne, VIC 3125, Australia and Australian Research Council Centre in Bioinformatics, E-mail: {jiyuan, ph ...
Venue:
Proceedings of the 2006 conference on Advances in Intelligent IT: Active Media Technology 2006
Year:
2006

References:
Machine Learning. Machine Learning.
The CN2 Induction Algorithm. Machine Learning.
Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval. ECML '98 Proceedings of the 10th European Conference on Machine Learning.
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
Fast Algorithms for Mining Association Rules in Large Databases. VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases.
Phrase-based Document Similarity Based on an Index Graph Model. ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining.
Ontology-Based Web Mining Model: Representations of User Profiles. WI '03 Proceedings of the 2003 IEEE/WIC International Conference on Web Intelligence.
Interpretations of Association Rules by Granular Computing. ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining.
Concept Learning of Text Documents. WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence.

Abstract

Text documents are usually categorized with classification methods such as the Rocchio method, Naïve Bayes, and SVM-based text classification. These methods learn from labeled text documents and then construct classifiers. The resulting classifiers predict the category of a newly arriving text document. The keywords in a document are often used to form categorization rules; for example, “kw = computer” can be a rule for the IT document category. However, the number of keywords is very large, and selecting useful keywords from among them is challenging. Recently, a rule generation method based on enumerating all possible keyword combinations has been proposed [2]. A crucial problem remains in this method: how to prune irrelevant combinations at the early stages of the rule generation procedure. In this paper, we propose a method that can effectively prune irrelevant keywords at an early stage.
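
The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the general idea of depth-first enumeration of keyword combinations with early pruning. It assumes that a rule is a conjunction of keywords, that "support" is the number of target-category documents a combination matches, and that a combination is abandoned as soon as its support drops below a threshold, since adding more keywords can only lower it. The function name `generate_rules` and the parameters `min_support`, `min_confidence`, and `max_len` are hypothetical and are not taken from the paper.

```python
def generate_rules(docs, target, min_support=2, min_confidence=0.8, max_len=3):
    """Depth-first search over keyword combinations (illustrative sketch only).

    docs: list of (set_of_keywords, category_label) pairs.
    Returns a list of (keyword_tuple, confidence) rules for `target`.
    """
    keywords = sorted({kw for kws, _ in docs for kw in kws})
    rules = []

    def dfs(start, combo, covered):
        # `covered` holds the documents matched by every keyword in `combo`.
        for i in range(start, len(keywords)):
            kw = keywords[i]
            matched = [d for d in covered if kw in d[0]]
            hits = sum(1 for _, label in matched if label == target)
            if hits < min_support:
                # Prune: any longer combination matches a subset of `matched`,
                # so its support cannot rise above `hits`.
                continue
            new_combo = combo + [kw]
            confidence = hits / len(matched)
            if confidence >= min_confidence:
                # Accept the rule and stop extending it (assumed design choice
                # that favours short rules).
                rules.append((tuple(new_combo), confidence))
            elif len(new_combo) < max_len:
                dfs(i + 1, new_combo, matched)

    dfs(0, [], docs)
    return rules


# Toy data, invented for illustration.
docs = [
    ({"computer", "software"}, "IT"),
    ({"computer", "network"}, "IT"),
    ({"computer", "football"}, "Sports"),
    ({"football", "match"}, "Sports"),
    ({"computer", "software", "match"}, "IT"),
]
rules = generate_rules(docs, target="IT")
for combo, confidence in rules:
    print(" AND ".join(f"kw = {k}" for k in combo), "->", "IT", confidence)
```

In this toy run, "kw = computer" alone is not confident enough because it also matches a Sports document, so the search descends into its extensions, while branches such as "football" are cut at depth one because they match no IT documents at all.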