Concept Learning of Text Documents

Authors:
Jiyuan An; Yi-Ping Phoebe Chen
Affiliations:
Deakin University, Australia; Deakin University, Australia
Venue:
WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
Year:
2004

Abstract

Concept learning of text documents can be viewed as the problem of acquiring the definition of a general category of documents. To define the category of a text document, a conjunction of keywords is usually used. These keywords should be few and comprehensible. A naïve method is to enumerate all combinations of keywords and extract the suitable ones. However, because the number of keyword combinations is enormous, exhaustive enumeration cannot feasibly extract the keywords most relevant to describing the categories of documents. Many heuristic methods have been proposed, such as GA-based and immune-based algorithms. In this work, we introduce a pruning power technique and propose a robust enumeration-based concept learning algorithm. Experimental results show that the rules produced by our approach are simpler and more comprehensible than those produced by other methods.