An intelligent web-page classifier with fair feature-subset selection

Authors:
Chih-Ming Chen;Hahn-Ming Lee;Chia-Chen Tan
Affiliations:
Graduate Institute of Library, Information and Archival Studies, National Chengchi University, No. 64, Sec. 2, Zhinan Rd., Taipei 116, Taiwan, ROC;Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan, ROC;Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan, ROC
Venue:
Engineering Applications of Artificial Intelligence
Year:
2006

Abstract

The explosion of online information has given rise to many manually constructed topic hierarchies (such as Yahoo!). Yet at the current growth rate of information, manual classification into topic hierarchies creates an immense information bottleneck, so developing an automatic classifier is an urgent need. However, classifiers suffer from enormous dimensionality, because the dimensionality is determined by the number of distinct keywords in a document corpus. More seriously, most classifiers either work slowly or are constructed subjectively, without any learning ability. In this paper, we address these problems with a fair feature-subset selection (FFSS) algorithm and an adaptive fuzzy learning network (AFLN) for classification. The FFSS algorithm reduces the enormous dimensionality. It not only treats each category fairly but also identifies useful features, both positive and negative. The AFLN, in turn, provides extremely fast learning to model the uncertain behavior of classification, so that the fuzzy matrix is corrected automatically. Experimental results show that both the FFSS algorithm and the AFLN lead to a significant improvement in document classification compared with alternative approaches.
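The abstract does not give FFSS's scoring function, so the following is only a minimal sketch of the two properties it names. It is not the authors' algorithm. It assumes a simple signed score: the term's document frequency inside a category minus its document frequency outside that category. Under that assumption, a positive score marks a positive feature and a negative score marks a negative one. "Fairness" is modeled here as an equal quota of k terms per category, so that large categories cannot crowd small ones out of a global top-N list. The function name `fair_feature_subset`, the scoring rule, and the toy corpus are all hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def fair_feature_subset(
    docs: Sequence[Set[str]], labels: Sequence[str], k: int
) -> Dict[str, List[Tuple[str, float]]]:
    """Keep the k highest-|score| terms for EVERY category (hypothetical sketch).

    Assumed score: document frequency of the term inside the category minus
    its document frequency outside it. A positive score marks a positive
    feature (term suggests the category); a negative score marks a negative
    feature (term suggests some other category).
    """
    vocab = sorted(set().union(*docs))
    selected = {}
    for c in sorted(set(labels)):
        in_c = [d for d, y in zip(docs, labels) if y == c]
        out_c = [d for d, y in zip(docs, labels) if y != c]
        scored = []
        for t in vocab:
            p_in = sum(t in d for d in in_c) / len(in_c)
            p_out = sum(t in d for d in out_c) / len(out_c) if out_c else 0.0
            scored.append((t, p_in - p_out))
        # Rank by magnitude so strong negative features survive too;
        # ties are broken alphabetically for determinism.
        scored.sort(key=lambda ts: (-abs(ts[1]), ts[0]))
        # Equal quota per category: small categories are not crowded out.
        selected[c] = scored[:k]
    return selected


# Toy corpus (hypothetical): each document is reduced to its set of keywords.
docs = [
    {"goal", "match", "team"},
    {"team", "score", "match"},
    {"stock", "market", "price"},
    {"market", "price", "trade"},
]
labels = ["sports", "sports", "finance", "finance"]

selection = fair_feature_subset(docs, labels, k=2)
# The reduced feature space is the union of the per-category picks.
features = {t for picks in selection.values() for t, _ in picks}
print(selection)
print(sorted(features))
```

With `k=2`, the sports category keeps `match` (score 1.0) as a positive feature and `market` (score -1.0) as a negative one, so the eight-keyword vocabulary shrinks to two features. In this two-category toy, the picks for finance are the mirror image. The per-category quota matters more on a skewed corpus, where a single global ranking would mostly reflect the largest category.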