A neural network document classifier with linguistic feature selection

Authors:
Hahn-Ming Lee; Chih-Ming Chen; Cheng-Wei Hwang
Venue:
IEA/AIE '00 Proceedings of the 13th international conference on Industrial and engineering applications of artificial intelligence and expert systems: Intelligent problem solving: methodologies and approaches
Year:
2000

Abstract

This article presents a neural network document classifier with linguistic feature selection and multi-category output. It consists of a feature selection unit and a hierarchical neural network classification unit. In the feature selection unit, we extract terms from a set of original documents by text processing and then analyze the conformity and uniformity of each term with an entropy function that measures the term's significance. Terms with high significance are selected as input features for the neural network document classifiers. To reduce the input dimension, we also merge synonyms: based on the uniformity analysis, we obtain a term similarity matrix by a fuzzy relation operation and use it to construct a synonym thesaurus. In the hierarchical neural network classification unit, we adopt the well-known back-propagation learning model to build the hierarchical classification units. Our experiments use a product description database from an electronic commerce company. The experimental results show that the classifier is accurate enough to assist human classification, and that it can save considerable manpower and working time when classifying a large database.
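
The abstract does not give the entropy function itself, so the following is only a minimal sketch of the general idea of entropy-based term selection, not the authors' formulation. It assumes that a term whose occurrences are concentrated in few categories (low entropy across categories) is more significant than one spread evenly over all of them. The function names, the threshold `max_entropy`, and the toy product descriptions are all hypothetical.

```python
import math
from collections import defaultdict


def term_category_entropy(docs, num_categories):
    """Return {term: normalized entropy in [0, 1]} over categories.

    docs is a list of (category, tokens) pairs. Each document counts a
    term at most once. Entropy 0 means the term occurs in one category
    only; entropy 1 means it is spread evenly over all categories.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for category, tokens in docs:
        for term in set(tokens):
            counts[term][category] += 1

    entropy = {}
    for term, per_category in counts.items():
        total = sum(per_category.values())
        h = -sum((c / total) * math.log2(c / total)
                 for c in per_category.values())
        # Normalize by the maximum possible entropy, log2(num_categories).
        entropy[term] = h / math.log2(num_categories) if num_categories > 1 else 0.0
    return entropy


def select_terms(docs, num_categories, max_entropy=0.5):
    """Keep terms whose normalized entropy is at most max_entropy (assumed cutoff)."""
    scores = term_category_entropy(docs, num_categories)
    return sorted(t for t, h in scores.items() if h <= max_entropy)


# Hypothetical product descriptions, already tokenized.
docs = [
    ("printer", ["laser", "printer", "toner"]),
    ("printer", ["inkjet", "printer", "color"]),
    ("monitor", ["lcd", "monitor", "color"]),
    ("monitor", ["crt", "monitor", "inch"]),
]
selected = select_terms(docs, num_categories=2)
print(selected)
```

In this toy data, "color" occurs equally often in both categories, so it is dropped, while category-specific terms such as "printer" and "monitor" are kept. The selected terms would then index the input vector of a classifier; the synonym merging and hierarchical back-propagation stages described in the abstract are not covered by this sketch.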