Enhancing Text Classification Using Synopses Extraction

Authors:
Liping Ma; John Shepherd; Yanchun Zhang
Venue:
WISE '03 Proceedings of the Fourth International Conference on Web Information Systems Engineering
Year:
2003

Abstract

This paper describes a novel approach to document classification that applies decision-tree learning to a succinct vector of the important terms in each document. The succinct vector is itself produced by machine learning: the system builds parsers that identify significant features in a document by partitioning it into regions based on low-level document characteristics. Because the feature vector is succinct, the approach avoids the very large term vectors that have hindered the application of conventional machine learning to document classification. And since the parser can be trained to extract only the important terms from each document, small training sets suffice to achieve the same classification accuracy as conventional approaches.
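The abstract does not give the details of either the extraction step or the tree learner. The following is a rough illustration, not the authors' implementation. It assumes the synopsis step has already reduced each document to a small, fixed set of candidate terms. The vocabulary `VOCAB` and the simple presence test in `succinct_vector` are hypothetical stand-ins for the learned region-based parser. The sketch then trains a small ID3-style decision tree (information gain over binary features) on the resulting short vectors.

```python
import math
import re
from collections import Counter


def succinct_vector(text, vocabulary):
    """Map a document to a short binary vector over a small vocabulary.

    Hypothetical stand-in for the learned synopsis extractor: it only
    marks which of a few pre-chosen terms occur in the text.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return tuple(1 if term in tokens else 0 for term in vocabulary)


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """ID3-style tree: leaves are labels, inner nodes are (feature, {0: ..., 1: ...})."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    base = entropy(labels)

    def gain(f):
        # Information gain of splitting on binary feature f.
        remainder = 0.0
        for v in (0, 1):
            subset = [lab for row, lab in zip(rows, labels) if row[f] == v]
            if subset:
                remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder

    best = max(features, key=gain)
    remaining = [f for f in features if f != best]
    branches = {}
    for v in (0, 1):
        idx = [i for i, row in enumerate(rows) if row[best] == v]
        # An empty branch falls back to the parent's majority label.
        branches[v] = (
            build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
            if idx else majority
        )
    return (best, branches)


def classify(tree, row):
    """Walk from the root to a leaf label."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[row[feature]]
    return tree


# Hypothetical synopsis terms and toy training documents.
VOCAB = ("goal", "match", "election", "vote")
train_docs = [
    ("The striker scored a late goal to win the match", "sport"),
    ("A tense match ended without a goal", "sport"),
    ("Voters head to the polls as the election nears", "politics"),
    ("The party won the vote in a close election", "politics"),
]
rows = [succinct_vector(text, VOCAB) for text, _ in train_docs]
labels = [label for _, label in train_docs]
tree = build_tree(rows, labels, list(range(len(VOCAB))))
print(classify(tree, succinct_vector("Fans celebrated the winning goal", VOCAB)))  # → sport
```

Because each vector has only `len(VOCAB)` entries, every split considers just a handful of candidate features. This is the practical benefit the abstract attributes to a succinct vector, as opposed to one dimension per distinct term in the collection.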