Text Categorization Using Neural Networks Initialized with Decision Trees

Authors:
Nerijus Remeikis;Ignas Skučas;Vida Melninkaitė
Affiliations:
Faculty of Informatics, Vytautas Magnus University, Vileikos 8, 3035 Kaunas, Lithuania, e-mail: nerijus_remeikis@fc.vdu.lt, ignas_skucas@fc.vdu.lt, vida_melninkaite@fc.vdu.lt
Venue:
Informatica
Year:
2004

Abstract

Text categorization, the assignment of natural language documents to one or more predefined categories based on their semantic content, is an important component of many information organization and management tasks. The performance of neural network learning is known to be sensitive to the initial weights and the architecture. This paper discusses initializing a multilayer neural network with a decision tree classifier to improve text categorization accuracy. Each path of the decision tree, from the root node to a final leaf, is used to initialize a single unit. Growing decision trees with increasingly larger amounts of training data results in larger trees. As a result, the neural networks constructed from these decision trees are often larger and more complex than necessary. An appropriate choice of the certainty factor can produce trees that remain essentially constant in size as the training set grows. Experimental results support the conclusion that error-based pruning can be used to produce appropriately sized trees, which are mapped directly to an optimal neural network architecture with good accuracy. The experimental evaluation shows that this approach provides better classification accuracy on the Reuters-21578 corpus, one of the standard benchmarks for text categorization. We present results comparing the accuracy of this approach with that of a multilayer neural network initialized by the traditional random method and with that of decision tree classifiers.
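
The mapping from tree to network described in the abstract can be illustrated with a minimal sketch. It is one plausible reading, not necessarily the authors' exact construction: it assumes binary term-presence features and sigmoid hidden units, and it creates one hidden unit per root-to-leaf path. The toy tree, the feature names, the gain `beta`, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy tree over binary term-presence features.
# Internal node: (term, subtree_if_absent, subtree_if_present); leaf: class label.
TREE = ("wheat", ("oil", "other", "crude"), "grain")
FEATURES = ["wheat", "oil"]


def leaf_paths(node, conds=()):
    """Yield (conditions, label) for every root-to-leaf path of the tree."""
    if isinstance(node, str):
        yield list(conds), node
        return
    term, absent, present = node
    yield from leaf_paths(absent, conds + ((term, False),))
    yield from leaf_paths(present, conds + ((term, True),))


def init_hidden_units(tree, features, beta=10.0):
    """Create one (weights, bias, label) hidden unit per root-to-leaf path.

    Assumption: a unit should fire only when every test on its path holds.
    For 0/1 inputs this logical AND is realised with weight +beta for
    'term present' tests, -beta for 'term absent' tests, and
    bias = beta * (0.5 - number of 'present' tests).
    """
    units = []
    for conds, label in leaf_paths(tree):
        weights = [0.0] * len(features)
        n_present = 0
        for term, present in conds:
            weights[features.index(term)] = beta if present else -beta
            n_present += int(present)
        units.append((weights, beta * (0.5 - n_present), label))
    return units


def activations(units, x):
    """Sigmoid activation of each hidden unit for a 0/1 feature vector x."""
    out = []
    for weights, bias, _ in units:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out


def predict(units, x):
    """Return the label of the most active unit (before any further training)."""
    acts = activations(units, x)
    return units[acts.index(max(acts))][2]
```

Under these assumptions the untrained network reproduces the tree's decisions, and gradient-based training would then start from these weights rather than from random values. A pruned, smaller tree yields fewer paths and therefore fewer hidden units, which is how the tree size bounds the network size in this sketch.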