Estimating sentence types in computer related new product bulletins using a decision tree

Authors:
Tokunaga Hidekazu;Atlam El-Sayed;Fuketa Masao;Morita Kazuhiro;Tsuda Kazuhiko;Jun-ichi Aoe
Affiliations:
Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan;Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan and Department of Statistics and Computer science, Faculty of Science, Tanta Universit ...;Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan;Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan;Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan;Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan
Venue:
Information Sciences—Informatics and Computer Science: An International Journal
Year:
2004



Abstract

Numerous news articles about new computer-related products are available on the Internet. Information extraction and automatic text summarization are necessary for the effective use of these articles. The present paper shows that estimating four sentence types (HATSUBAI [sales], SHIYO [specifications], KOZO [structure], KINO [function]) is effective as preprocessing for information extraction and automatic text summarization. Moreover, this paper introduces a technique for estimating these sentence types using a decision tree. The decision tree does not use proper nouns or technical terms as attributes; instead it uses verbal nouns and numeratives at the end of sentences, as well as other general words. Since sub-setting attribute values is important for creating the decision tree, the sub-setting procedure of the representative decision tree algorithm C4.5 was revised: the gain ratio criterion was changed, and the hill climbing method was replaced with a genetic algorithm. A decision tree was created from 1539 sentences of learning data, and 299 sentences of test data were estimated by the decision tree. The number of incorrectly estimated sentences was 81 when C4.5 was used without revision, but this number decreased to 70 after revising the sub-setting.
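
To make the role of attribute-value sub-setting concrete, the following minimal sketch computes the standard C4.5 gain ratio for one attribute, with and without grouping its values into subsets. It is not the revised criterion of the paper, whose details the abstract does not give, and it does not include the genetic algorithm search. The function names, the `grouping` parameter, and the toy sentence-final words are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of hashable items."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels, grouping=None):
    """Standard C4.5 gain ratio of a single attribute.

    grouping (hypothetical parameter) maps each attribute value to a
    subset id; None means one branch per distinct value.
    """
    keys = [grouping[v] if grouping else v for v in values]
    total = len(labels)

    # Collect the class labels that fall into each branch.
    branches = {}
    for key, label in zip(keys, labels):
        branches.setdefault(key, []).append(label)

    # Information gain = entropy before the split minus weighted entropy after.
    remainder = sum(len(b) / total * entropy(b) for b in branches.values())
    gain = entropy(labels) - remainder

    # Split information penalizes splits with many small branches.
    split_info = entropy(keys)
    return gain / split_info if split_info > 0 else 0.0


# Toy data (invented): a sentence-final word and the sentence type.
end_words = ["release", "release", "ship", "equip", "equip", "support"]
types = ["HATSUBAI", "HATSUBAI", "HATSUBAI", "KOZO", "KOZO", "KINO"]

# One branch per word versus a grouping that merges "release" and "ship".
ungrouped = gain_ratio(end_words, types)
merged = {"release": "A", "ship": "A", "equip": "B", "support": "C"}
grouped = gain_ratio(end_words, types, merged)

print(f"ungrouped: {ungrouped:.3f}, grouped: {grouped:.3f}")
```

In this toy case both splits separate the classes perfectly, but merging the two words that predict the same type lowers the split information and therefore raises the gain ratio. That is why the choice of subsets matters when the tree is built. For the reported test set, 81 errors out of 299 sentences corresponds to about 27.1%, and 70 out of 299 to about 23.4%.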