An evaluation method of words tendency depending on time-series variation and its improvements

Authors:
El-Sayed Atlam;Makoto Okada;Masami Shishibori;Jun-ichi Aoe
Affiliations:
Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan;Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan;Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan;Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan
Venue:
Information Processing and Management: an International Journal
Year:
2002

Abstract

In every text, some words appear frequently and are treated as keywords because they are strongly related to the subject of the text. The frequencies of these words change over time within a given period. However, traditional text processing and text search techniques do not consider how frequency changes with time-series variation, so they cannot correctly determine an index of a word's popularity in a given period. In this paper, a new method is proposed to automatically estimate the stability classes (increasing, relatively constant, and decreasing) that indicate a word's popularity with time-series variation, based on the frequency change in past text data. First, learning data were produced by defining five attributes that quantitatively measure the frequency change of a word; these attributes were extracted automatically from electronic texts. The learning data were then classified manually (by humans) into the three stability classes. Next, these data were used to construct a decision tree that automatically determines the stability classes of the analysis (test) data. For the learning data, we obtained the attribute values of 443 proper nouns extracted from 2216 CNN news articles (1997-1999) that discussed professional baseball. For the test data, 472 proper nouns were extracted from 972 CNN news articles (1997-2000) and classified automatically using the decision tree. Comparing the decision tree results with the manual (human) classification, the F-measures for the increasing, relatively constant, and decreasing classes were 0.847, 0.851, and 0.768, respectively, which indicates that the method is effective.
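The evaluation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the five attributes, so `trend_slope` below is only a hypothetical stand-in for one way to quantify frequency change (a least-squares slope over per-period counts); it is not the authors' attribute set. `f_measures` computes the per-class F-measure as the harmonic mean of precision and recall, which is the standard definition and is assumed here to match the one used in the paper. The class labels and sample data are invented for illustration.

```python
from collections import Counter

# Stability classes named in the abstract.
CLASSES = ("increasing", "constant", "decreasing")


def trend_slope(freqs):
    """Least-squares slope of per-period frequencies.

    Hypothetical attribute: the abstract does not specify the five
    attributes, so this is an illustrative measure of frequency change.
    """
    n = len(freqs)
    if n < 2:
        raise ValueError("need at least two periods to measure a trend")
    mean_t = (n - 1) / 2
    mean_f = sum(freqs) / n
    num = sum((t - mean_t) * (f - mean_f) for t, f in enumerate(freqs))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def f_measures(gold, predicted):
    """Per-class F-measure: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    scores = {}
    for c in CLASSES:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


# Invented example: human labels versus automatic labels for four words.
gold = ["increasing", "increasing", "constant", "decreasing"]
pred = ["increasing", "constant", "constant", "decreasing"]
scores = f_measures(gold, pred)
```

In this sketch a word whose yearly counts rise steadily gets a positive slope and a word with flat counts gets zero; a classifier such as a decision tree would consume attributes of this kind, and its output would be scored against the human labels with `f_measures`.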