An evaluation method of words tendency depending on time-series variation and its improvements

  • Authors:
  • El-Sayed Atlam;Makoto Okada;Masami Shishibori;Jun-ichi Aoe

  • Affiliations:
  • Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan (all four authors)

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2002

Abstract

In every text, some words appear frequently and are treated as keywords because they are strongly related to the subject of the text. The frequencies of these words change over time within a given period. However, traditional text processing and text search techniques do not take this time-series variation in frequency into account, so they cannot correctly determine an index of a word's popularity in a given period. This paper proposes a new method that automatically estimates the stability class (increasing, relatively constant, or decreasing) indicating a word's popularity over time, based on frequency changes in past text data. First, learning data were produced by defining five attributes that quantitatively measure the frequency change of a word; the values of these attributes were extracted automatically from electronic texts. The learning data were then manually classified into the three stability classes and used to build a decision tree that automatically determines the stability classes of the analysis (test) data. For the learning data, attribute values were obtained for 443 proper nouns extracted from 2216 CNN news articles (1997-1999) that discussed professional baseball. For the test data, 472 proper nouns were extracted from 972 CNN news articles (1997-2000) and classified automatically using the decision tree. Comparing the decision tree results with the manual (human) classification, the F-measures for the increasing, relatively constant, and decreasing classes were 0.847, 0.851, and 0.768, respectively, which indicates that the method is effective.
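The per-class F-measure used for the evaluation above can be sketched as follows. This is a minimal illustration, assuming the standard definition of F-measure as the harmonic mean of precision and recall; the function name `f_measures`, the class label strings, and the six-word toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical label strings for the three stability classes named in the abstract.
CLASSES = ("increasing", "constant", "decreasing")


def f_measures(manual, predicted, classes=CLASSES):
    """Return the F-measure of each class, comparing predicted labels
    against manually assigned labels (assumed definition: 2PR / (P + R))."""
    if len(manual) != len(predicted):
        raise ValueError("manual and predicted must have the same length")

    true_count = Counter(manual)      # words a human put in each class
    pred_count = Counter(predicted)   # words the classifier put in each class
    correct = Counter(m for m, p in zip(manual, predicted) if m == p)

    scores = {}
    for c in classes:
        precision = correct[c] / pred_count[c] if pred_count[c] else 0.0
        recall = correct[c] / true_count[c] if true_count[c] else 0.0
        total = precision + recall
        scores[c] = 2 * precision * recall / total if total else 0.0
    return scores


# Toy data (invented for illustration only, not results from the paper).
manual = ["increasing", "increasing", "constant",
          "constant", "decreasing", "decreasing"]
predicted = ["increasing", "constant", "constant",
             "constant", "decreasing", "increasing"]

scores = f_measures(manual, predicted)
for name in CLASSES:
    print(f"{name}: {scores[name]:.3f}")
```

In this toy run, one of the two "increasing" words is recovered and one extra word is wrongly labeled "increasing", so precision and recall are both 0.5 for that class. The same computation applied to the 472 test nouns would yield the three per-class scores reported in the abstract.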