In any text, some words occur frequently and are treated as keywords because they are strongly related to the subject of the text; the frequencies of these words change over time within a given period. However, traditional text processing and text search techniques do not take this time-series variation in frequency into account, and therefore cannot correctly determine an index of a word's popularity in a given period. This paper proposes a new method that automatically estimates the stability class (increasing, relatively constant, or decreasing) indicating a word's popularity over time, based on the frequency change in past text data. First, learning data were produced by defining five attributes that quantitatively measure the frequency change of a word; these attributes were extracted automatically from electronic texts. The learning data were then manually classified into the three stability classes and used to build a decision tree that automatically determines the stability classes of the analysis (test) data. For the learning data, we obtained the attribute values of 443 proper nouns extracted from 2216 CNN news articles (1997-1999) on professional baseball. For the test data, 472 proper nouns were extracted from 972 CNN news articles (1997-2000) and classified automatically using the decision tree. Compared against the manual classification, the decision tree achieved F-measures of 0.847, 0.851, and 0.768 for the increasing, relatively constant, and decreasing classes, respectively, which demonstrates the effectiveness of the method.
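The per-class evaluation described above can be illustrated with a minimal sketch. It assumes the usual definition of the F-measure as the harmonic mean of precision and recall, computed separately for each stability class by comparing the decision tree's labels with the manual ones. The function name `per_class_f_measure` and the toy label lists are hypothetical, chosen for illustration only; they are not the paper's data or code.

```python
from collections import Counter

# The three stability classes named in the abstract.
CLASSES = ("increasing", "constant", "decreasing")


def per_class_f_measure(manual, predicted, classes=CLASSES):
    """Return {class: F-measure}, comparing predicted labels with manual ones."""
    if len(manual) != len(predicted):
        raise ValueError("label lists must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, guess in zip(manual, predicted):
        if gold == guess:
            tp[gold] += 1   # correct assignment to this class
        else:
            fp[guess] += 1  # wrongly assigned to the guessed class
            fn[gold] += 1   # missed for the true class

    scores = {}
    for c in classes:
        # F = 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN).
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


# Toy labels invented for illustration (not taken from the paper).
manual = ["increasing", "increasing", "constant",
          "constant", "decreasing", "decreasing"]
predicted = ["increasing", "constant", "constant",
             "constant", "decreasing", "increasing"]

scores = per_class_f_measure(manual, predicted)
for name in CLASSES:
    print(f"{name}: {scores[name]:.3f}")
```

Reporting one score per class, rather than a single overall accuracy, makes it visible when one class (here, as in the abstract, the decreasing class) is harder to recognize than the others.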