An efficient time series data mining technique

  • Authors:
  • Hatim A. Aboalsamh;Alaaeldin M. Hafez;Ghazy M. R. Assassa

  • Affiliations:
  • Department of Computer Sciences, College of Computer and Information Sciences, King Saud University;Department of Information Systems, College of Computer and Information Sciences, King Saud University;Department of Computer Sciences, College of Computer and Information Sciences, King Saud University

  • Venue:
  • ICCOMP'08 Proceedings of the 12th WSEAS international conference on Computers
  • Year:
  • 2008

Abstract

Data mining is the process of discovering potentially valuable patterns, associations, trends, sequences and dependencies in data. Data mining techniques can discover information that many traditional business analysis and statistical techniques fail to deliver. In our study, we focus on the use of data mining techniques on time series, where mining techniques and tools are used to recognize, anticipate and learn the behavior of a time series in relation to various factors, whether directly related or seemingly unrelated. The targeted data are sequences of observations collected over intervals of time. Each sequence describes a phenomenon or a factor. Such factors could have either a direct or an indirect impact on the time series under study. Examples of factors with direct impact include yearly budgets and expenditures, taxation, local stock prices, unemployment rates, inflation rates, fallen angels, and rising odds for upgrades. Indirect factors could include any phenomena in the local or global environments, such as global stock prices, education expenditures, weather conditions, employment strategies, and medical services. Analysis of the data includes discovering trends (or patterns) and associations between sequences in order to generate non-trivial knowledge. In this paper, we propose a data mining technique to predict the dependency between factors that affect performance. The proposed technique consists of three phases: (a) for each data sequence that represents a chosen phenomenon, generate its trend sequences; (b) discover maximal frequent trend patterns and generate pattern vectors (to keep information about frequent trend patterns); and (c) use the trend pattern vectors to predict future factor sequences.
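The first two phases can be illustrated with a minimal sketch. The abstract does not specify how trends are encoded or how support is counted, so everything below is an assumption made for illustration: each consecutive change is encoded as `U` (up), `D` (down) or `S` (steady), patterns are contiguous runs of these symbols, support is the number of (possibly overlapping) occurrences within a single sequence, and a frequent pattern is treated as maximal when no longer frequent pattern contains it. The function names, the `tolerance`, `min_support` and `max_len` parameters, and the sample values are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def trend_sequence(values, tolerance=0.0):
    """Encode each consecutive change as 'U' (up), 'D' (down) or 'S' (steady).

    The three-symbol encoding is an assumption, not the paper's definition.
    """
    symbols = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        if delta > tolerance:
            symbols.append("U")
        elif delta < -tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def frequent_patterns(trends, min_support, max_len):
    """Count contiguous trend patterns of length 1..max_len (overlaps allowed)
    and keep those that occur at least min_support times."""
    counts = Counter(
        trends[i:i + n]
        for n in range(1, max_len + 1)
        for i in range(len(trends) - n + 1)
    )
    return {p: c for p, c in counts.items() if c >= min_support}


def maximal_patterns(frequent):
    """Keep frequent patterns that are not contained in a longer frequent one."""
    return {
        p: c
        for p, c in frequent.items()
        if not any(p != q and p in q for q in frequent)
    }


# Hypothetical observations of one factor over nine time intervals.
values = [10, 12, 15, 14, 16, 19, 18, 20, 23]
trends = trend_sequence(values)
frequent = frequent_patterns(trends, min_support=2, max_len=4)
maximal = maximal_patterns(frequent)

print(trends)
print(maximal)
```

With these sample values the trend sequence is `UUDUUDUU`, and the maximal frequent patterns of length at most 4 are `UUDU` and `UDUU`, each occurring twice. Phase (c), prediction from pattern vectors, is not sketched here because the abstract gives no detail on how the vectors are constructed or used.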