Discretization of time series dataset using relative frequency and K-nearest neighbor approach

  • Authors:
  • Azuraliza Abu Bakar;Almahdi Mohammed Ahmed;Abdul Razak Hamdan

  • Affiliations:
  • Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, University Kebangsaan Malaysia, Bangi, Selangor Darul Ehsan, Malaysia;Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, University Kebangsaan Malaysia, Bangi, Selangor Darul Ehsan, Malaysia;Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, University Kebangsaan Malaysia, Bangi, Selangor Darul Ehsan, Malaysia

  • Venue:
  • ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications: Part I
  • Year:
  • 2010

Abstract

In this work, we propose an improved approach to time series data discretization, called the RFknn method, that uses the Relative Frequency and K-nearest Neighbor functions. The main idea of the method is to improve the process of determining a sufficient number of intervals for discretizing time series data. The proposed approach improves the time series data representation by integrating it with the Piecewise Aggregate Approximation (PAA) and Symbolic Aggregate Approximation (SAX) representations. Each interval is represented as a symbol, which supports an efficient mining process in which a better knowledge model can be obtained without major loss of knowledge. The aim is neither to minimize nor to maximize the number of intervals of the temporal patterns over their class labels. The performance of RFknn is evaluated on 22 temporal datasets and compared with the original SAX time series discretization method using a similar representation. We show that RFknn can improve the precision of the representation without losing the symbolic nature of the original SAX representation. The experimental results show that RFknn gives a better representation, with lower or comparable error rates.
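For readers unfamiliar with the baseline, the following is a minimal sketch of the standard PAA and SAX steps that RFknn is compared against. It is not the RFknn method itself, whose details the abstract does not give. The function names (`znormalize`, `paa`, `sax_breakpoints`, `sax`) are illustrative, and the sketch assumes the series length is a multiple of the number of segments and that breakpoints are the equiprobable quantiles of a standard normal distribution, as in the usual SAX formulation.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information; map it to zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment."""
    n = len(series)
    if n_segments <= 0 or n % n_segments != 0:
        # Assumption of this sketch: only evenly divisible lengths are handled.
        raise ValueError("series length must be a positive multiple of n_segments")
    width = n // n_segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(n_segments)]


def sax_breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    normal = NormalDist()
    return [normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, n_segments, alphabet_size):
    """Convert a numeric series into a SAX word over the letters 'a', 'b', ..."""
    breakpoints = sax_breakpoints(alphabet_size)
    segment_means = paa(znormalize(series), n_segments)
    # bisect_right counts the breakpoints at or below the value, which is
    # the zero-based index of the interval (and hence the symbol) it falls in.
    return "".join(chr(ord("a") + bisect_right(breakpoints, v)) for v in segment_means)


if __name__ == "__main__":
    print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4))  # prints "abcd"
```

In this baseline the number of intervals (`alphabet_size`) is fixed in advance by the user; choosing a sufficient number of intervals is the step that the abstract says RFknn aims to improve.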