Efficiently Evaluating Order Preserving Similarity Queries over Historical Market-Basket Data

  • Authors:
  • Reza Sherkat; Davood Rafiei

  • Affiliations:
  • University of Alberta; University of Alberta

  • Venue:
  • ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
  • Year:
  • 2006

Abstract

We introduce a new domain-independent framework for formulating and efficiently evaluating similarity queries over historical data. Given histories, each a sequence of timestamped observations, and a pairwise similarity function on observations, the goal is to find similar histories. For instance, given a database of customer transactions and a time period, we can find customers with similar purchasing behavior over that period. Our work differs from work on retrieving similar time series: it addresses the more general problem in which a history cannot be modeled as a time series, so the conventional time-series approaches are not applicable. We derive a similarity measure for histories based on an aggregation of the similarities between the observations of two histories, and propose efficient algorithms for finding an optimal alignment between two histories. Because our measure is non-metric, we develop upper bounds and an algorithm that uses them to prune histories that are guaranteed not to be in the answer set. Our experimental results on real and synthetic data confirm the effectiveness and efficiency of our approach. For instance, when the minimum length of a match is provided, our algorithm achieves up to an order-of-magnitude speed-up over alternative methods.
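The abstract does not spell out the exact measure or bounds, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that each observation is a market basket (a set of items), that observation similarity is Jaccard overlap (a hypothetical choice), and that the history score is the maximum sum of pairwise similarities over an order-preserving alignment (matched pairs may not cross in time, and each observation is matched at most once), computed with a standard dynamic program. The pruning step uses a simple hypothetical bound: since Jaccard similarity is at most 1, no alignment can score more than `min(len(h1), len(h2))`. The paper's own upper bounds are not reproduced here. The names `jaccard`, `alignment_score`, `upper_bound`, and `top_match` are illustrative.

```python
from typing import Callable, Dict, FrozenSet, Optional, Sequence, Tuple

Basket = FrozenSet[str]
Sim = Callable[[Basket, Basket], float]


def jaccard(a: Basket, b: Basket) -> float:
    """Hypothetical observation similarity: Jaccard overlap of two baskets.
    Two empty baskets are treated as dissimilar (an arbitrary choice)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def alignment_score(h1: Sequence[Basket], h2: Sequence[Basket],
                    sim: Sim = jaccard) -> float:
    """Best order-preserving alignment score: matched pairs must not cross
    in time, each observation is used at most once, and the score is the
    sum of similarities of the matched pairs."""
    n, m = len(h1), len(h2)
    # dp[i][j] = best score when aligning the prefixes h1[:i] and h2[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j],                                   # leave h1[i-1] unmatched
                dp[i][j - 1],                                   # leave h2[j-1] unmatched
                dp[i - 1][j - 1] + sim(h1[i - 1], h2[j - 1]),   # match the two
            )
    return dp[n][m]


def upper_bound(h1: Sequence[Basket], h2: Sequence[Basket]) -> float:
    """Hypothetical O(1) bound: at most min(n, m) pairs can be matched,
    and each contributes at most 1 when sim is Jaccard."""
    return float(min(len(h1), len(h2)))


def top_match(query: Sequence[Basket],
              histories: Dict[str, Sequence[Basket]],
              sim: Sim = jaccard) -> Tuple[Optional[str], float]:
    """Return the history with the highest alignment score, skipping the
    dynamic program for any history whose bound cannot beat the best so far."""
    best_id: Optional[str] = None
    best_score = float("-inf")
    for hid, h in histories.items():
        if upper_bound(query, h) <= best_score:
            continue  # pruned: this history cannot improve on the current best
        score = alignment_score(query, h, sim)
        if score > best_score:
            best_id, best_score = hid, score
    return best_id, best_score
```

For example, with the query `[{milk, bread}, {beer}]`, a history containing the same two baskets in the same order scores 2.0, while the same baskets in reversed order score only 1.0, because the alignment must preserve order. The pruning check is much cheaper than the quadratic dynamic program, which is the general motivation for bound-based filtering that the abstract describes.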