Point-distribution algorithm for mining vector-item patterns

  • Authors:
  • Anne M. Denton;Jianfei Wu;Dietmar H. Dorr

  • Affiliations:
  • North Dakota State University, Fargo, ND;North Dakota State University, Fargo, ND;Research and Development, Thomson Reuters, St. Paul, MN

  • Venue:
  • Proceedings of the ACM SIGKDD Workshop on Useful Patterns
  • Year:
  • 2010

Abstract

An algorithm is presented for finding patterns between sets of continuous attributes and item sets. In contrast to most pattern mining approaches, the algorithm treats multiple continuous attributes as a single vector attribute. This approach introduces a separate level of abstraction and allows multiple vector attributes to be considered. We show that the pattern mining process can uncover relationships between the vector data and item sets. Filtering according to these patterns can be seen as feature selection at the level of vector attributes rather than individual continuous attributes. In the evaluation, we show that the pattern mining algorithm achieves this filtering more effectively and efficiently than a direct application of classification algorithms. Patterns are identified by relating item data to the distribution of objects within the vector space spanned by the sets of continuous attributes. The Kullback-Leibler divergence provides a quantitative measure of whether the subset defined by an item set differs from the overall distribution of data points. Because the subset is drawn from the overall set of data points, the two samples are not independent, which violates i.i.d. assumptions and requires an adaptation of standard algorithms for computing the Kullback-Leibler divergence. The algorithm is evaluated on gene expression data and on an example classification problem constructed from time series data.
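
As a rough illustration of the central idea, the sketch below compares the distribution of the points selected by an item set with the distribution of all points, using the Kullback-Leibler divergence over a shared histogram grid. It is a naive estimator and not the authors' algorithm: in particular, it does not include the adaptation for the set-subset relationship that the abstract mentions. The function name `histogram_kl`, the `bins` parameter, and the equal-width binning are assumptions made for this example only.

```python
import numpy as np


def histogram_kl(points, subset_mask, bins=4):
    """Naive estimate of D(P_subset || P_all) on a shared histogram grid.

    points      : array of shape (n_points, n_dims), the vector attribute
    subset_mask : boolean array of length n_points, True where the object
                  contains the item set of interest
    bins        : number of equal-width bins per dimension (an assumption)
    """
    points = np.asarray(points, dtype=float)
    subset_mask = np.asarray(subset_mask, dtype=bool)
    if not subset_mask.any():
        raise ValueError("the item set selects no points")

    # Use the same bin edges for both histograms so the cells line up.
    edges = [
        np.linspace(points[:, d].min(), points[:, d].max(), bins + 1)
        for d in range(points.shape[1])
    ]
    all_counts, _ = np.histogramdd(points, bins=edges)
    sub_counts, _ = np.histogramdd(points[subset_mask], bins=edges)

    p = sub_counts.ravel() / sub_counts.sum()  # subset distribution
    q = all_counts.ravel() / all_counts.sum()  # overall distribution

    # Every subset point is also in the full set, so q > 0 wherever p > 0
    # and the ratio below is always well defined.
    nonzero = p > 0
    return float(np.sum(p[nonzero] * np.log(p[nonzero] / q[nonzero])))


# Four points at the corners of the unit square, one per histogram cell.
corners = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
same_as_all = histogram_kl(corners, [True, True, True, True], bins=2)
single_corner = histogram_kl(corners, [True, False, False, False], bins=2)
print(same_as_all, single_corner)  # 0.0 and log(4), about 1.386
```

A divergence near zero suggests that the item set selects points distributed like the data as a whole, while a larger value suggests a candidate pattern. Because the subset is part of the full set, the raw value cannot be judged with tests that assume independent samples, which is the issue the paper's adaptation addresses.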