Top-N minimization approach for indicative correlation change mining

  • Authors:
  • Aixiang Li;Makoto Haraguchi;Yoshiaki Okubo

  • Affiliations:
  • Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan;Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan;Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan

  • Venue:
  • MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2012

Abstract

Given a family of transaction databases, various data mining methods for extracting patterns that distinguish one database from another have been extensively studied. This paper focuses on the problem of finding patterns that are largely uncorrelated in one database, called the base, and begin to be correlated to some extent in another database, called the target. The detected patterns are not highly correlated at the target. Despite this weak correlation at the target, they are regarded as indicative because they are uncorrelated at the base. We design our search procedure for such patterns as an optimization problem under constraints. More precisely, the objective is to minimize the correlation of patterns at the base, subject to an upper bound on the correlation at the target and a lower bound on the change in correlation between the two databases. As there exist many potential solutions, we apply top-N control, which retains the patterns attaining the N lowest correlation values at the base among all patterns satisfying the constraints. Because we measure the degree of correlation by k-way mutual information, which is monotonically increasing with respect to item addition, we can design a dynamic pruning method that disregards useless items under the top-N control. This greatly reduces the computational cost, over the whole search process, of calculating correlation values over several items treated as random variables. As a result, we present a complete search procedure that produces only the top-N solution patterns from the set of all patterns satisfying the constraints, and we show its effectiveness and efficiency through experiments.
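
The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes that k-way mutual information takes the total-correlation form (the sum of single-item entropies minus the joint entropy), that each transaction is a Python set of items, and that a depth-first enumeration with a size cap is acceptable. The names `bottom_n_patterns`, `max_target`, `min_change`, and `max_size` are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2


def entropy(rows, items):
    """Shannon entropy (bits) of the joint presence/absence of `items`."""
    counts = Counter(tuple(i in row for i in items) for row in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())


def total_correlation(rows, items):
    """Assumed form of k-way mutual information:
    sum of single-item entropies minus the joint entropy.
    Adding an item never decreases this value."""
    return sum(entropy(rows, (i,)) for i in items) - entropy(rows, items)


def bottom_n_patterns(base, target, items, n, max_target, min_change, max_size=3):
    """Return up to n (base_correlation, pattern) pairs, lowest base value first.

    Hypothetical parameters:
      max_target  upper bound on the correlation at the target
      min_change  lower bound on (target correlation - base correlation)
      max_size    cap on pattern length, added only to keep the sketch small
    """
    items = sorted(items)
    best = []  # kept sorted ascending by base correlation

    def extend(pattern, start):
        for idx in range(start, len(items)):
            cand = pattern + (items[idx],)
            if len(cand) >= 2:
                c_base = total_correlation(base, cand)
                c_target = total_correlation(target, cand)
                # Monotonicity: every superset is at least as correlated,
                # so it would also violate the target upper bound.
                if c_target > max_target:
                    continue
                # Dynamic top-N pruning: once n solutions are held, a branch
                # whose base value is no better than the worst one kept
                # cannot yield an improvement.
                if len(best) == n and c_base >= best[-1][0]:
                    continue
                if c_target - c_base >= min_change:
                    best.append((c_base, cand))
                    best.sort()
                    del best[n:]
            if len(cand) < max_size:
                extend(cand, idx + 1)

    extend((), 0)
    return best
```

Both pruning tests rely only on the monotonicity property stated in the abstract. The change constraint is not monotone, so in this sketch it is checked per pattern and never used to cut a branch.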