A Hypergraph-based Method for Discovering Semantically Associated Itemsets

  • Authors:
  • Haishan Liu;Paea Le Pendu;Ruoming Jin;Dejing Dou

  • Venue:
  • ICDM '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining
  • Year:
  • 2011

Abstract

In this paper, we address the data mining problem of finding semantically associated itemsets, i.e., items connected via indirect links. We propose a novel method for discovering semantically associated itemsets based on a hypergraph representation of the database. We describe two similarity measures to compute the strength of association between items. Specifically, we introduce the average commute time similarity, $\mathbf{s_{CT}}$, based on the random walk model on the hypergraph, and the inner-product similarity, $\mathbf{s_{L+}}$, based on the Moore-Penrose pseudoinverse of the hypergraph Laplacian matrix. Given the semantically associated 2-itemsets generated by these measures, we design a hypergraph expansion method with two search strategies, namely clique search and connected component search, to generate $k$-itemsets ($k > 2$). Through experiments on three datasets ranging from low to high dimensionality, we show that the proposed method is indeed capable of capturing semantically associated itemsets. The semantically associated itemsets discovered in our experiments promise to provide valuable insights into the interrelationships between medical concepts and other domain-specific concepts.
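
The two measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact Laplacian or similarity formulas, so everything below is an assumption made for illustration: the hypergraph is reduced to a weighted item graph through a simple expansion of each transaction, the Laplacian is the unnormalized $L = D - A$ of that graph, the inner-product score is read off the entries of the pseudoinverse $L^{+}$, and the commute time uses the standard identity $n(i,j) = \mathrm{vol}\,(l^{+}_{ii} + l^{+}_{jj} - 2\,l^{+}_{ij})$. The function names, the toy database, and the threshold are hypothetical, and the component search is only one plausible reading of the connected component strategy.

```python
import numpy as np

# Toy database (an assumption for illustration): items a, b, c, d and
# transactions {a,b}, {b,c}, {c,d}. Rows are items, columns are hyperedges.
H = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

def hypergraph_laplacian(H, w=None):
    """Return (L, A) for the item graph induced by incidence matrix H.

    Assumes every transaction (column) contains at least one item.
    """
    H = np.asarray(H, dtype=float)
    n_items, n_edges = H.shape
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)
    edge_degree = H.sum(axis=0)            # size of each transaction
    A = (H * (w / edge_degree)) @ H.T      # co-occurrence weights between items
    np.fill_diagonal(A, 0.0)               # drop self-loops
    D = np.diag(A.sum(axis=1))             # item degrees
    return D - A, A

def similarities(L, A):
    """Return (L_plus, commute): pseudoinverse entries and commute times."""
    L_plus = np.linalg.pinv(L)             # Moore-Penrose pseudoinverse
    vol = A.sum()                          # total degree of the item graph
    d = np.diag(L_plus)
    commute = vol * (d[:, None] + d[None, :] - 2.0 * L_plus)
    return L_plus, commute

def connected_itemsets(pairs, n_items):
    """Merge associated 2-itemsets into connected components (k-itemsets)."""
    adj = {i: set() for i in range(n_items)}
    for i, j in pairs:
        adj[i].add(j)
        adj[j].add(i)
    seen, itemsets = set(), []
    for start in range(n_items):
        if start in seen or not adj[start]:
            continue
        stack, comp = [start], set()
        while stack:                       # depth-first traversal
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        itemsets.append(sorted(comp))
    return itemsets

L, A = hypergraph_laplacian(H)
L_plus, commute = similarities(L, A)

# Hypothetical cutoff: keep pairs whose commute time is small enough.
threshold = 13.0
n = H.shape[0]
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
         if commute[i, j] <= threshold]
components = connected_itemsets(pairs, n)
```

In this toy database, items a and c never appear in the same transaction, yet they receive a finite commute time through b, which is the kind of indirect link the abstract describes. A smaller commute time (or a larger $L^{+}$ entry) indicates a stronger association under these assumptions; how the paper converts commute time into the similarity $\mathbf{s_{CT}}$ is not stated in the abstract.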