A general framework to encode heterogeneous information sources for contextual pattern mining

  • Authors:
  • Weishan Dong; Wei Fan; Lei Shi; Changjin Zhou; Xifeng Yan

  • Affiliations:
  • IBM Research - China, Beijing, China; Huawei Noah's Ark Lab, Hong Kong, Hong Kong; Institute of Software, Chinese Academy of Sciences, Beijing, China; IBM Research - China, Beijing, China; University of California at Santa Barbara, Santa Barbara, CA, USA

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Abstract

Traditional pattern mining methods usually work on a single data source. In practice, however, the same set of objects is often described by multiple heterogeneous information sources. Together, these sources provide contextual information that is not available in any single source, and this information is useful for discovering hidden contextual patterns. One important challenge is to provide a general methodology for mining contextual patterns easily and efficiently. In this paper, we propose a general framework that encodes contextual information from multiple sources into a coherent representation, the Contextual Information Graph (CIG). The encoding scheme has linear complexity in both time and space. More importantly, a CIG can be handled, without any modification, by any single-source pattern mining algorithm that accepts taxonomies. Through three applications (contextual association rule mining, sequence mining, and graph mining), we demonstrate that the proposed framework can easily discover contextual patterns that provide rich and insightful knowledge. The framework enables Contextual Pattern Mining (CPM) by reusing single-source methods, and it is easy to deploy and use in real-world systems.
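
The abstract does not spell out how a CIG is built, so the following is only a toy illustration of the general idea it relies on: when context from a second source is expressed as a taxonomy over the same objects, an unmodified single-source miner can surface patterns that mention that context. Everything here is an assumption made for illustration and is not the paper's construction: the item names, the `parents` mapping, and the helper functions `ancestors`, `extend_with_context`, and `frequent_pairs` are all hypothetical. The ancestor-extension step is the classic trick from generalized association rule mining, used as a stand-in for a taxonomy-aware miner.

```python
from collections import Counter
from itertools import combinations

# Hypothetical source 1: transactions over a set of objects (items).
transactions = [
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

# Hypothetical source 2: context about the same objects, written as
# child -> parent edges of a taxonomy (an assumption, not the paper's CIG).
parents = {
    "milk": ["dairy"],
    "butter": ["dairy"],
    "bread": ["bakery"],
    "dairy": ["food"],
    "bakery": ["food"],
}


def ancestors(node, parents):
    """Return every node reachable by following child -> parent edges."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def extend_with_context(transaction, parents):
    """Add the ancestors of each item so a plain miner sees the context."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item, parents)
    return extended


def frequent_pairs(transactions, min_support):
    """Count co-occurring pairs; a stand-in for any single-source miner."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


extended = [extend_with_context(t, parents) for t in transactions]
patterns = frequent_pairs(extended, min_support=3)
```

In this toy data the raw pair `("bread", "milk")` falls below the support threshold, while the contextual pair `("bread", "dairy")` meets it, which is the kind of pattern that only becomes visible once the second source is encoded. A real taxonomy-aware miner would also prune trivial item-ancestor pairs such as `("dairy", "milk")`; this sketch does not.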