Using and learning semantics in frequent subgraph mining

  • Authors:
  • Bettina Berendt

  • Affiliations:
  • Institute of Information Systems, Humboldt University Berlin, Berlin, Germany

  • Venue:
  • WebKDD'05 Proceedings of the 7th international conference on Knowledge Discovery on the Web: advances in Web Mining and Web Usage Analysis
  • Year:
  • 2005

Abstract

The search for frequent subgraphs is becoming increasingly important in many application areas, including Web mining and bioinformatics. However, any use of graph structures in mining should take into account that background knowledge must be integrated into the analysis and that patterns must be studied at different levels of abstraction. To capture these needs, we propose to use taxonomies in mining and to extend frequency/support measures by the notion of context-induced interestingness. The AP-IP mining problem is to find all frequent abstract patterns together with the individual patterns that constitute them; these individual patterns are interesting in this context even though they may be infrequent. The paper presents the fAP-IP algorithm, which uses a taxonomy to search for abstract and individual patterns and supports graph clustering to discover further structure in the individual patterns. Semantics are both used and learned in this process. fAP-IP is implemented as an extension of Gaston (Nijssen & Kok, 2004) and is complemented by the AP-IP visualization tool, which lets the user navigate through detail-and-context views of taxonomy context, pattern context, and transaction context. A case study of a real-life Web site shows the advantages of the proposed solutions.

ACM categories and subject descriptors: H.2.8 [Database Management]: Database Applications: data mining; H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia: navigation, user issues.

Keywords: graph mining; Web mining; background knowledge and semantics in mining.
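The AP-IP mining problem can be illustrated with a deliberately reduced sketch. This is not the fAP-IP algorithm or its Gaston-based implementation. Patterns here are restricted to single directed edges, the taxonomy is a hypothetical one-level mapping from pages to concepts, and the names `TAXONOMY`, `lift`, `ap_ip_single_edges`, and `SESSIONS` are invented for illustration. An abstract edge counts as frequent if enough transactions contain it once their nodes are lifted to concepts. Its individual instantiations are reported alongside it even when each one alone falls below the support threshold.

```python
from collections import Counter

# Hypothetical one-level taxonomy (illustrative only): individual page -> abstract concept.
TAXONOMY = {
    "h": "Home",
    "p1": "Product", "p2": "Product", "p3": "Product",
    "c1": "Checkout",
}

def lift(edge, taxonomy):
    """Map a directed edge (u, v) between pages to an edge between concepts."""
    u, v = edge
    return (taxonomy[u], taxonomy[v])

def ap_ip_single_edges(transactions, taxonomy, minsup):
    """Simplified AP-IP for single-edge patterns only: return each frequent
    abstract edge mapped to the individual edges that instantiate it, with
    their transaction counts. Individual edges are kept even if they are
    infrequent on their own."""
    n = len(transactions)
    abstract_support = Counter()
    individual_support = Counter()
    for graph in transactions:
        # Count each pattern at most once per transaction (transaction-based support).
        abstract_support.update({lift(e, taxonomy) for e in graph})
        individual_support.update(set(graph))
    frequent = {a for a, s in abstract_support.items() if s / n >= minsup}
    return {
        a: {e: c for e, c in individual_support.items() if lift(e, taxonomy) == a}
        for a in frequent
    }

# Hypothetical navigation sessions, each a set of directed page-to-page edges.
SESSIONS = [
    {("h", "p1"), ("p1", "c1")},
    {("h", "p2")},
    {("h", "p3"), ("p3", "c1")},
    {("h", "c1")},
]

result = ap_ip_single_edges(SESSIONS, TAXONOMY, minsup=0.5)
for abstract, individuals in sorted(result.items()):
    print(abstract, sorted(individuals.items()))
```

In this toy data every individual edge occurs in only one of four sessions (support 0.25), so plain frequent-pattern mining at a threshold of 0.5 would prune all of them. Here the edges under Home to Product and Product to Checkout are retained because their abstract patterns are frequent, while Home to Checkout (support 0.25) is dropped at both levels.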