Sequential pattern mining in multi-relational datasets

  • Authors:
  • Carlos Abreu Ferreira;João Gama;Vítor Santos Costa

  • Affiliations:
  • LIAAD, INESC LA and CRACS, INESC LA, University of Porto and ISEP, Institute of Engineering of Porto;LIAAD, INESC LA;CRACS, INESC LA, University of Porto

  • Venue:
  • CAEPIA'09 Proceedings of the Current topics in artificial intelligence, and 13th conference on Spanish association for artificial intelligence
  • Year:
  • 2009

Abstract

We present a framework for mining sequential temporal patterns from multi-relational databases. To exploit logic-relational information without resorting to aggregation, we convert the multi-relational dataset into what we call a multi-sequence database. Each example in the multi-relational target table is encoded as a sequence that combines intra-table and inter-table relational temporal information. This allows us to find heterogeneous temporal patterns with standard sequence miners. Our framework builds on the strong results achieved by previous propositionalization strategies. We follow a pipelined approach: first, we use a sequence miner to find frequent sequences in the multi-sequence database. Next, we select the most interesting patterns, those that are discriminative and class-correlated, and use them to augment the representational space of the examples. In the final step, we build a classifier by passing the enlarged target table to a classification algorithm. We evaluate the approach on a motivating application, the hepatitis multi-relational dataset, and demonstrate its effectiveness by addressing two problems defined on that dataset.
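
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy tables, the item encoding `test=result`, the brute-force miner (standing in for a real sequence miner), and the class-gap selection rule are all assumptions made for illustration, and only one related table is used. The final classifier step is left out: the augmented rows could be handed to any propositional learner.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data (invented for this sketch): a target table mapping
# patient id -> class, and one related table of time-stamped exam results
# given as (patient_id, day, test, result).
patients = {1: "B", 2: "B", 3: "C", 4: "C"}
exams = [
    (1, 1, "GOT", "high"), (1, 2, "GPT", "high"), (1, 3, "ALB", "low"),
    (2, 1, "GOT", "high"), (2, 2, "GPT", "high"),
    (3, 1, "GOT", "normal"), (3, 2, "ALB", "low"),
    (4, 1, "GPT", "high"), (4, 2, "GOT", "normal"), (4, 3, "ALB", "low"),
]


def build_sequences(target, events):
    """Encode each target example as a time-ordered sequence of items."""
    seqs = {pid: [] for pid in target}
    for pid, _day, test, result in sorted(events, key=lambda e: (e[0], e[1])):
        seqs[pid].append(f"{test}={result}")
    return seqs


def is_subsequence(pattern, seq):
    """True if the items of pattern occur in seq in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def frequent_patterns(seqs, min_support, max_len=2):
    """Brute-force stand-in for a sequence miner.

    Counts every ordered subsequence of up to max_len items, once per
    sequence, and keeps those reaching min_support.
    """
    counts = defaultdict(int)
    for seq in seqs.values():
        seen = set()
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), n):
                seen.add(tuple(seq[i] for i in idx))
        for pattern in seen:
            counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def discriminative(patterns, seqs, target, min_gap):
    """Keep patterns whose per-class relative support differs by >= min_gap.

    This gap rule is an assumed, simplified notion of "discriminative and
    class-correlated".
    """
    classes = sorted(set(target.values()))
    selected = []
    for pattern in sorted(patterns):
        rates = []
        for cls in classes:
            members = [pid for pid, c in target.items() if c == cls]
            hits = sum(is_subsequence(pattern, seqs[pid]) for pid in members)
            rates.append(hits / len(members))
        if max(rates) - min(rates) >= min_gap:
            selected.append(pattern)
    return selected


def augment(target, seqs, selected):
    """Add one boolean column per selected pattern to the target table."""
    rows = {}
    for pid, cls in target.items():
        row = {"class": cls}
        for pattern in selected:
            row[" -> ".join(pattern)] = is_subsequence(pattern, seqs[pid])
        rows[pid] = row
    return rows


sequences = build_sequences(patients, exams)
frequent = frequent_patterns(sequences, min_support=2)
selected = discriminative(frequent, sequences, patients, min_gap=1.0)
table = augment(patients, sequences, selected)
for pid, row in table.items():
    print(pid, row)
```

Each row of `table` keeps the class label and gains one boolean feature per selected pattern, which is the "enlarged target table" that a classifier would take as input.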