Tree pattern mining with tree automata constraints

  • Authors:
  • Sandra de Amo;Nyara A. Silva;Ronaldo P. Silva;Fabiola S. Pereira

  • Affiliations:
  • Faculdade de Computação - Universidade Federal de Uberlíndia, Campus Santa Mônica, Bloco B - Uberlíndia, MG, Brazil;Faculdade de Computação - Universidade Federal de Uberlíndia, Campus Santa Mônica, Bloco B - Uberlíndia, MG, Brazil;Faculdade de Computação - Universidade Federal de Uberlíndia, Campus Santa Mônica, Bloco B - Uberlíndia, MG, Brazil;Faculdade de Computação - Universidade Federal de Uberlíndia, Campus Santa Mônica, Bloco B - Uberlíndia, MG, Brazil

  • Venue:
  • Information Systems
  • Year:
  • 2010

Abstract

Most work on pattern mining focuses on simple data structures such as itemsets and sequences of itemsets. However, many recent applications dealing with complex data, such as chemical compounds, protein structures, XML and Web log databases, and social networks, require more sophisticated data structures such as trees and graphs. In these contexts, interesting patterns involve not only frequent object values (labels) appearing in the graphs (or trees) but also frequent specific topologies found in these structures. Recently, several techniques for tree and graph mining have been proposed in the literature. In this paper, we focus on constraint-based tree pattern mining. We propose using tree automata as a mechanism for specifying user constraints over tree patterns. We present the algorithm CoBMiner, which allows user constraints specified by a tree automaton to be incorporated into the mining process. An extensive set of experiments on synthetic and real data (XML documents and Web usage logs) allows us to conclude that incorporating constraints during the mining process is far more effective than filtering the interesting patterns after the mining process.
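
To make the idea of a tree automaton constraint concrete, the following is a minimal sketch and not the CoBMiner algorithm itself, whose details are not given in this abstract. It assumes ordered, labeled trees and a deterministic bottom-up automaton; the class name `TreeAutomaton`, the state names, and the sample patterns are all hypothetical. The last lines show the post-mining filtering baseline that the abstract compares against.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumption: a tree pattern is a pair (label, list of child trees).
Tree = Tuple[str, list]
# Assumption: a transition maps (label, states of the children) to a state.
Transitions = Dict[Tuple[str, Tuple[str, ...]], str]


class TreeAutomaton:
    """Hypothetical deterministic bottom-up tree automaton."""

    def __init__(self, transitions: Transitions, final_states: Set[str]):
        self.transitions = transitions
        self.final_states = final_states

    def run(self, tree: Tree) -> Optional[str]:
        """Return the state reached at the root, or None if the run gets stuck."""
        label, children = tree
        child_states: List[str] = []
        for child in children:
            state = self.run(child)
            if state is None:
                return None  # no transition applies somewhere below
            child_states.append(state)
        return self.transitions.get((label, tuple(child_states)))

    def accepts(self, tree: Tree) -> bool:
        """A tree satisfies the constraint if its root state is final."""
        state = self.run(tree)
        return state is not None and state in self.final_states


# Illustrative constraint: the root is labeled "a" and has exactly two
# leaf children, "b" followed by "c".
automaton = TreeAutomaton(
    transitions={
        ("b", ()): "qb",
        ("c", ()): "qc",
        ("a", ("qb", "qc")): "qa",
    },
    final_states={"qa"},
)

# Made-up candidate patterns, standing in for the output of a miner.
candidates: List[Tree] = [
    ("a", [("b", []), ("c", [])]),  # satisfies the constraint
    ("a", [("c", []), ("b", [])]),  # wrong child order
    ("b", []),                      # reaches a state, but not a final one
]

# Post-mining filtering: check the constraint only after mining is done.
kept = [t for t in candidates if automaton.accepts(t)]
```

In this sketch the constraint is checked only on finished patterns. The approach described in the abstract instead incorporates the automaton into the mining process itself.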