GO-SPADE: mining sequential patterns over datasets with consecutive repetitions

  • Authors:
  • Marion Leleu;Christophe Rigotti;Jean-François Boulicaut;Guillaume Euvrard

  • Affiliations:
  • Laboratoire d'Ingénierie des Systèmes d'Information, Bâtiment Blaise Pascal, INSA Lyon, Villeurbanne Cedex, France and Informatique CDC, Bagneux, France;Laboratoire d'Ingénierie des Systèmes d'Information, Bâtiment Blaise Pascal, INSA Lyon, Villeurbanne Cedex, France;Laboratoire d'Ingénierie des Systèmes d'Information, Bâtiment Blaise Pascal, INSA Lyon, Villeurbanne Cedex, France;Informatique CDC, Bagneux, France

  • Venue:
  • MLDM'03 Proceedings of the 3rd international conference on Machine learning and data mining in pattern recognition
  • Year:
  • 2003

Abstract

Sequence databases can contain consecutive repetitions of items, in particular when some items represent discretized quantitative values. We show that on such databases a typical algorithm such as SPADE tends to lose its efficiency. SPADE relies on lists that record the locations of a pattern's occurrences in the sequences, and these lists are not appropriate for data with repetitions. We introduce the concept of generalized occurrences and the corresponding primitive operators to manipulate them. We present an algorithm called GO-SPADE that extends SPADE to incorporate generalized occurrences. Finally, we present experiments showing that GO-SPADE can handle sequences containing consecutive repetitions at nearly no extra cost.
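
The abstract does not define generalized occurrences, so the following is only an illustrative sketch under an assumption: that a run of consecutive positions of the same item within one sequence can be stored as a single interval instead of one list entry per position. The function name `compress_occurrences` and the tuple layouts `(sid, eid)` and `(sid, first, last)` are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Tuple

Occurrence = Tuple[int, int]      # (sequence id, position in that sequence)
Interval = Tuple[int, int, int]   # (sequence id, first position, last position)


def compress_occurrences(occurrences: Iterable[Occurrence]) -> List[Interval]:
    """Collapse runs of consecutive positions in each sequence into intervals.

    Assumption for illustration only: this mimics the idea of replacing
    per-position occurrence entries by one entry per run of repetitions.
    """
    intervals: List[Interval] = []
    # Sorting groups entries by sequence id, then by increasing position.
    for sid, eid in sorted(set(occurrences)):
        if intervals and intervals[-1][0] == sid and eid == intervals[-1][2] + 1:
            # Position directly follows the current run: extend the interval.
            last_sid, first, _ = intervals[-1]
            intervals[-1] = (last_sid, first, eid)
        else:
            # Start a new run (new sequence, or a gap in positions).
            intervals.append((sid, eid, eid))
    return intervals


# Item occurs at positions 1, 2, 3 and 7 of sequence 1, and 4, 5 of sequence 2.
plain_list = [(1, 1), (1, 2), (1, 3), (1, 7), (2, 4), (2, 5)]
compressed = compress_occurrences(plain_list)
print(len(plain_list), "entries ->", len(compressed), "entries")
```

Under this assumption, the longer the runs of repeated items, the larger the gap between the size of the plain list and the size of the interval list, which is the kind of saving the abstract attributes to generalized occurrences.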