Constraint relaxations for discovering unknown sequential patterns

  • Authors:
  • Cláudia Antunes; Arlindo L. Oliveira

  • Affiliations:
  • Instituto Superior Técnico / INESC-ID, Department of Information Systems and Computer Science, Lisboa, Portugal (both authors)

  • Venue:
  • KDID'04 Proceedings of the Third international conference on Knowledge Discovery in Inductive Databases
  • Year:
  • 2004

Abstract

The main drawbacks of sequential pattern mining have been its lack of focus on user expectations and the large number of discovered patterns. However, the commonly accepted solution, the use of constraints, reduces the mining process to verifying which of the specified patterns are frequent, rather than discovering unknown and unexpected patterns. In this paper, we propose a new methodology for mining sequential patterns that keeps the focus on user expectations without compromising the discovery of unknown patterns. Our methodology is based on constraint relaxations, which are used to filter the accepted patterns during the mining process. We propose a hierarchy of relaxations, applied to constraints expressed as context-free languages, that classifies the existing relaxations (the previously proposed legal, valid and naïve relaxations) and introduces several new classes. The new classes range from the approx and non-accepted relaxations to compositions of different types of relaxations, such as the approx-legal and non-prefix-valid relaxations. Finally, we present a case study showing the results of applying this methodology to the analysis of the curricular sequences of computer science students.