Healthcare trajectory mining by combining multidimensional component and itemsets

  • Authors:
  • Elias Egho;Chedy Raïssi;Dino Ienco;Nicolas Jay;Amedeo Napoli;Pascal Poncelet;Catherine Quantin;Maguelonne Teisseire

  • Affiliations:
  • Orpailleur Team, LORIA, France;INRIA, France;Irstea, UMR TETIS, Montpellier, France,LIRMM, Univ. Montpellier 2, Montpellier, France;Orpailleur Team, LORIA, France;Orpailleur Team, LORIA, France;Irstea, UMR TETIS, Montpellier, France,LIRMM, Univ. Montpellier 2, Montpellier, France;Department of Biostatistics and Medical Information, Dijon, France;Irstea, UMR TETIS, Montpellier, France,LIRMM, Univ. Montpellier 2, Montpellier, France

  • Venue:
  • NFMCP'12: Proceedings of the First International Conference on New Frontiers in Mining Complex Patterns
  • Year:
  • 2012

Abstract

Sequential pattern mining aims to extract correlations among temporal data. Many methods have been proposed to enumerate either sequences of set-valued data (i.e., itemsets) or sequences containing multidimensional items. However, in real-world scenarios, data sequences are described as events carrying both multidimensional items and set-valued information, and these rich heterogeneous descriptions cannot be exploited by traditional approaches. For example, in the healthcare domain, hospitalizations are defined as sequences of multidimensional attributes (e.g., Hospital or Diagnosis) associated with two sets: a set of medical procedures (e.g., {Radiography, Appendectomy}) and a set of medical drugs (e.g., {Aspirin, Paracetamol}). In this paper, we propose a new approach called MMISP (Mining Multidimensional Itemset Sequential Patterns) to extract patterns from complex sequences that include both dimensional items and itemsets. The novelties of the proposal lie in: (i) the way in which the data can be efficiently compressed; (ii) the ability to reuse and adapt existing sequential pattern mining algorithms; and (iii) the extraction of a new kind of pattern. We present a case study on real data aggregated from a regional healthcare system and point out the usefulness of the extracted patterns. Additional experiments on synthetic data highlight the efficiency and scalability of our approach.
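To make the heterogeneous data model more concrete, here is a minimal Python sketch of what it could mean for a patient trajectory to support a pattern mixing dimension values and itemsets. It is not the MMISP algorithm: the abstract does not describe its compression or mining steps. The names `Event`, `PatternElement`, `matches`, `contains` and `support`, the treatment of unconstrained dimensions as wildcards, and the sample data are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class Event:
    """One hospitalization (hypothetical model): dimension values plus two sets."""
    dims: Dict[str, str]                     # e.g. {"Hospital": "H1", "Diagnosis": "..."}
    procedures: FrozenSet[str] = frozenset()
    drugs: FrozenSet[str] = frozenset()


@dataclass
class PatternElement:
    """One pattern step; dimensions absent from `dims` act as wildcards (assumption)."""
    dims: Dict[str, str]
    procedures: FrozenSet[str] = frozenset()
    drugs: FrozenSet[str] = frozenset()


def matches(elem: PatternElement, event: Event) -> bool:
    # Every constrained dimension must agree, and both pattern sets
    # must be subsets of the event's sets.
    return (all(event.dims.get(k) == v for k, v in elem.dims.items())
            and elem.procedures <= event.procedures
            and elem.drugs <= event.drugs)


def contains(sequence: List[Event], pattern: List[PatternElement]) -> bool:
    # Greedy earliest matching is enough to decide ordered subsequence containment.
    i = 0
    for event in sequence:
        if i < len(pattern) and matches(pattern[i], event):
            i += 1
    return i == len(pattern)


def support(db: List[List[Event]], pattern: List[PatternElement]) -> float:
    # Fraction of trajectories (one per patient) that contain the pattern.
    if not db:
        return 0.0
    return sum(contains(seq, pattern) for seq in db) / len(db)


# Hypothetical toy database: one list of hospitalizations per patient.
p1 = [Event({"Hospital": "H1", "Diagnosis": "Appendicitis"},
            frozenset({"Radiography", "Appendectomy"}), frozenset({"Paracetamol"})),
      Event({"Hospital": "H2", "Diagnosis": "Checkup"}, frozenset(), frozenset({"Aspirin"}))]
p2 = [Event({"Hospital": "H1", "Diagnosis": "Appendicitis"},
            frozenset({"Radiography"}), frozenset({"Aspirin", "Paracetamol"})),
      Event({"Hospital": "H1", "Diagnosis": "Checkup"})]
p3 = [Event({"Hospital": "H3", "Diagnosis": "Fracture"},
            frozenset({"Radiography"}), frozenset({"Aspirin"}))]
db = [p1, p2, p3]

# "An appendicitis stay with radiography and paracetamol, later followed by a checkup."
pattern = [PatternElement({"Diagnosis": "Appendicitis"},
                          frozenset({"Radiography"}), frozenset({"Paracetamol"})),
           PatternElement({"Diagnosis": "Checkup"})]

print(support(db, pattern))
```

With this toy data, patients `p1` and `p2` contain the pattern while `p3` does not, so the printed support is 2/3. Treating missing dimensions as wildcards is only one possible design; an approach like the one proposed here might instead generalize dimension values along hierarchies or encode events differently so that existing sequential pattern miners can be reused.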