Healthcare trajectory mining by combining multidimensional component and itemsets

Authors:
Elias Egho; Chedy Raïssi; Dino Ienco; Nicolas Jay; Amedeo Napoli; Pascal Poncelet; Catherine Quantin; Maguelonne Teisseire
Affiliations:
Orpailleur Team, LORIA, France; INRIA, France; Irstea, UMR TETIS, Montpellier, France, LIRMM, Univ. Montpellier 2, Montpellier, France; Orpailleur Team, LORIA, France; Orpailleur Team, LORIA, France; Irstea, UMR TETIS, Montpellier, France, LIRMM, Univ. Montpellier 2, Montpellier, France; Department of Biostatistics and Medical Information, Dijon, France; Irstea, UMR TETIS, Montpellier, France, LIRMM, Univ. Montpellier 2, Montpellier, France
Venue:
NFMCP'12 Proceedings of the First international conference on New Frontiers in Mining Complex Patterns
Year:
2012


Abstract

Sequential pattern mining aims at extracting correlations among temporal data. Many methods have been proposed to enumerate either sequences of set-valued data (i.e., itemsets) or sequences of multidimensional items. However, in real-world scenarios, data sequences are described as events that combine both multidimensional items and set-valued information. Traditional approaches cannot exploit these rich heterogeneous descriptions. For example, in the healthcare domain, hospitalizations are defined as sequences of multidimensional attributes (e.g., Hospital or Diagnosis) associated with two sets: a set of medical procedures (e.g., $\lbrace$Radiography, Appendectomy$\rbrace$) and a set of medical drugs (e.g., $\lbrace$Aspirin, Paracetamol$\rbrace$). In this paper we propose a new approach called MMISP (Mining Multidimensional Itemset Sequential Patterns) to extract patterns from complex sequences that include both dimensional items and itemsets. The novelties of the proposal lie in: (i) the way in which the data can be efficiently compressed; (ii) the ability to reuse and adopt sequential pattern mining algorithms; and (iii) the extraction of a new kind of pattern. We present a case study on real data aggregated from a regional healthcare system and point out the usefulness of the extracted patterns. Additional experiments on synthetic data highlight the efficiency and scalability of our approach.
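
To make the data model in the abstract concrete, the following Python sketch represents a hospitalization as an event with a tuple of dimensional attributes and two itemsets, and checks whether a pattern occurs in a patient's sequence. It is only an illustration under stated assumptions, not the MMISP algorithm itself: the names `Event`, `event_matches`, `contains`, and `support`, the use of `None` as a wildcard in pattern dimensions, and the sample hospital and diagnosis values are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    """One hospitalization: dimensional attributes plus two itemsets."""
    # Assumed layout: (hospital, diagnosis). In a pattern, None acts as a
    # wildcard; this wildcard is an assumption made for the illustration.
    dims: Tuple[Optional[str], ...]
    procedures: FrozenSet[str]
    drugs: FrozenSet[str]


def event_matches(pattern: Event, event: Event) -> bool:
    """A pattern event matches if its dimensions agree (or are wildcards)
    and both of its itemsets are subsets of the data event's itemsets."""
    dims_ok = len(pattern.dims) == len(event.dims) and all(
        p is None or p == e for p, e in zip(pattern.dims, event.dims)
    )
    return (
        dims_ok
        and pattern.procedures <= event.procedures
        and pattern.drugs <= event.drugs
    )


def contains(pattern: Sequence[Event], sequence: Sequence[Event]) -> bool:
    """True if the pattern events match events of the sequence in order."""
    remaining = iter(sequence)  # shared iterator enforces temporal order
    return all(any(event_matches(p, e) for e in remaining) for p in pattern)


def support(pattern: Sequence[Event], database: Sequence[Sequence[Event]]) -> float:
    """Fraction of patient sequences that contain the pattern."""
    return sum(contains(pattern, s) for s in database) / len(database)


# Hypothetical patient trajectories (values invented for illustration).
patient1 = [
    Event(("H1", "Appendicitis"),
          frozenset({"Radiography", "Appendectomy"}),
          frozenset({"Aspirin", "Paracetamol"})),
    Event(("H2", "Checkup"),
          frozenset({"Radiography"}),
          frozenset({"Paracetamol"})),
]
patient2 = [
    Event(("H1", "Appendicitis"),
          frozenset({"Radiography"}),
          frozenset({"Aspirin"})),
]

# Pattern: a stay in H1 with Radiography and Aspirin, followed later by
# any stay in which Paracetamol was given.
pattern = [
    Event(("H1", None), frozenset({"Radiography"}), frozenset({"Aspirin"})),
    Event((None, None), frozenset(), frozenset({"Paracetamol"})),
]

print(support(pattern, [patient1, patient2]))
```

In this toy database the pattern occurs in the first trajectory but not in the second, which has no later stay with Paracetamol. A traditional itemset-only or dimension-only miner would have to discard one of the two kinds of information carried by each event.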