A case-study on learning from large-scale intracranial EEG data using multi-core machines and clusters

Authors:
Haimonti Dutta;Huascar Fiorletta;Manoj Pooleery;Hatim Diab;Stanley German;David Waltz;Catherine A. Schevon
Affiliations:
Columbia University, New York, NY;Columbia University, New York, NY;Columbia University, New York, NY;Columbia University, New York, NY;Columbia University, New York, NY;Columbia University, New York, NY;The Columbia University Medical School (CUMC), Columbia University, New York, NY
Venue:
Proceedings of the Third Workshop on Large Scale Data Mining: Theory and Applications
Year:
2011

Citing 12

Mean phase coherence as a measure for phase synchronization and its application to the EEG of epilepsy patients. Physica D.
Mining Very Large Databases with Parallel Processing.
Advances in Distributed and Parallel Knowledge Discovery.
Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research. ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining.
A decade of progress in indexing and mining large time series databases. VLDB '06 Proceedings of the 32nd international conference on Very large data bases.
iSAX: indexing and mining terabyte sized time series. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining.
Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets. ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining.
Validity of the single processor approach to achieving large scale computing capabilities. AFIPS '67 (Spring) Proceedings of the April 18-20, 1967, spring joint computer conference.
Time series shapelets: a new primitive for data mining. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining.
Managing massive time series streams with multi-scale compressed trickles. Proceedings of the VLDB Endowment.
Patient-Specific Seizure Detection from Intra-cranial EEG Using High Dimensional Clustering. ICMLA '10 Proceedings of the 2010 Ninth International Conference on Machine Learning and Applications.
Recent advances in mining time series data. PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases.

Abstract

Epilepsy is a chronic neurological disorder characterized by recurrent, unprovoked seizures that manifest in a variety of ways, including emotional or behavioral disturbances, convulsive movements, and loss of awareness. Predicting epileptic seizures is hard, and most algorithms do not perform better than a random predictor [20]. An important reason why studies so far have been less than successful is that the electroencephalogram (EEG) is not recorded at the granularity of the seizure generation process. Our collaborators at the Columbia University Medical School (CUMC) have been involved in a clinical trial that entails implanting a Micro-Electrode Array directly into the neocortex of epilepsy patients undergoing surgery to remove the portion of the brain where seizures originate. The 96-contact grid allows researchers to record at 30 kHz per channel, a very high resolution compared to known state-of-the-art techniques, and yields both local field and action potential data (0.5 TB per patient per day). This large volume of data poses challenges for knowledge discovery and mining. In this paper, we describe the steps required for processing the EEG signal and extracting features, and we present a parallel design for scaling up processing on multi-core machines and an in-house cluster. Initial benchmarking results indicate that approximately 6 cores of a machine (2.7 GHz processor, 32 GB RAM, moderate workload) are sufficient to process a 5-minute chunk of data from 96 channels in approximately 12 minutes. Encouraged by these results, we discuss the design of other machine learning algorithms for learning from the data.
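
The figures quoted in the abstract can be cross-checked with a short back-of-the-envelope calculation. The sketch below is illustrative only and is not taken from the paper: it assumes each sample is stored as 2 bytes (16 bits), which the abstract does not state, and the function and constant names are hypothetical.

```python
# Rough consistency check of the data-volume and throughput figures
# quoted in the abstract. Not the authors' code.

CHANNELS = 96               # contacts on the micro-electrode array (from the abstract)
SAMPLE_RATE_HZ = 30_000     # 30 kHz per channel (from the abstract)
BYTES_PER_SAMPLE = 2        # ASSUMPTION: 16-bit samples; not stated in the abstract
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_recorded(seconds: float) -> int:
    """Raw bytes produced by all channels over the given recording duration."""
    return int(CHANNELS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * seconds)


def slowdown_vs_real_time(recorded_minutes: float, processing_minutes: float) -> float:
    """How many minutes of processing are needed per minute of recording."""
    return processing_minutes / recorded_minutes


per_day_tb = bytes_recorded(SECONDS_PER_DAY) / 1e12   # decimal terabytes
per_chunk_gb = bytes_recorded(5 * 60) / 1e9           # one 5-minute chunk, decimal gigabytes
slowdown = slowdown_vs_real_time(5, 12)               # benchmark quoted in the abstract

print(f"{per_day_tb:.2f} TB per day")
print(f"{per_chunk_gb:.2f} GB per 5-minute chunk")
print(f"{slowdown:.1f}x slower than real time on about 6 cores")
```

Under the 16-bit assumption, the daily volume comes out at roughly 0.5 TB, which agrees with the figure in the abstract. The reported benchmark (5 minutes of data in about 12 minutes) corresponds to processing at about 2.4 times slower than real time on the quoted 6-core configuration.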