Early Detection of Numerical Typing Errors Using Data Mining Techniques

Authors:
Shouyi Wang;Cheng-Jhe Lin;Changxu Wu;Wanpracha Art Chaovalitwongse
Affiliations:
Department of Industrial and Systems Engineering, Rutgers University, Piscataway, NJ, USA;Department of Industrial and System Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA;Department of Industrial and System Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA;Department of Industrial and Systems Engineering, Rutgers University, Piscataway, NJ, USA
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
Year:
2011

Abstract

This paper studies the application of data mining techniques to the early detection of numerical typing errors by human operators through a quantitative analysis of multichannel electroencephalogram (EEG) recordings. Three feature extraction techniques were developed to capture the temporal, morphological, and time–frequency (wavelet) characteristics of the EEG data. Two of the most commonly used data mining techniques, linear discriminant analysis (LDA) and support vector machine (SVM), were employed to classify EEG samples associated with correct and erroneous keystrokes. Leave-one-error-pattern-out and leave-one-subject-out cross-validation methods were designed to evaluate the in-subject and cross-subject classification performance, respectively. For the in-subject classification, the best testing performance, a sensitivity of 62.20% and a specificity of 51.68%, was achieved by SVM using morphological features. For the cross-subject classification, the best testing performance, a sensitivity of 68.72% and a specificity of 49.45%, was achieved by LDA using temporal features. In addition, receiver operating characteristic (ROC) analysis revealed that the averaged areas under the ROC curves of LDA and SVM for the in-subject and cross-subject classifications were both greater than 0.60 when using the EEG recorded 300 ms prior to the keystrokes. These classification results indicate that the EEG patterns of erroneous keystrokes might differ from those of correct ones, so it may be possible to predict erroneous keystrokes before the error occurs. The classification problem addressed in this study is extremely challenging because each subject made very few erroneous keystrokes and because EEG data have complex spatiotemporal characteristics. Nevertheless, the outcome is encouraging and suggests that a prospective early detection system for erroneous keystrokes based on brain-wave signals could be developed.
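The evaluation quantities named in the abstract (sensitivity, specificity, area under the ROC curve, and leave-one-subject-out splitting) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the label convention (1 = erroneous keystroke, 0 = correct keystroke), and the toy labels, scores, and subject IDs are all assumptions made for illustration, and none of the numbers come from the study.

```python
from collections import defaultdict


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity).

    Assumed convention: 1 = erroneous keystroke, 0 = correct keystroke.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (error, correct) pairs in which the error
    sample receives the higher score; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    by_subject = defaultdict(list)
    for i, s in enumerate(subject_ids):
        by_subject[s].append(i)
    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        train_idx = [
            i for s, idx in by_subject.items() if s != held_out for i in idx
        ]
        yield held_out, train_idx, test_idx


# Toy data (hypothetical, not from the study).
y_true = [1, 0, 0, 1, 0, 0]                # 1 = erroneous keystroke
y_pred = [1, 0, 1, 0, 0, 0]                # hard classifier decisions
scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3]    # classifier decision values
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
folds = list(leave_one_subject_out(subjects))
```

In a cross-subject evaluation of this kind, a classifier would be trained on `train_indices` and scored on `test_indices` for each fold, so that no EEG samples from the held-out subject influence training. The per-fold sensitivity, specificity, and AUC would then be averaged.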