Data mining from a patient safety database: the lessons learned

Authors:
James Bentham;David J. Hand
Affiliations:
Department of Medical and Molecular Genetics, King's College, London, UK;Department of Mathematics, Imperial College, London, UK and Institute for Mathematical Sciences, Imperial College, London, UK
Venue:
Data Mining and Knowledge Discovery
Year:
2012

Abstract

Patient safety is an extremely important issue: each year in the UK, hundreds of thousands of people suffer as a result of incidents that occur whilst they are in National Health Service care. The National Patient Safety Agency (NPSA) works to reduce the scale of the problem. One of its major projects is the Reporting and Learning System (RLS), a very large dataset that describes several million of these incidents. The RLS is used as the basis for research by the NPSA. However, the NPSA has identified a gap in its work between high-level quantitative analysis and detailed, manual analysis of small samples. This paper describes the lessons learned from a knowledge discovery process that attempted to fill this gap. The RLS contains a free-text description of each incident. A high-dimensional model of the text is calculated using the vector space model with term weighting applied. Dimensionality reduction techniques are then used to produce the final models of the text. These models are examined using an anomaly detection tool to find groups of incidents that should be coherent in meaning and that might be of interest to the NPSA. A three-stage process is developed for assessing the results. The first stage uses a quantitative measure based on planted groups of known interest, the second stage involves manual filtering by a non-expert, and the third stage is assessment by clinical experts.
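As a rough illustration of the vector space model with term weighting mentioned in the abstract, the sketch below builds weighted term vectors for a few free-text descriptions and compares them by cosine similarity. The abstract does not say which weighting scheme the authors used, so TF-IDF is an assumption here; the incident texts and the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`) are hypothetical and do not come from the RLS or the paper. Dimensionality reduction and anomaly detection are not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic whitespace-separated tokens."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def tfidf_vectors(documents):
    """Return one sparse {term: weight} vector per document.

    Assumed weighting: raw term count times log(N / document frequency).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {term: count * math.log(n_docs / doc_freq[term])
             for term, count in counts.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical incident descriptions, invented for illustration only.
incidents = [
    "patient fell from bed during the night",
    "patient fell in corridor near the bed",
    "wrong dose of insulin given to patient",
]

vectors = tfidf_vectors(incidents)
sim_falls = cosine_similarity(vectors[0], vectors[1])  # two fall-related reports
sim_mixed = cosine_similarity(vectors[0], vectors[2])  # a fall vs. a medication error
print(sim_falls > sim_mixed)
```

In this toy corpus the word "patient" occurs in every description, so its weight is zero and it contributes nothing to the similarity. That is the intended effect of term weighting: terms that appear everywhere carry little information about which incidents belong together.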