Automatic extraction of semantic content from medical discharge records

Authors:
György Szarvas;Szilárd Iván;András Bánhalmi;János Csirik
Affiliations:
University of Szeged, Department of Informatics, Szeged, Hungary;University of Szeged, Department of Informatics, Szeged, Hungary;University of Szeged, Department of Informatics, Szeged, Hungary;Hungarian Academy of Sciences, Research Group on Artificial Intelligence, Szeged, Hungary
Venue:
ICOSSE'06 Proceedings of the 5th WSEAS international conference on System science and simulation in engineering
Year:
2006

Citing 3
Cited 0

Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)

Role of local context in automatic deidentification of ungrammatical, fragmented text
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics

On sample size and classification accuracy: a performance comparison
ISBMDA'05 Proceedings of the 6th International conference on Biological and Medical Data Analysis

Abstract

Semi-structured medical texts such as discharge summaries are rich sources of information that can support physicians' research through statistical analysis of similar cases. In this paper we introduce a system based on Machine Learning algorithms that successfully classifies discharge records according to the smoking status of the patient. We distinguish five classes: current smoker, past smoker, smoker (where a decision between the former two classes cannot be made), non-smoker, and unknown (where the document contains no data on smoking status). Such systems are useful for examining the connection between certain social habits and diseases like cancer or asthma. We trained and tested our model on the data of the shared task organized by the I2B2 (Informatics for Integrating Biology and the Bedside) research center [1], and despite the small amount of training data available, our system shows promising results in identifying the smoking habits of patients from their medical discharge summaries.
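
To make the five-class task concrete, the following is a minimal sketch of a bag-of-words classifier over the label set named in the abstract. The abstract does not say which learning algorithm or features the authors used, so the multinomial Naive Bayes model here is only a stand-in. The training sentences are invented for illustration and are not taken from the I2B2 data, and all names (`LABELS`, `TOY_TRAIN`, `train`, `classify`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# The five classes named in the abstract.
LABELS = ["CURRENT_SMOKER", "PAST_SMOKER", "SMOKER", "NON_SMOKER", "UNKNOWN"]

# Invented toy sentences (assumption): NOT real discharge records.
TOY_TRAIN = [
    ("patient currently smokes one pack per day", "CURRENT_SMOKER"),
    ("patient quit smoking ten years ago", "PAST_SMOKER"),
    ("history of tobacco use", "SMOKER"),
    ("patient denies tobacco use never smoked", "NON_SMOKER"),
    ("admitted with chest pain no acute distress", "UNKNOWN"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(examples):
    """Collect per-class word counts, per-class document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the class with the highest Naive Bayes log-score (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / n_docs)  # class prior
        for token in tokenize(text):
            if token in vocab:  # tokens never seen in training are skipped
                score += math.log(
                    (word_counts[label][token] + 1) / (total + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


MODEL = train(TOY_TRAIN)
print(classify("she quit smoking two years ago", MODEL))
```

With one toy sentence per class the priors are uniform, so the decision rests entirely on word overlap. A real system would need far more training data and richer features than this sketch uses.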