Information Extraction Approaches to Unconventional Data Sources for "Injury Surveillance System": the Case of Newspapers Clippings

  • Authors:
  • Paola Berchialla;Cecilia Scarinzi;Silvia Snidero;Yousif Rahim;Dario Gregori

  • Affiliations:
  • Department of Public Health and Microbiology, University of Torino, Torino, Italy;Department of Statistics and Applied Mathematics D. de Castro, University of Torino, Torino, Italy;S&A S.r.l., Cuneo, Italy;International Society for Violence and Injury Prevention, Stockholm, Norway;Department of Public Health and Microbiology, University of Torino, Torino, Italy and Department of Environmental Medicine and Public Health, Padova, Italy 35121

  • Venue:
  • Journal of Medical Systems
  • Year:
  • 2012

Abstract

Injury Surveillance Systems based on traditional hospital records or clinical data have the advantage of being a well-established, highly reliable source of information for active surveillance of specific injuries, such as choking in children. However, they suffer from delays in making data available for analysis, due to inefficiencies in data collection procedures. In this sense, integrating clinically based registries with unconventional data sources such as newspaper articles has the advantage of making the system more useful for early alerting. Using such sources is difficult, since the information is only available as free natural-language documents rather than the structured databases required by traditional data mining techniques. Information Extraction (IE) addresses the problem of transforming a corpus of textual documents into a more structured database. In this paper, on a corpus of Italian newspaper articles related to choking in children due to ingestion/inhalation of a foreign body, we compared the performance of three IE algorithms: (a) a classical rule-based system, which requires manual annotation of the rules; (b) a rule-based system that allows for the automatic building of rules; (c) a machine learning method based on Support Vector Machines. Although some useful indications are extracted from the newspaper clippings, this approach is at present far from being routinely implemented for injury surveillance purposes.
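To make the idea of turning free text into a structured record concrete, the following is a minimal sketch of hand-written rule-based extraction, in the spirit of approach (a). It is not the system evaluated in the paper: the record fields, the regular expressions, the small object lexicon, and the English example sentence are all hypothetical, chosen only for illustration (the paper's corpus is in Italian).

```python
import re
from typing import Optional, TypedDict


class ChokingRecord(TypedDict):
    """Hypothetical structured record filled from one newspaper sentence."""
    age_years: Optional[int]
    foreign_body: Optional[str]


# Hypothetical hand-written rules; NOT the rules used in the paper.
AGE_RULE = re.compile(r"\b(\d{1,2})[- ]year[- ]old\b")

# Assumed toy lexicon of objects; a real system would need a far larger one.
OBJECT_LEXICON = ("button battery", "coin", "peanut", "grape")
OBJECT_RULE = re.compile(
    # A trigger verb, then (within the same sentence) a lexicon entry.
    r"\b(?:swallowed|inhaled|choked on)\b[^.]*?\b("
    + "|".join(OBJECT_LEXICON)
    + r")\b"
)


def extract(text: str) -> ChokingRecord:
    """Apply each rule to the lowercased text and fill the matching slot."""
    lowered = text.lower()
    age = AGE_RULE.search(lowered)
    obj = OBJECT_RULE.search(lowered)
    return {
        "age_years": int(age.group(1)) if age else None,
        "foreign_body": obj.group(1) if obj else None,
    }


if __name__ == "__main__":
    sentence = (
        "A 3-year-old boy was taken to hospital after he swallowed "
        "a coin while playing."
    )
    print(extract(sentence))
```

Each rule fills one slot of the record, and a slot stays empty (`None`) when no rule fires. Approaches (b) and (c) in the abstract differ in that the rules, or a classifier replacing them, are learned from annotated examples rather than written by hand.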