Medical case-driven classification of microblogs: characteristics and annotation

  • Authors:
  • Mustafa Sofean;Kerstin Denecke;Avaré Stewart;Matthew Smith

  • Affiliations:
  • Leibniz University of Hannover, Hannover, Germany;L3S Research Center, Hannover, Germany;L3S Research Center, Hannover, Germany;Leibniz University of Hannover, Hannover, Germany

  • Venue:
  • Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium
  • Year:
  • 2012

Abstract

In this paper, we study the use of microblogs as a source of information for medical intelligence gathering. The huge amount of irrelevant data in microblogs requires sophisticated filtering methods to identify only the relevant postings. Because microblogs are characteristically sparse and noisy, the features used to automatically classify postings by relevance to medical intelligence gathering must be selected with particular care, and we analyze which features are well suited to this task. The objective of this work is threefold: 1) specifying annotation guidelines for creating a dataset for microblog classification, 2) studying the characteristics of tweets in order to decide on a well-suited feature set, and 3) using that feature set in an automatic classification system for relevance filtering of microblogs. The quality of the classifier is assessed in experiments with various feature sets. The evaluation shows that, despite the challenging characteristics of microblogs, the classifier achieves good accuracy values of up to 89%. One main outcome of this work is a dataset of annotated Twitter data that can be used as a "gold standard" benchmark for further research in this domain.
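
To make the idea of relevance filtering concrete, the following is a minimal sketch and not the authors' system: the abstract does not name the classifier or the feature set that was used. It assumes a multinomial Naive Bayes model with add-one smoothing, a hypothetical feature set (lowercased words plus two flags for URLs and user mentions), and a handful of invented toy tweets in place of the annotated dataset.

```python
import math
import re
from collections import Counter, defaultdict

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def extract_features(tweet):
    """Map one tweet to feature tokens (hypothetical feature set, not the paper's)."""
    feats = []
    if URL_RE.search(tweet):
        feats.append("__HAS_URL__")
    if MENTION_RE.search(tweet):
        feats.append("__HAS_MENTION__")
    # Drop URLs and mentions, then keep lowercased alphabetic words.
    text = MENTION_RE.sub(" ", URL_RE.sub(" ", tweet))
    feats.extend(re.findall(r"[a-z]+", text.lower()))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, tweets, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tweet, label in zip(tweets, labels):
            self.token_counts[label].update(extract_features(tweet))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, tweet):
        total = sum(self.class_counts.values())
        tokens = [t for t in extract_features(tweet) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples; a real experiment would use the annotated tweets.
TRAIN = [
    ("I have a fever and a bad cough today", "relevant"),
    ("my son got the flu, fever all night", "relevant"),
    ("feeling sick with flu and headache", "relevant"),
    ("bieber fever!! best concert ever http://example.com/pics", "irrelevant"),
    ("@friend lol that movie was sick", "irrelevant"),
    ("new phone arrived today, love it", "irrelevant"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("I think I got the flu, cough and headache"))
```

The toy data illustrates why feature choice matters for such short texts: words like "fever" and "sick" appear in both classes, so surrounding words and tweet-level cues have to carry the decision.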