Automatic extraction of outbreak information from news

  • Authors:
  • Bing Liu;Yi Zhang

  • Affiliations:
  • University of Illinois at Chicago;University of Illinois at Chicago

  • Venue:
  • Automatic extraction of outbreak information from news
  • Year:
  • 2008

Quantified Score

Hi-index 0.02

Abstract

With the explosion of unstructured data on the Web, especially in the form of text, there has been considerable interest in mining knowledge from these data for a variety of purposes. In this thesis, we study a particular problem: how to extract disease outbreak information from news. After defining the Emergent Disease Report, we focus on extracting the disease name and outbreak location from news that reports emergent disease outbreaks. First, we study the problem of classifying sentences that report disease outbreaks, and propose a new method that integrates semantic features with the bag-of-words scheme. Experimental results show that the integrated approach is better than either individual approach alone. Second, a novel method based on sequential rules is introduced to extract outbreak locations from the outbreak-reporting sentences, and the new method outperforms conditional random fields on our experimental data. Finally, we discuss how to perform classification and extraction together using label sequential rules, and how to geocode the extracted location named entities into geographical locations accurately. Evaluations of the combined classification and extraction, including geocoding, are conducted, and the proposed method is shown to improve overall performance.
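
As an illustration of what integrating semantic features with a bag-of-words scheme can look like, the following is a minimal sketch and not the thesis's actual method. The abstract does not specify the semantic features, so the lexicons `DISEASES` and `OUTBREAK_CUES`, the feature names, and the function `integrated_features` are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical toy lexicons (assumptions for illustration only);
# the abstract does not list the semantic features actually used.
DISEASES = {"cholera", "measles", "ebola"}
OUTBREAK_CUES = {"outbreak", "outbreaks", "epidemic", "spreads", "spreading"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def integrated_features(sentence):
    """Return bag-of-words counts and binary semantic indicators in one dict."""
    tokens = tokenize(sentence)
    # Bag-of-words part: one feature per distinct token, valued by its count.
    features = {f"bow={tok}": count for tok, count in Counter(tokens).items()}
    # Semantic part: 1 if any token belongs to the corresponding lexicon.
    features["sem=HAS_DISEASE"] = int(any(t in DISEASES for t in tokens))
    features["sem=HAS_OUTBREAK_CUE"] = int(any(t in OUTBREAK_CUES for t in tokens))
    return features


example = integrated_features("A cholera outbreak was reported in the region.")
```

A feature dictionary of this shape could then be passed to any standard sentence classifier; dropping either the `bow=` or the `sem=` keys gives the two individual feature sets against which a combined representation would be compared.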