An exploratory study of news article clustering for web-based bio-surveillance

  • Authors:
  • Manabu Torii;Burt-Ujin Bayarsaikhan;Hongfang Liu;Thang Nguyen;Kevin Jones;Noele P. Nelson;David M. Hartley

  • Affiliations:
  • Georgetown University Medical Center, Washington, DC, USA;Georgetown University Medical Center, Washington, DC, USA;Georgetown University Medical Center, Washington, DC, USA;Georgetown University Medical Center, Washington, DC, USA;Georgetown University Medical Center, Washington, DC, USA;Georgetown University Medical Center, Washington, DC, USA;Georgetown University Medical Center, Washington, DC, USA

  • Venue:
  • Proceedings of the 1st ACM International Health Informatics Symposium
  • Year:
  • 2010

Abstract

Online news articles provide rich and timely information for disease outbreak surveillance. However, finding articles relevant to disease outbreaks among the large volume of online publications is not trivial. In this study, we examined the use of text clustering techniques to organize online articles. To incorporate surveillance analysts' expertise into the clustering of articles, we considered selecting informative word features in a supervised manner. Our experiments suggest that supervised feature selection can significantly reduce the size of the feature set without affecting the utility of the resulting clusters. In addition, we observed that the clustering algorithm could yield consistent results when a small number of selected features were used.
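
The general idea of supervised feature selection followed by clustering can be illustrated with a minimal sketch. The abstract does not name the selection criterion or the clustering algorithm, so the choices here are assumptions made purely for illustration: a chi-square score computed against analyst-supplied relevance labels, and a greedy single-pass clustering over cosine similarity. All function names (`chi_square_scores`, `select_features`, `cluster`) and the toy documents are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would also handle punctuation.
    return text.lower().split()


def chi_square_scores(docs, labels):
    """Chi-square score of each word against a binary relevance label (assumed criterion)."""
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        for word in set(tokenize(doc)):
            (df_pos if label else df_neg)[word] += 1
    scores = {}
    for word in set(df_pos) | set(df_neg):
        a = df_pos[word]   # relevant docs containing the word
        b = df_neg[word]   # irrelevant docs containing the word
        c = n_pos - a      # relevant docs without the word
        d = n_neg - b      # irrelevant docs without the word
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[word] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring words; ties are broken alphabetically."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def vectorize(doc, features):
    # Term-count vector restricted to the selected features.
    counts = Counter(tokenize(doc))
    return [counts[w] for w in features]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(docs, features, threshold=0.5):
    """Greedy single-pass clustering: join the first cluster whose seed vector is similar enough."""
    clusters = []  # list of (seed_vector, member_indices)
    for i, doc in enumerate(docs):
        vec = vectorize(doc, features)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


# Toy, made-up data: 1 = judged relevant to outbreak surveillance, 0 = not relevant.
labeled_docs = [
    "cholera outbreak cases reported",
    "flu outbreak cases rising",
    "measles outbreak cases confirmed",
    "football team wins match",
    "stock market falls sharply",
    "team loses match",
]
labels = [1, 1, 1, 0, 0, 0]

features = select_features(labeled_docs, labels, k=4)
new_docs = [
    "ebola outbreak cases grow",
    "team wins final match",
    "dengue outbreak spreads",
    "match postponed",
]
print(features)
print(cluster(new_docs, features))
```

In this sketch the vocabulary of the labeled documents is reduced to four words before any clustering takes place, which mirrors the abstract's point that a small, supervised selection of features may be enough to group articles usefully. Note that a document containing none of the selected features has a zero vector and ends up in its own singleton cluster.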