Multifaceted toponym recognition for streaming news

  • Authors:
  • Michael D. Lieberman;Hanan Samet

  • Affiliations:
  • University of Maryland, College Park, MD, USA;University of Maryland, College Park, MD, USA

  • Venue:
  • Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

News sources on the Web generate constant streams of information, describing many aspects of the events that shape our world. In particular, geography plays a key role in the news, and enabling geographic retrieval of news articles involves recognizing the textual references to geographic locations (called toponyms) present in the articles, which can be difficult due to ambiguity in natural language. Toponym recognition in news is often accomplished with algorithms designed and tested around small corpora of news articles, but these static collections do not reflect the streaming nature of online news, as evidenced by poor performance in tests. In contrast, a method for toponym recognition is presented that is tuned for streaming news by leveraging a wide variety of recognition components, both rule-based and statistical. An evaluation of this method shows that it outperforms two prominent toponym recognition systems when tested on large datasets of streaming news, indicating its suitability for this domain.