Geographical classification of documents using evidence from Wikipedia

  • Authors:
  • Rafael Odon de Alencar; Clodoveu Augusto Davis, Jr.; Marcos André Gonçalves

  • Affiliations:
  • Federal University of Minas Gerais, Belo Horizonte, MG, Brazil (all three authors)

  • Venue:
  • Proceedings of the 6th Workshop on Geographic Information Retrieval
  • Year:
  • 2010


Abstract

The need to obtain or approximate a geographic location for search results often leads users to include place names and other geography-related terms in their queries. Previous work shows that queries containing such terms account for a significant share of user demand. It is therefore important to recognize which places a document is associated with, so that such queries can be answered adequately. This paper describes strategies for classifying text into geography-related categories using evidence extracted from Wikipedia. We use terms that correspond to entry titles, together with the links between entries in Wikipedia's graph, to build a semantic network from which classification features are generated. Experiments on a news dataset, classified by Brazilian state, show that such terms constitute valid evidence for the geographical classification of documents and demonstrate the potential of this technique for text classification.
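To make the general idea concrete, here is a minimal sketch, not the authors' actual method. It assumes a tiny, hand-made mapping from Wikipedia entry titles to the states those entries are linked with. The mapping `ENTRY_TO_STATES` and the helper names `extract_features` and `classify` are hypothetical illustrations, not real Wikipedia data. In the sketch, a document's features are per-state counts of entry-title occurrences, and the predicted category is the state with the highest count.

```python
from collections import Counter

# Hypothetical toy "semantic network": each (lowercased) Wikipedia entry title
# maps to the Brazilian states its entry is assumed to be linked with.
# Illustrative only; not extracted from real Wikipedia data.
ENTRY_TO_STATES = {
    "belo horizonte": {"Minas Gerais"},
    "ouro preto": {"Minas Gerais"},
    "copacabana": {"Rio de Janeiro"},
    "maracanã": {"Rio de Janeiro"},
    "avenida paulista": {"São Paulo"},
}


def extract_features(text, entry_to_states):
    """Count, for each state, how many entry-title occurrences in text point to it."""
    lowered = text.lower()
    scores = Counter()
    for title, states in entry_to_states.items():
        hits = lowered.count(title)  # naive substring matching, for illustration
        if hits:
            for state in states:
                scores[state] += hits
    return scores


def classify(text, entry_to_states):
    """Return the highest-scoring state, or None if no entry title matches."""
    scores = extract_features(text, entry_to_states)
    if not scores:
        return None
    return scores.most_common(1)[0][0]


if __name__ == "__main__":
    doc = "The mayor of Belo Horizonte visited Ouro Preto before a match at the Maracanã."
    print(extract_features(doc, ENTRY_TO_STATES))
    print(classify(doc, ENTRY_TO_STATES))
```

Raw substring counting is only a stand-in. A realistic system would need tokenization, disambiguation of titles shared by several places, and weighting of indirect links in the graph, and it would likely feed such features to a trained classifier rather than take a simple argmax.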