Disambiguating toponyms in news

Authors:
Eric Garbin;Inderjeet Mani
Affiliations:
Georgetown University, Washington, DC;Georgetown University, Washington, DC
Venue:
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Year:
2005

Abstract

This research addresses the problem of disambiguating toponyms (place names) with respect to a classification derived by merging information from two publicly available gazetteers. To establish the difficulty of the problem, we measured the degree of ambiguity, with respect to a gazetteer, of toponyms in news. We found that 67.82% of the toponyms in a corpus that were ambiguous in a gazetteer lacked a local discriminator in the text. Given the scarcity of human-annotated data, our method used unsupervised machine learning to develop disambiguation rules. Toponyms were automatically tagged with the information about them found in a gazetteer. A toponym that was ambiguous in the gazetteer was then automatically disambiguated using preference heuristics. The automatically tagged data were used to train a machine learner, which disambiguated toponyms in a human-annotated news corpus with 78.5% accuracy.
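The pipeline described in the abstract (gazetteer lookup, heuristic labeling of ambiguous toponyms, then training a learner on the automatically labeled data) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy gazetteer entries, the class names, the cue words standing in for "local discriminators", the fallback preference order, and the simple vote-counting learner, which is a stand-in and not the learner used in the paper.

```python
from collections import Counter, defaultdict

# Toy gazetteer (hypothetical entries): toponym -> candidate classes.
GAZETTEER = {
    "georgia": ["country", "civil"],
    "paris": ["capital", "ppl"],
    "washington": ["capital", "civil", "ppl"],
    "france": ["country"],
}

# Hypothetical local discriminators: cue word -> class it signals.
CUES = {"state": "civil", "province": "civil", "city": "ppl",
        "town": "ppl", "capital": "capital", "republic": "country"}

# Hypothetical fallback preference order, used when no cue is nearby.
PREFERENCE = ["country", "capital", "civil", "ppl"]


def ambiguity_rate(tokens):
    """Fraction of gazetteer-matched tokens with more than one candidate class."""
    matched = [t for t in tokens if t in GAZETTEER]
    if not matched:
        return 0.0
    return sum(len(GAZETTEER[t]) > 1 for t in matched) / len(matched)


def heuristic_label(tokens, i, window=2):
    """Label tokens[i] using a nearby cue word, else the preference order."""
    candidates = GAZETTEER[tokens[i]]
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    for j in range(lo, hi):
        cue_class = CUES.get(tokens[j])
        if j != i and cue_class in candidates:
            return cue_class
    return min(candidates, key=PREFERENCE.index)


def auto_tag(sentences, window=2):
    """Build (context words, label) training pairs without human annotation."""
    examples = []
    for sent in sentences:
        tokens = sent.lower().split()
        for i, tok in enumerate(tokens):
            if tok in GAZETTEER:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                context = [tokens[j] for j in range(lo, hi) if j != i]
                examples.append((context, heuristic_label(tokens, i, window)))
    return examples


def train(examples):
    """Count how often each context word co-occurs with each label."""
    votes = defaultdict(Counter)
    for context, label in examples:
        for word in context:
            votes[word][label] += 1
    return votes


def predict(votes, context, candidates):
    """Sum context-word votes over allowed candidates; else use the preference order."""
    score = Counter()
    for word in context:
        for label, n in votes.get(word, {}).items():
            if label in candidates:
                score[label] += n
    if not score:
        return min(candidates, key=PREFERENCE.index)
    return score.most_common(1)[0][0]
```

In this sketch, `auto_tag` plays the role of the automatic tagging step and `train`/`predict` the role of the learner. A real system would draw candidates from full gazetteers and would be evaluated against a human-annotated corpus, as the abstract reports.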