A Method for Estimating the Precision of Placename Matching

Authors:
Martin Doerr;Manos Papagelis
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2007


Abstract

Information in digital libraries and information systems frequently refers to locations or objects in geographic space. Digital gazetteers are commonly employed in information integration and data cleaning procedures to match the placenames referred to with actual locations. This process may fail due to missing information in the gazetteer, multiple matches, or false positive matches. We have analyzed the cases of success and the reasons for failure when mapping placenames to a gazetteer. Based on this analysis, we present a statistical model that permits estimating 1) the completeness of a gazetteer with respect to the specific target area and application, 2) the expected precision and recall of one-to-one mappings of source placenames to the gazetteer, 3) the semantic inconsistency that remains in one-to-one mappings, and 4) the degree to which precision and recall improve when the identity of higher levels in a hierarchy of places is known. The model is based solely on a statistical analysis of the mapping process itself, applied to a large set of placenames, and does not require any other background data. It assumes that a gazetteer is populated by a stochastic process. The paper discusses how future work could take deviations from this assumption into account. The method has been applied to a real case.
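The three mapping outcomes named in the abstract (no match, a single match, multiple matches) can be tallied with a few lines of Python. The following is only a minimal sketch, not the statistical model of the paper: the function names `build_index` and `classify_matches`, the case-insensitive exact-name comparison, and the toy gazetteer entries are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_index(gazetteer):
    """Map each normalized name to the set of gazetteer entry ids carrying it."""
    index = defaultdict(set)
    for entry_id, name in gazetteer:
        index[name.strip().lower()].add(entry_id)
    return index


def classify_matches(source_names, index):
    """Count source names that hit zero, exactly one, or several entries."""
    outcomes = Counter()
    for name in source_names:
        hits = index.get(name.strip().lower(), set())
        if not hits:
            outcomes["no_match"] += 1      # name missing from the gazetteer
        elif len(hits) == 1:
            outcomes["one_to_one"] += 1    # unique hit, may still be a false positive
        else:
            outcomes["multiple"] += 1      # ambiguous: several candidate places
    return outcomes


# Hypothetical toy data: (entry id, placename) pairs and source placenames.
gazetteer = [(1, "Springfield"), (2, "Springfield"), (3, "Heraklion"), (4, "Chania")]
source_names = ["Heraklion", "springfield", "Rethymno", "Chania "]

outcomes = classify_matches(source_names, build_index(gazetteer))
print(dict(outcomes))
```

These raw counts do not by themselves yield precision or recall, because a one-to-one match can still point to the wrong place. Estimating how often that happens is what the abstract attributes to the statistical model.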