Ontology-Driven Approximate Duplicate Elimination of Postal Addresses

  • Authors:
  • Matteo Cristani; Alessio Gugole

  • Affiliations:
  • University of Verona, 37134, Verona; University of Verona, 37134, Verona

  • Venue:
  • IEA/AIE '08 Proceedings of the 21st international conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems: New Frontiers in Applied Artificial Intelligence
  • Year:
  • 2008

Abstract

Duplicate elimination is a common and important problem in real-life use of postal address databases. Duplicates typically arise when one address database is merged with another, for instance during a joint venture or a merger between two companies, so that two or more records refer to the same address. Although a trivial approach based on exact identity could be used in principle, it fails in practice, particularly for postal addresses, because the same address can be written in several different ways. An approximate approach can therefore be adopted, provided that duplicate elimination is still performed correctly. We present an ontology-driven approach for postal addresses that solves the problem in an approximate fashion. The algorithm is based on a modification of the Levenshtein distance, obtained by introducing the notion of admissible abbreviation, and has a threefold outcome: eliminate duplicates, do not eliminate duplicates, or undecided.
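
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that admissible abbreviations can be approximated by a small flat lookup table (standing in for the ontology), that the distance is a plain character-level Levenshtein distance computed after abbreviations are expanded, and that the threefold outcome comes from two thresholds. The table `ABBREVIATIONS`, the function names, and the threshold values `merge_at` and `keep_at` are all hypothetical.

```python
# Hypothetical abbreviation table; a stand-in for the paper's ontology,
# which is not described in the abstract.
ABBREVIATIONS = {
    "st": "street",
    "rd": "road",
    "ave": "avenue",
}


def levenshtein(a: str, b: str) -> int:
    """Classic character-level Levenshtein distance (two-row dynamic programme)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(address: str) -> str:
    """Lowercase, drop commas and trailing periods, expand known abbreviations."""
    tokens = address.lower().replace(",", " ").split()
    stripped = [t.rstrip(".") for t in tokens]
    return " ".join(ABBREVIATIONS.get(t, t) for t in stripped)


def classify(a: str, b: str, merge_at: int = 1, keep_at: int = 5) -> str:
    """Return 'eliminate', 'keep', or 'undecided' (thresholds are assumptions)."""
    d = levenshtein(normalize(a), normalize(b))
    if d <= merge_at:
        return "eliminate"   # treat the pair as duplicates
    if d >= keep_at:
        return "keep"        # treat the pair as distinct addresses
    return "undecided"       # leave the pair for manual review


if __name__ == "__main__":
    print(classify("12 Main St.", "12 Main Street"))
    print(classify("12 Main St", "12 Mian Street"))
    print(classify("12 Main St.", "98 Oak Avenue"))
```

In this sketch an abbreviation costs nothing because it is expanded before the distance is computed; a scheme closer to the abstract's wording might instead change the edit costs inside the distance itself. The "undecided" band between the two thresholds is where a human reviewer would be asked to decide.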