Ontology-Driven Approximate Duplicate Elimination of Postal Addresses

Authors:
Matteo Cristani;Alessio Gugole
Affiliations:
University of Verona, 37134, Verona;University of Verona, 37134, Verona
Venue:
IEA/AIE '08 Proceedings of the 21st international conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems: New Frontiers in Applied Artificial Intelligence
Year:
2008

Citing 10
Cited 0

Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing
On a relation between graph edit distance and maximum common subgraph. Pattern Recognition Letters
Character N-Gram Tokenization for European Language Text Retrieval. Information Retrieval
Learning stochastic edit distance: Application in handwritten character recognition. Pattern Recognition
Automated ontology construction for unstructured text documents. Data & Knowledge Engineering
Eliminating fuzzy duplicates in data warehouses. VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Ontology-based intelligent decision support agent for CMMI project monitoring and control. International Journal of Approximate Reasoning
Ontology-based computational intelligent multi-agent and its application to CMMI assessment. Applied Intelligence
A genetic fuzzy agent using ontology model for meeting scheduling system. Information Sciences: an International Journal
A fuzzy ontology and its application to news summarization. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics

Abstract

In many real-life uses of postal address databases, duplicate elimination is an important problem that often has to be solved. Duplicates may arise when one address database is merged with another, for instance during a joint venture or a merger between two companies, so that two or more entries denote the same address. Although a trivial approach based on exact identity can be used in principle, it fails in any concrete case, in particular for postal addresses, because the same address can be written in several different ways. An approximate approach can therefore be adopted successfully, provided that the duplicate elimination is performed correctly. We identify an ontology-driven approach for postal addresses that solves the problem in an approximate fashion. The algorithm is based on a modification of the Levenshtein distance, obtained by introducing the notion of admissible abbreviation, and has a threefold outcome: eliminate duplicates, do not eliminate duplicates, or undecided.
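
The abstract does not spell out how the Levenshtein distance is modified, so the following Python sketch is only one possible reading, not the authors' algorithm. It assumes a token-level edit distance in which two tokens cost nothing to match when one is an admissible abbreviation of the other. The `ABBREVIATIONS` table (a stand-in for the ontology), the function names, and the thresholds `low` and `high` are all hypothetical choices made for illustration.

```python
from enum import Enum

# Hypothetical abbreviation table standing in for the ontology (assumption).
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


class Outcome(Enum):
    ELIMINATE = "eliminate duplicates"
    KEEP = "do not eliminate duplicates"
    UNDECIDED = "undecided"


def levenshtein(a, b):
    """Classic character-level edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalize(token):
    """Lower-case, strip trailing punctuation, expand admissible abbreviations."""
    t = token.lower().rstrip(".,")
    return ABBREVIATIONS.get(t, t)


def token_cost(s, t):
    """Substitution cost in [0, 1]; zero when the tokens agree after expansion."""
    s, t = normalize(s), normalize(t)
    if s == t:
        return 0.0
    return levenshtein(s, t) / max(len(s), len(t))


def address_distance(a, b):
    """Token-level edit distance; inserting or deleting a token costs 1."""
    ta, tb = a.split(), b.split()
    prev = [float(j) for j in range(len(tb) + 1)]
    for i, x in enumerate(ta, start=1):
        curr = [float(i)]
        for j, y in enumerate(tb, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + token_cost(x, y)))
        prev = curr
    return prev[-1]


def classify(a, b, low=0.25, high=1.5):
    """Map the distance to the threefold outcome (thresholds are arbitrary)."""
    d = address_distance(a, b)
    if d <= low:
        return Outcome.ELIMINATE
    if d >= high:
        return Outcome.KEEP
    return Outcome.UNDECIDED
```

Under these assumptions, `classify("12 Main St.", "12 Main Street")` yields `Outcome.ELIMINATE`, because the abbreviation is expanded before comparison, while `classify("12 Main St.", "14 Main Street")` falls between the two thresholds and yields `Outcome.UNDECIDED`. In practice the thresholds would have to be tuned on real address data.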