In many real-life uses of postal address databases, duplicate elimination is an important problem. Duplicates typically arise when one address database is merged with another, for instance during a joint venture or a merger between two companies, so that two or more entries refer to the same address. Although a trivial approach based on exact matching can be used in principle, it fails in practice, particularly for postal addresses, because the same address can be written in several different ways. An approximate approach can be adopted successfully instead, provided that duplicate elimination is still performed correctly. We present an ontology-driven approach for postal addresses that solves the problem approximately. The algorithm is based on a modification of the Levenshtein distance, obtained by introducing the notion of admissible abbreviation, and has three possible outcomes: eliminate duplicates, do not eliminate duplicates, or undecided.
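The general idea can be illustrated with a minimal Python sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes that admissible abbreviations can be approximated by a small lookup table (the hypothetical `ABBREVIATIONS`, standing in for the ontology), that abbreviations are expanded before a standard Levenshtein distance is computed, and that two illustrative thresholds (`low`, `high`) separate the three outcomes. All names and threshold values are assumptions made for illustration.

```python
# Hypothetical abbreviation table; a stand-in for the ontology, not taken from the paper.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue", "apt": "apartment"}


def levenshtein(a: str, b: str) -> int:
    """Classic character-level Levenshtein distance (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(address: str) -> str:
    """Lowercase, drop commas and periods, and expand abbreviations found in the table."""
    tokens = address.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def classify(a: str, b: str, low: float = 0.1, high: float = 0.3) -> str:
    """Return 'duplicate', 'distinct', or 'undecided' (thresholds are illustrative)."""
    na, nb = normalize(a), normalize(b)
    longest = max(len(na), len(nb)) or 1  # avoid division by zero for empty input
    score = levenshtein(na, nb) / longest
    if score <= low:
        return "duplicate"
    if score >= high:
        return "distinct"
    return "undecided"


if __name__ == "__main__":
    print(classify("12 Main St.", "12 Main Street"))
    print(classify("12 Main Street", "12 Maine Streat"))
    print(classify("12 Main St.", "98 Oak Road"))
```

Expanding abbreviations before measuring distance means that "St." and "Street" contribute no edit cost, which is one simple way to keep a legitimate abbreviation from counting as a difference. Pairs whose normalized distance falls between the two thresholds are left undecided, for example to be passed to manual review.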