Measuring similarity between collections of values

Authors:
Carina F. Dorneles;Carlos A. Heuser;Andrei E. N. Lima;Altigran Soares da Silva;Edleno Silva de Moura
Affiliations:
Universidade Federal do Rio Grande do Sul(UFRGS);Universidade Federal do Rio Grande do Sul(UFRGS);Universidade Federal do Rio Grande do Sul(UFRGS);Universidade Federal do Amazonas;Universidade Federal do Amazonas
Venue:
Proceedings of the 6th annual ACM international workshop on Web information and data management
Year:
2004

Citing 16
Cited 12

VAGUE: a user interface to relational databases that permits vague queries

ACM Transactions on Information Systems (TOIS)
Autonomous citation matching

Proceedings of the third annual conference on Autonomous Agents
Visual information retrieval

Visual information retrieval
Integrating keyword search into XML query processing

Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications networking
Data integration using similarity joins and a word-based information representation language

ACM Transactions on Information Systems (TOIS)
Feature similarity

Principles of visual information retrieval
Learning object identification rules for information integration

Information Systems - Data extraction, cleaning and reconciliation
Modern Information Retrieval

Modern Information Retrieval
Approximate String Joins in a Database (Almost) for Free

Proceedings of the 27th International Conference on Very Large Data Bases
Learning domain-independent string transformation weights for high accuracy object identification

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Text joins in an RDBMS for web data integration

WWW '03 Proceedings of the 12th international conference on World Wide Web
Searching XML documents via XML fragments

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
XML retrieval: what to retrieve?

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Querying structured text in an XML database

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Robust and efficient fuzzy match for online data cleaning

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Finding similar identities among objects from multiple web sources

WIDM '03 Proceedings of the 5th ACM international workshop on Web information and data management

XML version detection

Proceedings of the 2007 ACM symposium on Document engineering
An approach to XML path matching

Proceedings of the 9th annual ACM international workshop on Web information and data management
A strategy for allowing meaningful and comparable scores in approximate matching

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
SimEval: a tool for evaluating the quality of similarity functions

ER '07 Tutorials, posters, panels and industrial contributions at the 26th international conference on Conceptual modeling - Volume 83
Uma abordagem efetiva e eficiente para deduplicação de metadados bibliográficos de objetos digitais

SBBD '08 Proceedings of the 23rd Brazilian symposium on Databases
A strategy for allowing meaningful and comparable scores in approximate matching

Information Systems
A strategy for allowing meaningful and comparable scores in approximate matching

Information Systems
XML data clustering: An overview

ACM Computing Surveys (CSUR)
An unsupervised heuristic-based approach for bibliographic metadata deduplication

Information Processing and Management: an International Journal
Estimating recall and precision for vague queries in databases

CAiSE'05 Proceedings of the 17th international conference on Advanced Information Systems Engineering
Information retrieval of sequential data in heterogeneous XML databases

AMR'05 Proceedings of the Third international conference on Adaptive Multimedia Retrieval: user, context, and feedback
Survey: An overview on XML similarity: Background, current trends and future directions

Computer Science Review

Abstract

In this paper, we propose a set of similarity metrics for manipulating collections of values occurring in XML documents. Following the data model presented in the TAX algebra, we treat an XML element as a labeled, ordered, rooted tree. Considering that XML nodes can be either atomic, i.e., they contain single values such as short character strings, dates, etc., or complex, i.e., nested structures that contain other nodes, we propose two types of similarity metrics: MAVs for atomic nodes and MCVs for complex nodes. In the first case, we suggest the use of several application-domain-dependent metrics. In the second case, we define structure-dependent metrics for complex values that can be applied separately to tuples and to collections of values. We also present experiments showing the effectiveness of our method.
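
The split between atomic and complex nodes described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's definition of MAVs or MCVs: the function names (`atomic_sim`, `tuple_sim`, `collection_sim`, `node_sim`), the use of `difflib.SequenceMatcher` as the atomic metric, and the averaging rules for complex values are all assumptions chosen only to show how a structure-dependent score might be built recursively from atomic scores.

```python
from difflib import SequenceMatcher

# Hypothetical encoding (an assumption, not the paper's data model):
#   str  -> atomic node
#   dict -> tuple-like complex node (label -> child node)
#   list -> collection of nodes, compared here with set-like semantics

def atomic_sim(a: str, b: str) -> float:
    """Stand-in atomic metric: case-insensitive string ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def tuple_sim(a: dict, b: dict) -> float:
    """Average child similarity over the union of labels.

    Labels present on only one side contribute 0 to the average.
    """
    labels = set(a) | set(b)
    if not labels:
        return 1.0
    shared = set(a) & set(b)
    return sum(node_sim(a[k], b[k]) for k in shared) / len(labels)

def collection_sim(a: list, b: list) -> float:
    """Symmetric best-match average between two collections."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    best_a = sum(max(node_sim(x, y) for y in b) for x in a)
    best_b = sum(max(node_sim(x, y) for x in a) for y in b)
    return (best_a + best_b) / (len(a) + len(b))

def node_sim(a, b) -> float:
    """Dispatch on node kind; nodes of different kinds score 0."""
    if isinstance(a, str) and isinstance(b, str):
        return atomic_sim(a, b)
    if isinstance(a, dict) and isinstance(b, dict):
        return tuple_sim(a, b)
    if isinstance(a, list) and isinstance(b, list):
        return collection_sim(a, b)
    return 0.0

if __name__ == "__main__":
    # Two illustrative records with the same authors in a different order.
    rec1 = {"title": "Modern Information Retrieval",
            "authors": ["R. Baeza-Yates", "B. Ribeiro-Neto"]}
    rec2 = {"title": "Modern information retrieval",
            "authors": ["B. Ribeiro-Neto", "R. Baeza-Yates"]}
    print(node_sim(rec1, rec2))
```

In this sketch the atomic metric is the only domain-dependent part, so it could be swapped for a date or number comparison without touching the structural functions.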