When integrating data from multiple Web sources, the same object can appear in different formats and structures, which makes it difficult to identify the objects that should be matched. In this paper, we propose an approach for identifying similar objects across multiple Web sources. In this approach, object identification works like the relational join operation, except that a similarity function takes the place of the equality condition. This similarity function is based on information retrieval techniques. Our approach differs from others in the literature in that it can identify objects with a more complex structure (e.g., XML documents), not only objects with a flat structure such as relations. The effectiveness of the approach is demonstrated by experimental results on real Web data sources from different domains, which reach precision levels above 75%.
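The join analogy can be illustrated with a minimal sketch. The abstract does not specify the similarity function, so the code below assumes a plain cosine similarity over word counts as a stand-in for an information retrieval measure. The helper names (`leaf_text`, `cosine`, `similarity_join`) and the threshold of 0.5 are hypothetical choices for illustration, not the paper's method. Nested dictionaries and lists stand in for objects with a non-flat structure such as XML documents.

```python
import math
import re
from collections import Counter


def leaf_text(obj):
    """Concatenate all leaf values of a nested object (assumed stand-in for XML content)."""
    if isinstance(obj, dict):
        return " ".join(leaf_text(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return " ".join(leaf_text(v) for v in obj)
    return str(obj)


def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_join(left, right, threshold=0.5):
    """Nested-loop join that pairs objects whose similarity reaches the threshold.

    The equality test of a relational join is replaced by
    cosine(...) >= threshold; the threshold value is an assumption.
    """
    return [
        (l, r)
        for l in left
        for r in right
        if cosine(leaf_text(l), leaf_text(r)) >= threshold
    ]


# Two hypothetical sources describing publications with different structures.
source_a = [
    {"title": "Data on the Web", "authors": ["Abiteboul", "Buneman", "Suciu"]},
]
source_b = [
    {"name": "data on the web: from relations to XML", "by": {"first": "Abiteboul"}},
    {"name": "Modern Information Retrieval", "by": {"first": "Baeza-Yates"}},
]

matches = similarity_join(source_a, source_b)
```

Because both objects are reduced to their leaf text before comparison, the two sources need not share field names or nesting depth. A real system would likely weight terms (for example by inverse document frequency) and avoid the quadratic nested loop.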