We describe a family of heuristics-based clustering strategies to support the merging of XML data from multiple sources. As part of this research, we have developed a comprehensive classification of the schematic and semantic conflicts that can occur when reconciling related XML data from multiple sources. Because element clustering is compute-intensive, especially when comparing large numbers of data elements that exhibit great representational diversity, performance is a critical, yet so far neglected, aspect of the merging process. We have developed five heuristics for clustering data in a multi-dimensional metric space. Equivalence of data elements within the individual clusters is determined using several distance functions that calculate the semantic distances among the elements.

The research described in this article is conducted within the context of the Integration Wizard (IWIZ) project at the University of Florida. IWIZ enables users to access and retrieve information from multiple XML-based sources through a consistent, integrated view. The results of our qualitative analysis of the clustering heuristics have validated the feasibility of our approach as well as its superior performance when compared to other similarity search techniques.
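
To make the idea of distance-based element clustering concrete, the following is a minimal sketch and not one of the five heuristics from the article, which the abstract does not spell out. It assumes a single hypothetical distance function, name_distance, based on string similarity of element names, and a simple leader-style clustering with an illustrative threshold of 0.4. The similarity ratio used here is a convenient stand-in and is not guaranteed to satisfy the triangle inequality, so it is not a true metric.

```python
from difflib import SequenceMatcher


def name_distance(a: str, b: str) -> float:
    """Hypothetical distance in [0, 1] between two XML element names.

    Uses case-insensitive string similarity; 0.0 means identical names.
    """
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def leader_cluster(names, threshold=0.4):
    """Group names with a simple leader-style pass.

    Each name joins the first cluster whose leader (its first member)
    lies within `threshold`; otherwise it starts a new cluster.
    The threshold value is an illustrative assumption.
    """
    clusters = []
    for name in names:
        for cluster in clusters:
            if name_distance(name, cluster[0]) <= threshold:
                cluster.append(name)
                break
        else:
            # No existing leader was close enough: open a new cluster.
            clusters.append([name])
    return clusters


# Element names as they might appear across several XML sources.
elements = ["author", "authors", "title", "booktitle", "Author", "price"]
clusters = leader_cluster(elements)
for cluster in clusters:
    print(cluster)
```

Because each element is compared only against cluster leaders rather than against every other element, a pass like this reduces the number of distance computations. That reduction is the kind of performance concern the abstract raises, although the actual heuristics may work quite differently.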