Schema matching and value mapping across two information sources, such as databases, are critical information aggregation tasks: before data from multiple tables can be integrated, both the columns and the values within those columns must be matched. The complexity of both problems grows quickly with the number of attributes to be matched and with the multiple possible semantics of data values. Traditional research has mostly tackled schema matching and value mapping independently, and for categorical (discrete-valued) attributes. We propose novel methods that leverage value mappings to enhance schema matching in the presence of opaque column names, for schemas consisting of both continuous and discrete-valued attributes. An additional source of complexity is that a discrete-valued attribute in one schema may in fact be a quantized, encoded version of a continuous-valued attribute in the other. Our approach handles both “onto” and bijective schema matching; its fitness objective for matching a pair of attributes from the two schemas exploits the statistical distribution of values within each attribute. We apply two such objectives in our experimental study, one based on Euclidean distance and one based on the data log-likelihood. Our matching methods use a heuristic local-descent optimization strategy that applies two-opt switching to optimize attribute matches while simultaneously embedding value mappings. Our experiments show that the proposed techniques match schemas with mixed continuous and discrete-valued attributes with high accuracy, and should therefore be a useful addition to a framework of (semi-)automated tools for data alignment.
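As a rough illustration of distribution-based fitness objectives, the Python sketch below compares two discrete-valued columns with opaque codes through their empirical value distributions. It assumes plain relative-frequency histograms and Laplace smoothing (the `alpha` parameter), neither of which is taken from the paper, and the function names are hypothetical. For the Euclidean objective, the value mapping that minimizes the distance can be found by pairing values by frequency rank, so an embedded value mapping falls out of the fitness computation itself.

```python
import math
from collections import Counter


def distribution(values):
    """Relative frequency of each distinct value in a column."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in counts.items()}


def euclidean_fitness(col_a, col_b):
    """Euclidean distance between the value distributions of two columns with
    opaque codes, minimized over one-to-one value mappings (lower is better).

    Pairing values by frequency rank is optimal: for probability vectors p, q,
    sum_i (p_i - q_s(i))^2 = |p|^2 + |q|^2 - 2 * sum_i p_i q_s(i), and the
    last sum is largest when both vectors are sorted in the same order.
    Returns (distance, mapping), where mapping sends col_a codes to col_b codes.
    """
    pa, pb = distribution(col_a), distribution(col_b)
    ranked_a = sorted(pa, key=pa.get, reverse=True)
    ranked_b = sorted(pb, key=pb.get, reverse=True)
    # Pad the shorter ranking with zeros so surplus values face probability 0.
    k = max(len(ranked_a), len(ranked_b))
    fa = [pa[v] for v in ranked_a] + [0.0] * (k - len(ranked_a))
    fb = [pb[v] for v in ranked_b] + [0.0] * (k - len(ranked_b))
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
    mapping = dict(zip(ranked_a, ranked_b))  # zip stops at the shorter list
    return distance, mapping


def log_likelihood_fitness(col_a, col_b, mapping, alpha=1.0):
    """Log-likelihood of col_b's values under a Laplace-smoothed categorical
    model fitted to col_a, after translating col_b codes through `mapping`
    (a dict from col_a codes to col_b codes). Higher is better."""
    inverse = {b: a for a, b in mapping.items()}
    # col_b codes with no partner are tagged so they cannot collide with
    # col_a codes; they receive only the smoothing mass alpha.
    translated = [inverse.get(v, ("unmapped", v)) for v in col_b]
    counts = Counter(col_a)
    support = set(counts) | set(translated)
    total = len(col_a) + alpha * len(support)
    return sum(math.log((counts[v] + alpha) / total) for v in translated)
```

For example, `euclidean_fitness(["x", "x", "x", "y"], ["p", "q", "q", "q"])` returns a distance of `0.0` with the mapping `{"x": "q", "y": "p"}`, and that mapping also scores a higher log-likelihood than the swapped one. Continuous attributes, and discrete attributes that encode quantized continuous ones, would need a binning step that this sketch does not attempt.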
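A minimal sketch of two-opt local descent for the bijective case is shown below. It assumes the fitness objective decomposes into a sum over matched attribute pairs, so that each pair's best value mapping can be computed separately and folded into a precomputed cost matrix `cost[i][j]` (lower is better). The names `total_cost` and `two_opt_match` are hypothetical, and neither the onto case nor the paper's actual search schedule is reproduced here.

```python
from typing import List, Sequence


def total_cost(cost: Sequence[Sequence[float]], perm: List[int]) -> float:
    """Sum of cost[i][perm[i]]: attribute i of schema A is matched to
    attribute perm[i] of schema B."""
    return sum(cost[i][j] for i, j in enumerate(perm))


def two_opt_match(cost: Sequence[Sequence[float]]) -> List[int]:
    """Local descent over bijective matchings using two-opt switches.

    Starting from the identity matching, swap the partners of two attributes
    whenever the swap lowers the total cost; stop at a local minimum where
    no single swap helps.
    """
    n = len(cost)
    perm = list(range(n))
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for k in range(i + 1, n):
                a, b = perm[i], perm[k]
                # Only the two affected pairs change, so the cost delta is local.
                delta = (cost[i][b] + cost[k][a]) - (cost[i][a] + cost[k][b])
                if delta < -1e-12:  # accept strict improvements only
                    perm[i], perm[k] = b, a
                    improved = True
    return perm
```

With `cost = [[5, 1, 9], [1, 5, 9], [9, 9, 1]]`, the search swaps the first two partners and returns `[1, 0, 2]`, with total cost 3. Because every accepted switch strictly lowers the total cost and there are finitely many matchings, the loop terminates, but only at a local minimum; restarting from several initial matchings is one simple way to reduce the risk of a poor one.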