A key problem in the integration of information sources is the identification of related attributes or objects across independent sources. Inferring such meta-information from the source data itself (rather than from a priori available meta-data, such as attribute names) is sometimes possible. For example, existing algorithms attempt to integrate information sources by finding patterns such as Inclusion Dependencies (INDs) across them. However, INDs are based on exact set inclusion and are thus very strict patterns that rarely hold across independent real-world databases.

We propose two error-tolerant measures, termed Similarity Score and Distribution Score, that help identify related attributes across two independent databases, based on similarities in their data. These measures specifically address the problem of identifying semantic relationships between textual attributes of databases that have few or no equal values.

We also present implementations of these measures and some experimental results.
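
To illustrate why exact set inclusion is such a strict pattern, here is a minimal Python sketch. The function names `ind_holds` and `ind_coverage` and the sample columns are hypothetical, chosen only for illustration. The coverage ratio is a generic relaxation and is not the Similarity Score or Distribution Score proposed here, whose definitions are not given in this abstract.

```python
def ind_holds(dependent, referenced):
    """Exact inclusion dependency: every value in `dependent`
    must also occur in `referenced`."""
    return set(dependent) <= set(referenced)


def ind_coverage(dependent, referenced):
    """Hypothetical relaxation: fraction of distinct `dependent`
    values that also occur in `referenced` (1.0 means the IND holds)."""
    dep = set(dependent)
    if not dep:
        # An empty column is trivially included in any other column.
        return 1.0
    return len(dep & set(referenced)) / len(dep)


# Illustrative columns from two independent sources (made-up data).
cities_a = ["Boston", "New York", "Chicago", "Worcester"]
cities_b = ["Boston", "New York", "Chicago", "Denver", "Austin"]

print(ind_holds(cities_a, cities_b))     # False: one missing value breaks the IND
print(ind_coverage(cities_a, cities_b))  # 0.75
```

A single unmatched value is enough to make the exact test fail. Note that even the relaxed ratio still depends on equal values, so it would not help for textual attributes that share few or no identical values, which is the case the proposed measures are meant to address.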