Recently, the problem of data integration has been addressed anew by methods based on machine learning and discovery. Such methods are intended to automate, at least in part, the laborious process of information integration, by which existing data sources are incorporated into a virtual database. Essentially, these methods scan new data sources, attempting to discover possible mappings to the virtual database. Like all discovery processes, this process is intrinsically probabilistic; that is, each discovery is associated with a value that denotes the degree of assurance in its appropriateness. Consequently, the rows in a discovered virtual table have mixed assurance levels, with some rows being more credible than others. We argue that rows in discovered virtual databases should be ranked, and we describe a method, called TupleRank, for calculating such a ranking order. Roughly speaking, TupleRank calibrates the probabilities calculated during a discovery process with historical information about the performance of the system. The work is done in the framework of the Autoplex system for discovering content for virtual databases, and initial experimentation is reported and discussed.
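The idea of calibrating discovery probabilities against historical performance and then ranking rows can be illustrated with a minimal sketch. The abstract does not say how TupleRank performs the calibration, so this example assumes simple histogram binning: raw probabilities are grouped into bins, and each bin is mapped to the fraction of past discoveries in that bin that turned out to be correct. All function names, the bin count, and the sample data are hypothetical and are not taken from Autoplex.

```python
from collections import defaultdict


def bin_index(prob, n_bins):
    """Return the index of the equal-width bin that contains prob (0.0 to 1.0)."""
    return min(int(prob * n_bins), n_bins - 1)


def build_calibration(history, n_bins=5):
    """Map each bin to the observed accuracy of past discoveries in that bin.

    history is an iterable of (raw_probability, was_correct) pairs.
    This is an assumed calibration scheme, not the paper's method.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for prob, correct in history:
        b = bin_index(prob, n_bins)
        totals[b] += 1
        if correct:
            hits[b] += 1
    return {b: hits[b] / totals[b] for b in totals}


def calibrate(prob, table, n_bins=5):
    """Replace a raw probability with its bin's historical accuracy.

    Falls back to the raw probability when the bin has no history.
    """
    return table.get(bin_index(prob, n_bins), prob)


def rank_rows(rows, table, n_bins=5):
    """Sort (row, raw_probability) pairs by calibrated score, highest first."""
    scored = [(calibrate(p, table, n_bins), row) for row, p in rows]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical history: the system was overconfident near 0.9
# and underconfident near 0.5.
history = [
    (0.95, True), (0.90, True), (0.92, False),
    (0.55, True), (0.50, True),
    (0.30, False), (0.35, False),
]
table = build_calibration(history)

# Hypothetical discovered rows with their raw discovery probabilities.
rows = [("row_a", 0.93), ("row_b", 0.52), ("row_c", 0.31), ("row_d", 0.70)]
ranked = rank_rows(rows, table)
```

In this toy data, a row discovered with raw probability 0.52 ends up ranked above one discovered with 0.93, because the history shows that discoveries near 0.9 were correct less often than those near 0.5. This is the kind of reordering that calibration against past performance can produce.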