Query-based sampling of text databases
ACM Transactions on Information Systems (TOIS)
Determining Text Databases to Search in the Internet
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Pattern Classification (2nd Edition)
Pattern Classification (2nd Edition)
A Frequency-based Approach for Mining Coverage Statistics in Data Integration
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
A random walk approach to sampling hidden databases
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Classification-aware hidden-web text database selection
ACM Transactions on Information Systems (TOIS)
Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
Hi-index | 0.00 |
In Deep Web data integration, the metaquerier provides a unified interface for each domain, which can dispatch the user query to the most relevant Web databases. Traditional database selection algorithms are often based on content summaries. However, many web-accessible databases are uncooperative. The only way of accessing the contents of these databases is via querying. In this paper, we propose an approximate content summary approach for database selection. Furthermore, the real-life databases are not always static and, accordingly, the statistical content summary needs to be updated periodically to reflect database content changes. Therefore, we also propose a survival function approach to give appropriate schedule to regenerate approximate content summary. We conduct extensive experiments to illustrate the accuracy and efficiency of our techniques.