A Frequency-based Approach for Mining Coverage Statistics in Data Integration

Authors:
Zaiqing Nie;Subbarao Kambhampati
Affiliations:
-;-
Venue:
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Year:
2004

Abstract

Query optimization in data integration requires source coverage and overlap statistics. Gathering and storing the required statistics presents many challenges, not the least of which is controlling the amount of statistics learned. In this paper we introduce StatMiner, a novel statistics mining approach which automatically generates attribute value hierarchies, efficiently discovers frequently accessed query classes based on the learned attribute value hierarchies, and learns statistics only with respect to these classes. We describe the details of our method, and present experimental results demonstrating the efficiency and effectiveness of our approach. Our experiments are done in the context of BibFinder, a publicly fielded bibliography mediator.
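
The idea of learning statistics only for frequently accessed query classes can be illustrated with a minimal sketch. This is not StatMiner itself: the attribute value hierarchy here is fixed by hand (the paper learns it automatically), and the names `PARENT`, `LOG`, `frequent_classes`, `coverage_stats`, the sources `S1` and `S2`, and the tuple ids are all hypothetical. Coverage of a source for a class is taken, as an assumption, to be the fraction of all answer tuples seen for that class that the source returned.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-written attribute value hierarchy (child -> parent).
# The paper generates such hierarchies automatically; this one is assumed.
PARENT = {
    "SIGMOD": "DB", "VLDB": "DB", "ICDE": "DB",
    "AAAI": "AI", "IJCAI": "AI",
    "DB": "ANY", "AI": "ANY",
}

# Hypothetical query log: (queried attribute value, {source: answer tuple ids}).
# Tuple ids are assumed to be globally unique across queries.
LOG = [
    ("SIGMOD", {"S1": {1, 2, 3}, "S2": {2, 3}}),
    ("SIGMOD", {"S1": {4}, "S2": {4, 5}}),
    ("VLDB", {"S1": {6, 7}, "S2": {7}}),
    ("AAAI", {"S1": {8}, "S2": {8, 9}}),
]


def ancestors(value):
    """Return the value followed by all of its ancestors in the hierarchy."""
    chain = [value]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def frequent_classes(log, min_support):
    """Classes whose share of logged queries is at least min_support."""
    counts = Counter()
    for value, _ in log:
        for cls in ancestors(value):
            counts[cls] += 1
    return {cls for cls, n in counts.items() if n / len(log) >= min_support}


def coverage_stats(log, classes):
    """For each kept class, estimate each source's coverage of its answers."""
    answers = defaultdict(lambda: defaultdict(set))  # class -> source -> ids
    for value, per_source in log:
        for cls in ancestors(value):
            if cls in classes:
                for source, tuples in per_source.items():
                    answers[cls][source] |= set(tuples)
    stats = {}
    for cls, by_source in answers.items():
        universe = set().union(*by_source.values())
        if universe:  # skip classes for which no source returned anything
            stats[cls] = {s: len(t) / len(universe) for s, t in by_source.items()}
    return stats


classes = frequent_classes(LOG, min_support=0.5)
stats = coverage_stats(LOG, classes)
```

With a support threshold of 0.5, rarely queried values such as `VLDB` and `AAAI` are not tracked individually; their queries still contribute to the statistics of the more general classes they roll up into, which is how restricting attention to frequent classes limits the amount of statistics stored.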