Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration

Authors:
Fangjiao Jiang; Weiyi Meng; Xiaofeng Meng
Affiliations:
School of Information, Renmin University of China, and College of Physics and Electronic Engineering, Xuzhou Normal University; Computer Science Dept, SUNY at Binghamton; School of Information, Renmin University of China
Venue:
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
Year:
2009

Abstract

In Deep Web data integration, some Web database interfaces express exclusive predicates of the form $Q_e = P_i$ ($P_i \in \{P_1, P_2, \ldots, P_m\}$), which permit only one predicate to be selected at a time. Accurately and efficiently estimating the selectivity of each $Q_e$ is critical to optimal query translation. In this paper, we focus mainly on selectivity estimation for infinite-value attributes, which is more difficult than for key attributes and categorical attributes. First, we compute the attribute correlation and retrieve approximately random attribute-level samples by submitting queries on the least correlated attribute to the actual Web database. Then we estimate the Zipf equation from the word ranks in the sample and the actual selectivity of several words obtained from the actual Web database. Finally, the selectivity of any word on the infinite-value attribute can be derived from the Zipf equation. An experimental evaluation of the proposed selectivity estimation method is provided, and the experimental results show that its estimates are highly accurate.
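
The Zipf-fitting step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact form of the Zipf equation or the fitting procedure, so the sketch assumes the classic form selectivity = c / rank^alpha and an ordinary least-squares fit in log-log space. The function names (`rank_words`, `fit_zipf`, `estimate_selectivity`) and all sample values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def rank_words(sample_counts):
    """Map each word to its 1-based rank by descending sample frequency.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    ordered = sorted(sample_counts, key=lambda w: (-sample_counts[w], w))
    return {word: i + 1 for i, word in enumerate(ordered)}


def fit_zipf(probes):
    """Fit selectivity = c / rank**alpha (assumed form) by least squares.

    probes: list of (rank, actual_selectivity) pairs, with rank >= 1 and
    selectivity > 0, e.g. obtained by probing the database with a few words.
    Returns the pair (c, alpha).
    """
    if len(probes) < 2:
        raise ValueError("need at least two probe points")
    xs = [math.log(rank) for rank, _ in probes]
    ys = [math.log(sel) for _, sel in probes]
    n = len(probes)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("probe ranks must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # log(sel) = log(c) + slope * log(rank)
    alpha = -slope
    c = math.exp(mean_y - slope * mean_x)
    return c, alpha


def estimate_selectivity(word, ranks, c, alpha):
    """Estimate the selectivity of a sampled word from its rank."""
    return c / ranks[word] ** alpha


# Illustrative (made-up) sample word counts and probed selectivities.
ranks = rank_words({"data": 50, "web": 30, "query": 10, "index": 5})
probes = [(ranks["data"], 1000.0), (ranks["web"], 500.0), (ranks["index"], 250.0)]
c, alpha = fit_zipf(probes)
estimate = estimate_selectivity("query", ranks, c, alpha)
```

In this sketch only three words are probed against the database; the selectivity of the remaining word (`"query"`) is derived from its sample rank and the fitted parameters, which mirrors the idea of probing a few words and extrapolating to the rest.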