Many web databases are accessible only through a proprietary search interface that allows users to form a query by entering the desired values for a few attributes. After receiving a query, the system returns the top-k matching tuples according to a pre-determined ranking function. Since the rank of a tuple largely determines the attention it receives from website users, ranking information for any tuple, not just the top-ranked ones, is often of significant interest to third parties such as sellers, customers, market researchers and investors. In this paper, we define a novel problem of rank discovery over hidden web databases. We introduce a taxonomy of ranking functions, and show that different types of ranking functions require fundamentally different approaches for rank discovery. Our technical contributions include principled and efficient randomized algorithms for estimating the rank of a given tuple, as well as negative results that demonstrate the inefficiency of any deterministic algorithm. We show extensive experimental results over real-world databases, including an online experiment at Amazon.com, that illustrate the effectiveness of our proposed techniques.
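To make the access model concrete, the following is a minimal sketch of a top-k search interface and of what "the rank of a tuple" means under it. It is not the paper's algorithm: the class `HiddenDatabase`, the helper `true_rank`, the attribute names, and the static `score` column are all hypothetical, and the ground-truth rank is computed with full access to the data, which a third party does not have.

```python
from typing import Dict, List

Row = Dict[str, object]


class HiddenDatabase:
    """Toy model of a web database reachable only through a top-k search form."""

    def __init__(self, rows: List[Row], k: int) -> None:
        self._rows = rows  # hidden from third parties in the real setting
        self.k = k

    def search(self, **conditions: object) -> List[Row]:
        """Return at most k matching rows, highest score first.

        Assumption: a static ranking function, i.e. each row carries a
        fixed score that does not depend on the query.
        """
        matches = [
            r for r in self._rows
            if all(r.get(attr) == val for attr, val in conditions.items())
        ]
        matches.sort(key=lambda r: r["score"], reverse=True)
        return matches[: self.k]


def true_rank(rows: List[Row], target_id: int) -> int:
    """Ground-truth rank: 1 + number of rows scoring strictly higher."""
    target = next(r for r in rows if r["id"] == target_id)
    return 1 + sum(r["score"] > target["score"] for r in rows)


rows = [
    {"id": 1, "brand": "A", "color": "red", "score": 9.0},
    {"id": 2, "brand": "A", "color": "blue", "score": 7.5},
    {"id": 3, "brand": "B", "color": "red", "score": 6.0},
    {"id": 4, "brand": "B", "color": "blue", "score": 4.0},
    {"id": 5, "brand": "A", "color": "red", "score": 2.5},
]
db = HiddenDatabase(rows, k=2)

# Tuple 5 matches brand="A" but is cut off by the top-k limit.
print([r["id"] for r in db.search(brand="A")])
# A narrower query brings tuple 5 back into the returned page.
print([r["id"] for r in db.search(brand="A", color="red")])
print(true_rank(rows, 5))
```

In this toy setting, a tuple that never appears for a broad query can still be surfaced by a more selective one. That is the only kind of evidence a third party can gather through such an interface when trying to estimate a rank.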