Rank discovery from web databases

Authors:
Saravanan Thirumuruganathan;Nan Zhang;Gautam Das
Affiliations:
University of Texas at Arlington;George Washington University;University of Texas at Arlington
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Citing 19
Cited 0

A technique for measuring the relative size and overlap of public Web search engines

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Query-based sampling of text databases

ACM Transactions on Information Systems (TOIS)
Discovering the representative of a search engine

Proceedings of the eleventh international conference on Information and knowledge management
Crawling the Hidden Web

Proceedings of the 27th International Conference on Very Large Data Bases
Evaluating Top-k Queries over Web-Accessible Databases

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Capturing collection size for distributed non-cooperative retrieval

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Sampling, information extraction and summarisation of hidden web databases

Data & Knowledge Engineering - Special issue: WIDM 2004
Efficient search engine measurements

Proceedings of the 16th international conference on World Wide Web
A random walk approach to sampling hidden databases

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Distributed search over the hidden web: hierarchical database sampling and selection

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Adaptive-sampling algorithms for answering aggregation queries on Web sites

Data & Knowledge Engineering
A survey of top-k query processing techniques in relational database systems

ACM Computing Surveys (CSUR)
Mining search engine query logs via suggestion sampling

Proceedings of the VLDB Endowment
Leveraging COUNT Information in Sampling Hidden Databases

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Turbo-charging hidden database samplers with overflowing queries and skew reduction

Proceedings of the 13th International Conference on Extending Database Technology
Unbiased estimation of size and other aggregates over hidden web databases

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Effective and efficient sampling methods for deep web aggregation queries

Proceedings of the 14th International Conference on Extending Database Technology
Optimal algorithms for crawling a hidden database in the web

Proceedings of the VLDB Endowment
Breaking the top-k barrier of hidden web databases?

ICDE '13 Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Many web databases are only accessible through a proprietary search interface which allows users to form a query by entering the desired values for a few attributes. After receiving a query, the system returns the top-k matching tuples according to a pre-determined ranking function. Since the rank of a tuple largely determines the attention it receives from website users, ranking information for any tuple - not just the top-ranked ones - is often of significant interest to third parties such as sellers, customers, market researchers and investors. In this paper, we define a novel problem of rank discovery over hidden web databases. We introduce a taxonomy of ranking functions, and show that different types of ranking functions require fundamentally different approaches for rank discovery. Our technical contributions include principled and efficient randomized algorithms for estimating the rank of a given tuple, as well as negative results which demonstrate the inefficiency of any deterministic algorithm. We show extensive experimental results over real-world databases, including an online experiment at Amazon.com, which illustrates the effectiveness of our proposed techniques.