Many search engines and other web applications suggest auto-completions as the user types in a query. The suggestions are generated from hidden underlying databases, such as query logs, directories, and lexicons. These databases contain interesting and useful information, but they are typically not directly accessible. In this paper we describe two algorithms for sampling suggestions using only the public suggestion interface. One algorithm samples suggestions uniformly at random; the other samples suggestions in proportion to their popularity. These algorithms can be used to mine the hidden suggestion databases. Example applications include comparing the popularity of given keywords within a search engine's query log, estimating the volume of commercially oriented queries in a query log, and evaluating the extent to which a search engine exposes its users to negative content. Our algorithms employ Monte Carlo methods to obtain unbiased samples from the suggestion database. Empirical analysis using a publicly available query log demonstrates that our algorithms are efficient and accurate. Results of experiments on two major suggestion services are also provided.
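To make the idea of mining a hidden database through a suggestion interface concrete, here is a minimal sketch of one related Monte Carlo technique: a random walk over prefixes combined with importance weighting. It is not the paper's algorithm. Everything in it is an assumption made for illustration: the toy `HIDDEN_DB` and its popularity numbers, the top-`K` `suggest` function standing in for a real service, the tiny `ALPHABET`, and the `END` marker that lets a query ask for an exact match.

```python
import random

END = "$"               # hypothetical end-of-string marker (assumption)
ALPHABET = "abc" + END  # toy alphabet; real interfaces accept far more characters
K = 3                   # the interface reveals at most K suggestions per query

# Toy hidden database: suggestion -> popularity (made-up numbers).
HIDDEN_DB = {
    "a": 50, "ab": 40, "abb": 5, "abc": 30, "ac": 8, "b": 45,
    "ba": 12, "bab": 3, "bc": 9, "c": 20, "ca": 7, "cab": 15,
}

def suggest(prefix):
    """Stand-in for the public interface: top-K completions, most popular first."""
    matches = [s for s in HIDDEN_DB if (s + END).startswith(prefix)]
    matches.sort(key=lambda s: -HIDDEN_DB[s])
    return matches[:K]

def random_walk(rng):
    """Return (suggestion, probability of reaching it), or None on a dead end."""
    prefix, prob = "", 1.0
    while True:
        results = suggest(prefix)
        if not results:
            return None                      # no completion has this prefix
        if len(results) < K:                 # underflow: all completions are visible
            return rng.choice(results), prob / len(results)
        prefix += rng.choice(ALPHABET)       # overflow: extend the prefix at random
        prob /= len(ALPHABET)

def estimate_count(predicate, trials, seed=0):
    """Estimate how many suggestions satisfy `predicate`.

    Each successful walk contributes 1 / (its reach probability), which
    cancels the bias of the walk toward easily reached suggestions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        hit = random_walk(rng)
        if hit is not None and predicate(hit[0]):
            total += 1.0 / hit[1]
    return total / trials

if __name__ == "__main__":
    print("estimated size:", estimate_count(lambda s: True, 20000))
    print("estimated count containing 'ab':", estimate_count(lambda s: "ab" in s, 20000))
```

Because every suggestion in this toy model is reached along exactly one path of prefixes, weighting each sampled suggestion by the inverse of its reach probability gives an unbiased estimate of counts over the whole database, even though the walk itself is far from uniform. A real suggestion service would differ in many ways (rate limits, ranking behavior, much larger alphabets), so this should be read only as an illustration of the general principle.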