Searching distributed collections with inference networks
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Analyses of multiple evidence combination
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)
Query-based sampling of text databases
ACM Transactions on Information Systems (TOIS)
Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search
IEEE Transactions on Knowledge and Data Engineering
Relevant document distribution estimation method for resource selection
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Extracting structured data from Web pages
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Keyword Searching and Browsing in Databases using BANKS
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
A Frequency-based Approach for Mining Coverage Statistics in Data Integration
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
When one sample is not enough: improving text database selection using shrinkage
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Improving collection selection with overlap awareness in P2P search engines
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Federated text retrieval from uncooperative overlapped collections
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Combating web spam with trustrank
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Truth Discovery with Multiple Conflicting Information Providers on the Web
IEEE Transactions on Knowledge and Data Engineering
Communications of the ACM
Exploiting web search engines to search structured databases
Proceedings of the 18th international conference on World wide web
Integrating conflicting data: the role of source dependence
Proceedings of the VLDB Endowment
Tracking the random surfer: empirically measured teleportation parameters in PageRank
Proceedings of the 19th international conference on World wide web
SourceRank: relevance and trust assessment for deep web sources based on inter-source agreement
Proceedings of the 19th international conference on World wide web
Global detection of complex copying relationships between sources
Proceedings of the VLDB Endowment
Factal: integrating deep web based on trust and relevance
Proceedings of the 20th international conference companion on World wide web
SourceRank: relevance and trust assessment for deep web sources based on inter-source agreement
Proceedings of the 20th international conference on World wide web
Heterogeneous network-based trust analysis: a survey
ACM SIGKDD Explorations Newsletter
One immediate challenge in searching deep web databases is source selection, i.e., selecting the most relevant web databases for answering a given query. For open collections like the deep web, source selection must be sensitive to the trustworthiness and importance of sources. Recent advances solve these problems for single-topic deep web search by adapting an agreement-based approach (cf. SourceRank [10]). In this paper we introduce a source selection method that is sensitive to trust and importance for multi-topic deep web search. We compute multiple quality scores for each source, tailored to different topics and based on topic-specific crawl data. At query time, we classify the query to determine its probability of membership in each topic. These fractional memberships are used as weights on the topic-specific quality scores of the sources to select sources for the query. Extensive experiments on more than a thousand sources in multiple topics show 18-85% improvements in result quality over Google Product Search and other existing methods.
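The query-time weighting described above can be illustrated with a minimal sketch. It assumes that the topic-specific quality scores and the query's topic membership probabilities have already been computed; the function names, topic labels, source names, and numbers are hypothetical and chosen only for illustration. The plain weighted sum is one straightforward reading of "used as weights" and is not necessarily the exact combination the paper uses.

```python
from typing import Dict, List


def combined_scores(
    topic_probs: Dict[str, float],
    topic_scores: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Weight each source's topic-specific quality score by the query's
    probability of membership in that topic, then sum over topics."""
    return {
        source: sum(topic_probs.get(topic, 0.0) * score
                    for topic, score in per_topic.items())
        for source, per_topic in topic_scores.items()
    }


def select_sources(
    topic_probs: Dict[str, float],
    topic_scores: Dict[str, Dict[str, float]],
    k: int,
) -> List[str]:
    """Return the k sources with the highest combined score.
    Ties are broken by source name so the result is deterministic."""
    scores = combined_scores(topic_probs, topic_scores)
    return sorted(scores, key=lambda s: (-scores[s], s))[:k]


# Hypothetical output of a query classifier: fractional topic memberships.
query_topic_probs = {"camera": 0.75, "book": 0.25}

# Hypothetical topic-specific quality scores, one per (source, topic) pair.
source_topic_scores = {
    "source_a": {"camera": 0.8, "book": 0.2},
    "source_b": {"camera": 0.4, "book": 0.8},
    "source_c": {"camera": 0.2, "book": 0.4},
}

selected = select_sources(query_topic_probs, source_topic_scores, k=2)
print(selected)
```

With these illustrative numbers, the query is mostly about cameras, so a source that scores well on the camera topic outranks one that scores well only on books.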