For a given set of search engines, a search engine is redundant if its searchable contents can be found through the other search engines in the set. We propose a method for identifying redundant search engines in the context of a very large-scale metasearch engine. The general problem is equivalent to the set-covering problem, which is NP-hard. Because both the number of search engines to be considered and the sizes of these search engines are large, approximate solutions must be developed. We present a general methodology for tackling this problem and, within that methodology, propose several new heuristic algorithms for solving the set-covering problem.
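To make the link between redundancy and set covering concrete, the following is a minimal sketch of the classical greedy heuristic for set cover. It is not one of the heuristics proposed in the paper, which the abstract does not describe. The function names, engine names, and document identifiers are hypothetical, and the sketch assumes that each engine's contents are available as an explicit set, which real search engines rarely expose.

```python
from typing import Dict, Hashable, List, Set


def greedy_cover(engines: Dict[str, Set[Hashable]]) -> List[str]:
    """Greedily pick engines until every known document is covered.

    `engines` maps a (hypothetical) engine name to the set of document
    identifiers it can return. This is the textbook greedy approximation,
    used here only as an illustration.
    """
    # Everything that at least one engine can return.
    uncovered: Set[Hashable] = set().union(*engines.values())
    chosen: List[str] = []
    while uncovered:
        # Take the engine that covers the most still-uncovered documents.
        # Iterating over sorted names makes ties resolve alphabetically.
        best = max(sorted(engines), key=lambda name: len(engines[name] & uncovered))
        chosen.append(best)
        uncovered -= engines[best]
    return chosen


def redundant_engines(engines: Dict[str, Set[Hashable]]) -> List[str]:
    """Engines left out of the greedy cover, i.e. redundant under this heuristic."""
    kept = set(greedy_cover(engines))
    return sorted(name for name in engines if name not in kept)


# Toy data: document identifiers are small integers for readability.
example = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {1, 2},
    "D": {4, 5},
}
print(greedy_cover(example))       # ['A', 'D']
print(redundant_engines(example))  # ['B', 'C']
```

In this toy input, engines A and D together return every document, so B and C are reported as redundant. The greedy choice yields a valid cover but not necessarily the smallest one, which is the usual trade-off when an exact solution to an NP-hard problem is out of reach.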