The rapid growth of biomedical information in the Deep Web poses unprecedented challenges for traditional search engines. This paper describes a new Deep Web resource discovery system for biomedical information. We designed two hypertext mining applications: a Focused Crawler, which selectively seeks out relevant pages using a classifier that scores each document's relevance to biomedical information, and a Query Interface Extractor, which analyzes each page to detect the presence of a Deep Web database. Our anecdotal results suggest that combining focused crawling with query interface extraction is effective for building high-quality collections of Deep Web resources on biomedical topics.
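The two components can be illustrated with a minimal sketch. It is not the system described in the paper: the keyword list stands in for the trained classifier, the form heuristic stands in for the Query Interface Extractor, and every name here (BIOMEDICAL_TERMS, relevance, has_query_interface, crawl_order) is a hypothetical choice made for illustration.

```python
import heapq
from html.parser import HTMLParser

# Assumption: a tiny keyword list stands in for a trained relevance classifier.
BIOMEDICAL_TERMS = {"gene", "protein", "disease", "clinical", "drug", "genome"}


def relevance(text):
    """Return the fraction of the assumed biomedical terms found in the text."""
    words = set(text.lower().split())
    return len(words & BIOMEDICAL_TERMS) / len(BIOMEDICAL_TERMS)


class FormFinder(HTMLParser):
    """Record, for each <form>, the kinds of input fields it contains."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = []
        elif self._current is not None:
            if tag == "input":
                # An <input> without a type attribute defaults to a text field.
                self._current.append((attrs.get("type") or "text").lower())
            elif tag in ("select", "textarea"):
                self._current.append(tag)

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def has_query_interface(html):
    """Heuristic: a form with a queryable field and no password field."""
    finder = FormFinder()
    finder.feed(html)
    finder.close()
    for fields in finder.forms:
        if "password" in fields:
            continue  # assumed to be a login form, not a database query form
        if any(f in ("text", "search", "select", "textarea") for f in fields):
            return True
    return False


def crawl_order(pages):
    """Given {url: text}, return the URLs ordered from most to least relevant."""
    heap = [(-relevance(text), url) for url, text in pages.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

In a real crawler the priority queue would be refilled with the outgoing links of each fetched page, and pages that score as relevant and pass the form check would be recorded as candidate Deep Web resources. Fetching and link extraction are omitted to keep the sketch self-contained.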