Comparison of Three Vertical Search Spiders

Authors:
Michael Chau;Hsinchun Chen
Affiliations:
University of Arizona;University of Arizona
Venue:
Computer
Year:
2003

Citing 11
Cited 28

A neural network for probabilistic information retrieval

SIGIR '89 Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval
An algorithmic approach to concept exploration in a large knowledge network (automatic thesaurus consultation): symbolic branch-and-bound search vs. connectionist Hopfield net activation

Journal of the American Society for Information Science
Improved algorithms for topic distillation in a hyperlinked environment

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Efficient crawling through URL ordering

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Authoritative sources in a hyperlinked environment

Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Comparing noun phrasing techniques for use with medical digital library tools

Journal of the American Society for Information Science - Special topic issue on digital libraries: part 2
Breadth-first crawling yields high-quality pages

Proceedings of the 10th international conference on World Wide Web
Searching the Web

ACM Transactions on Internet Technology (TOIT)
Mining the Web's Link Structure

Computer
HelpfulMed: intelligent searching for medical information over the internet

Journal of the American Society for Information Science and Technology

Building domain-specific web collections for scientific digital libraries: a meta-search enhanced focused crawling method

Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Applying web analysis in web page filtering

Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
EBizPort: collecting and analyzing business intelligence information

Journal of the American Society for Information Science and Technology
Multilingual Web retrieval: An experiment in English–Chinese business intelligence

Journal of the American Society for Information Science and Technology
Analysis of the query logs of a web site search engine

Journal of the American Society for Information Science and Technology
Building a scientific knowledge web portal: the NanoPort experience

Decision Support Systems
Redips: Backlink search and analysis on the Web for business intelligence analysis: Research Articles

Journal of the American Society for Information Science and Technology
CMedPort: an integrated approach to facilitating Chinese medical information seeking

Decision Support Systems
Combining text and link analysis for focused crawling-An application for vertical search engines

Information Systems
A machine learning approach to web page filtering using content and structure analysis

Decision Support Systems
BioPortal Infectious Disease Informatics research: disease surveillance and situational awareness

dg.o '08 Proceedings of the 2008 international conference on Digital government research
Identification of time-varying objects on the web

Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
SpidersRUs: Creating specialized search engines in multiple languages

Decision Support Systems
MedSearch: a specialized search engine for medical information retrieval

Proceedings of the 17th ACM conference on Information and knowledge management
Nuclear Threat Detection Via the Nuclear Web and Dark Web: Framework and Preliminary Study

EuroISI '08 Proceedings of the 1st European Conference on Intelligence and Security Informatics
A cross-language focused crawling algorithm based on multiple relevance prediction strategies

Computers & Mathematics with Applications
Automatic online news monitoring and classification for syndromic surveillance

Decision Support Systems
Designing the user interface and functions of a search engine development tool

Decision Support Systems
An effective relevance prediction algorithm based on hierarchical taxonomy for focused crawling

AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
A multi-region empirical study on the internet presence of global extremist organizations

Information Systems Frontiers
A vertical search engine for school information based on Heritrix and Lucene

ICHIT'11 Proceedings of the 5th international conference on Convergence and hybrid information technology
Using content-based and link-based analysis in building vertical search engines

ICADL'04 Proceedings of the 7th international Conference on Digital Libraries: international collaboration and cross-fertilization
Combining text and link analysis for focused crawling

ICAPR'05 Proceedings of the Third international conference on Advances in Pattern Recognition - Volume Part I
Collecting topic-related web pages for link structure analysis by using a potential hub and authority first approach

PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Focused crawling using latent semantic indexing – an application for vertical search engines

ECDL'05 Proceedings of the 9th European conference on Research and Advanced Technology for Digital Libraries
Schema driven and topic specific web crawling

DASFAA'05 Proceedings of the 10th international conference on Database Systems for Advanced Applications
Sentimental Spidering: Leveraging Opinion Information in Focused Crawlers

ACM Transactions on Information Systems (TOIS)
Vehicle defect discovery from social media

Decision Support Systems

Quantified Score

Hi-index	4.10

Abstract

The Web's dynamic, unstructured nature makes locating resources difficult. Vertical search engines solve part of the problem by keeping indexes only in specific domains. They also offer more opportunity to apply domain knowledge in the spider applications that collect content for their databases. The authors used three approaches to investigate algorithms for improving the performance of vertical search engine spiders: a breadth-first graph-traversal algorithm with no heuristics to refine the search process, a best-first traversal algorithm that uses a hyperlink-analysis heuristic, and a spreading-activation algorithm based on modeling the Web as a neural network.
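
The three traversal strategies named in the abstract can be contrasted on a toy link graph. The sketch below is illustrative only and is not the authors' implementation: the graph, the in-link count used as a stand-in for the hyperlink-analysis heuristic, the constant link weight, the tanh squashing function, and all function names are assumptions introduced here.

```python
import heapq
import math
from collections import deque

# Hypothetical toy link graph (assumption): page -> pages it links to.
GRAPH = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
    "d": [],
    "e": [],
}

def breadth_first(graph, seed, limit):
    """Visit pages in FIFO order, with no heuristic to refine the search."""
    visited, seen, queue = [], {seed}, deque([seed])
    while queue and len(visited) < limit:
        page = queue.popleft()
        visited.append(page)
        for link in graph.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

def best_first(graph, seed, score, limit):
    """Always visit the discovered page with the highest heuristic score."""
    visited, seen = [], {seed}
    heap = [(-score(seed), seed)]  # negate: heapq is a min-heap
    while heap and len(visited) < limit:
        _, page = heapq.heappop(heap)
        visited.append(page)
        for link in graph.get(page, []):
            if link not in seen:
                seen.add(link)
                heapq.heappush(heap, (-score(link), link))
    return visited

def spreading_activation(graph, weight, seed, iterations=3, threshold=0.1):
    """Propagate activation from the seed along weighted links.

    Each round, a page's activation is tanh of the weighted sum of the
    activations of the pages linking to it; pages below the threshold
    stay inactive. Returns pages ordered by final activation.
    """
    activation = {seed: 1.0}
    for _ in range(iterations):
        incoming = {}
        for page, level in activation.items():
            for link in graph.get(page, []):
                incoming[link] = incoming.get(link, 0.0) + weight(page, link) * level
        for page, total in incoming.items():
            new_level = math.tanh(total)
            if new_level >= threshold:
                activation[page] = max(activation.get(page, 0.0), new_level)
    return sorted(activation, key=lambda p: (-activation[p], p))

def in_link_count(page):
    """Stand-in heuristic (assumption): number of pages linking to `page`."""
    return sum(page in links for links in GRAPH.values())

bfs_order = breadth_first(GRAPH, "seed", limit=4)
best_order = best_first(GRAPH, "seed", in_link_count, limit=4)
activation_order = spreading_activation(GRAPH, lambda src, dst: 0.5, "seed")
```

On this graph the breadth-first spider reaches page `c` only after both `a` and `b`, while the best-first spider jumps to `c` as soon as it is discovered because it has the most in-links. A real spider would replace `in_link_count` with a score computed from the crawled link structure and replace the constant weight with a relevance-based one.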