Mercator: A scalable, extensible Web crawler

  • Authors:
  • Allan Heydon; Marc Najork

  • Affiliations:
  • Compaq Systems Research Center, 130 Lytton Avenue, Palo Alto, CA 94301, USA. E-mail: {heydon,najork}@pa.dec.com

  • Venue:
  • World Wide Web
  • Year:
  • 1999

Abstract

This paper describes Mercator, a scalable, extensible Web crawler written entirely in Java. Scalable Web crawlers are an important component of many Web services, but their design is not well-documented in the literature. We enumerate the major components of any scalable Web crawler, comment on alternatives and tradeoffs in their design, and describe the particular components used in Mercator. We also describe Mercator’s support for extensibility and customizability. Finally, we comment on Mercator’s performance, which we have found to be comparable to that of other crawlers for which performance numbers have been published.