Modeling web data

  • Authors:
  • James C. French

  • Affiliations:
  • University of Virginia, Charlottesville, VA

  • Venue:
  • Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2002

Abstract

We have created three testbeds of web data for use in controlled experiments in collection modeling. This short paper examines how well Zipf's and Heaps' laws apply to web data. We find extremely close agreement between observed vocabulary growth and Heaps' law. We find reasonable agreement with Zipf's law for medium- to low-frequency terms, but Zipf's law is a poor predictor for high-frequency terms. These findings hold for all three testbeds, although due to space limitations we report results for only one here.
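Heaps' law models vocabulary size after n tokens as V(n) = K n^β, and Zipf's law models the frequency of the term at rank r as f(r) = C r^(-s). The following is a minimal sketch of how both can be checked against a collection. It is not the paper's procedure. It assumes the text has already been tokenized into a list of strings, and it uses ordinary least squares on log-log axes, a common but not the only way to estimate these parameters. The function names `vocabulary_growth`, `rank_frequency`, and `fit_power_law` are hypothetical.

```python
import math
from collections import Counter


def vocabulary_growth(tokens, step=1):
    """Return (n, V(n)) pairs: tokens read so far vs. distinct terms seen so far."""
    seen = set()
    points = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if n % step == 0:  # sample the growth curve every `step` tokens
            points.append((n, len(seen)))
    return points


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent term first (rank 1)."""
    counts = Counter(tokens).most_common()
    return [(rank, freq) for rank, (_, freq) in enumerate(counts, start=1)]


def fit_power_law(points):
    """Least-squares fit of y = c * x**k on log-log axes; returns (c, k).

    Assumption: a straight-line fit in log space, used here for illustration.
    """
    if len({x for x, _ in points}) < 2:
        raise ValueError("need at least two distinct x values")
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                       # slope = exponent
    c = math.exp(mean_y - k * mean_x)   # intercept = log of the constant
    return c, k


if __name__ == "__main__":
    tokens = "the cat sat on the mat and the dog sat too".split()
    K, beta = fit_power_law(vocabulary_growth(tokens))
    print(f"Heaps: V(n) ~ {K:.2f} * n^{beta:.2f}")
    C, slope = fit_power_law(rank_frequency(tokens))
    print(f"Zipf:  f(r) ~ {C:.2f} * r^-{-slope:.2f}")
```

Fitting the output of `vocabulary_growth` estimates K and β for Heaps' law; the negated slope of the rank-frequency fit estimates the Zipf exponent s. A single line fit over all ranks is the simplest choice. Comparing its predictions with observed frequencies at the top ranks versus the long tail is one way to look for the kind of high-frequency divergence the abstract reports.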