Characterizing a national community web

  • Authors: Daniel Gomes; Mário J. Silva

  • Affiliations: University of Lisbon, Lisboa, Portugal; University of Lisbon, Lisboa, Portugal

  • Venue: ACM Transactions on Internet Technology (TOIT)
  • Year: 2005

Abstract

This article presents a characterization of the community Web of the people of Portugal. We defined criteria for delimiting this Web based on our past experience of crawling pages related to Portugal and collected over 3.2 million documents from 46,000 sites satisfying those criteria. Our characterization was derived from this crawl. We describe the rules that we established for defining the boundaries of this community Web and the methodology used to gather statistics. Statistics cover the number and domain distribution of sites; the number, type and size distribution of text documents; and the linkage structure of this Web. We also show how crawling constraints and abnormal situations on the Web can influence the statistics.