Feeding the world: a comprehensive dataset and analysis of a real world snapshot of web feeds

  • Authors:
  • Sandro Reichert;David Urbansky;Klemens Muthmann;Philipp Katz;Matthias Wauer;Alexander Schill

  • Affiliations:
  • Institute of Systems Architecture, Dresden, Germany;Institute of Systems Architecture, Dresden, Germany;Institute of Systems Architecture, Dresden, Germany;Institute of Systems Architecture, Dresden, Germany;Institute of Systems Architecture, Dresden, Germany;Institute of Systems Architecture, Dresden, Germany

  • Venue:
  • Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services
  • Year:
  • 2011

Abstract

Web feeds allow users to retrieve new content from pages on the World Wide Web. Feeds are offered by a multitude of web pages, ranging from conventional news sites to pages with user-generated content such as wikis, forums, or personal blogs. They notify interested readers of new content and are therefore interesting for information retrieval tasks. Unfortunately, no comprehensive dataset of feeds is publicly available, which makes it difficult for researchers to work with this kind of data and, more importantly, to compare their research results on a common dataset. In this work we present an extensive real-world dataset of 200,000 diversified feeds, together with an analysis of it. The dataset was collected over a period of four weeks, yielding over 54 million entries and 100 GB of compressed data. One important outcome of the analysis is that feeds show different activity patterns, which aggregators such as feed reader software should take into account to improve their polling strategies. The dataset has been made publicly available for use by research communities around the world.