Web feeds allow users to retrieve new content from pages on the World Wide Web. Feeds are offered by a multitude of web pages, ranging from conventional news sites to pages with user-generated content such as wikis, forums, or personal blogs. They notify interested readers of new content and are therefore of interest for information retrieval tasks. Unfortunately, no comprehensive dataset of feeds is publicly available, which makes it difficult for researchers to work with this kind of data and, more importantly, to compare their results on a common dataset. In this work we present an extensive real-world dataset of 200,000 diverse feeds, together with an analysis of it. The dataset was collected over a period of four weeks, yielding over 54 million entries and 100 GB of compressed data. One important outcome of the analysis is that feeds show different activity patterns, which aggregators such as feed reader software should take into account to improve their polling strategies. The dataset has been made publicly available for use by research communities around the world.
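To illustrate how an aggregator might take a feed's activity pattern into account when polling, the following is a minimal sketch and not the method used in this work. It assumes that the publication timestamps of previously seen entries are available in seconds, and it uses the median gap between entries as the next polling interval. The function name `next_poll_interval` and the bounds `min_interval` and `max_interval` are hypothetical choices made for this example.

```python
from statistics import median


def next_poll_interval(entry_timestamps, min_interval=300.0, max_interval=86400.0):
    """Return the number of seconds to wait before polling a feed again.

    entry_timestamps: publication times (in seconds) of entries seen so far.
    min_interval, max_interval: assumed bounds (5 minutes and 1 day here).
    """
    # With fewer than two entries no gap can be measured, so poll rarely.
    if len(entry_timestamps) < 2:
        return max_interval

    times = sorted(entry_timestamps)
    # Gaps between consecutive entries describe the feed's activity.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    # The median is less sensitive to a single long pause than the mean.
    typical_gap = median(gaps)

    # Clamp so that very active feeds are not polled too aggressively
    # and inactive feeds are still checked at least once per max_interval.
    return max(min_interval, min(max_interval, typical_gap))
```

A feed that publishes every ten minutes would be polled every 600 seconds under these assumptions, while a feed with a single known entry would fall back to the daily upper bound.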