We are witnessing the widespread adoption of web syndication technologies such as RSS and Atom for the timely delivery of frequently updated Web content. Almost every personal weblog, news portal, or discussion forum now employs RSS/Atom feeds to complement pull-oriented searching and browsing of web pages with push-oriented protocols for delivering web content. Social media applications such as Twitter or Facebook also employ RSS to notify users about newly available posts from their preferred friends. Unfortunately, previous work on the statistical characteristics of RSS/Atom feeds does not provide a precise and up-to-date characterization of feeds' behavior and content, a characterization that could be used to benchmark the effectiveness and efficiency of various RSS processing and analysis techniques. In this paper, we present the first thorough analysis of three complementary features of real-scale RSS feeds, namely publication activity, item structure and length, and the vocabulary of their content, which we believe are crucial for Web 2.0 applications.