Blogs and RSS feeds are becoming increasingly popular. The blogging site LiveJournal has over 11 million user accounts, and according to one report, over 1.6 million postings are made to blogs every day. The "Blogosphere" is a new hotbed of Internet-based media that represents a shift from mostly static content to dynamic, continuously updated discussions. However, finding and tracking blogs with interesting content is an extremely cumbersome process. In this paper, we present Cobra (Content-Based RSS Aggregator), a system that crawls, filters, and aggregates vast numbers of RSS feeds, delivering to each user a personalized feed based on their interests. Cobra consists of a three-tiered network of crawlers that scan web feeds, filters that match crawled articles to user subscriptions, and reflectors that serve the articles that recently matched each subscription as an RSS feed, which can be browsed using a standard RSS reader. We describe the design, implementation, and evaluation of Cobra in three settings: a dedicated cluster, the Emulab testbed, and PlanetLab. A detailed performance study demonstrates that the system scales well to support a large number of source feeds and users; that the mean update detection latency is low (bounded by the crawler rate); and that an offline service provisioning step, combined with several performance optimizations, is effective at reducing memory usage and network load.
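The three tiers described above can be illustrated with a minimal, single-process sketch. It is not Cobra's implementation: the class names, the in-memory data structures, and the matching rule (a subscription matches when all of its keywords appear in the article text) are assumptions made purely for illustration, and the real system distributes these roles across many hosts.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    url: str
    title: str
    body: str


@dataclass(frozen=True)
class Subscription:
    sub_id: str
    keywords: frozenset  # assumed rule: every keyword must appear in the article


class Crawler:
    """Tier 1 (hypothetical): forwards only articles not seen before."""

    def __init__(self):
        self.seen = set()

    def crawl(self, fetched):
        fresh = []
        for article in fetched:
            if article.url not in self.seen:
                self.seen.add(article.url)
                fresh.append(article)
        return fresh


class Filter:
    """Tier 2 (hypothetical): matches an article against all subscriptions."""

    def __init__(self, subscriptions):
        self.subscriptions = list(subscriptions)

    def match(self, article):
        words = set((article.title + " " + article.body).lower().split())
        return [s.sub_id for s in self.subscriptions if s.keywords <= words]


class Reflector:
    """Tier 3 (hypothetical): keeps the most recent matches per subscription."""

    def __init__(self, max_items=10):
        self.max_items = max_items
        self.feeds = {}

    def publish(self, sub_id, article):
        feed = self.feeds.setdefault(sub_id, deque(maxlen=self.max_items))
        feed.appendleft(article)  # newest first; oldest entries fall off the end

    def feed(self, sub_id):
        return list(self.feeds.get(sub_id, ()))


def run_once(crawler, filt, reflector, fetched):
    """Push one batch of fetched articles through crawler -> filter -> reflector."""
    for article in crawler.crawl(fetched):
        for sub_id in filt.match(article):
            reflector.publish(sub_id, article)
```

In this sketch, a user's personalized feed is simply `reflector.feed(sub_id)`; a real reflector would serialize that list as an RSS document so that a standard RSS reader can consume it.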