STAIRS: Towards efficient full-text filtering and dissemination in DHT environments

Authors:
Weixiong Rao; Lei Chen; Ada Wai-Chee Fu
Affiliations:
Weixiong Rao, Lei Chen: Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Ada Wai-Chee Fu: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2011

Abstract

Today, "live" content such as weblogs, Wikipedia, and news is ubiquitous on the Internet, and providing users with relevant content in a timely manner has become a challenging problem. In contrast to Web search technologies and RSS feed/reader applications, this paper envisions a personalized full-text content filtering and dissemination system in a highly distributed environment such as a Distributed Hash Table (DHT) based Peer-to-Peer (P2P) network. Users subscribe to content of interest by specifying keywords and thresholds as filters, and each document is then disseminated to the users whose filters it matches. In the literature, full-text document publishing in DHTs has long suffered from the high cost of forwarding each document to the home nodes of all its distinct terms. This cost is aggravated by the fact that a document contains a large number of distinct terms (typically tens to thousands per document). In this paper, we propose a set of novel techniques that overcome this high forwarding cost by carefully selecting a very small number of meaningful terms (key features) from the candidate terms of each document. Next, to reduce the average hop count per forwarding, we further prune irrelevant documents along the forwarding path. Experiments based on two real query logs and two real data sets demonstrate the effectiveness of our solution.
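The forwarding-cost argument above can be made concrete with a toy model. What follows is a minimal sketch under stated assumptions, not the STAIRS algorithm. It assumes a Chord-style ring in which a term's home node is the successor of the term's SHA-1 hash. It assumes a filter's score is the sum of the document's normalized term frequencies over the filter's keywords, and that each filter is registered at the home node of every one of its keywords. It also assumes the publisher can look up, or has cached, a per-term bound. The names `ToyDHT`, `Filter`, `term_bound`, and `publish` are hypothetical. The sketch compares the baseline (forward the document to the home nodes of all distinct terms) against a simple pigeonhole pruning rule. If a filter with k keywords reaches its threshold θ, then at least one of those keywords has weight at least θ/k. Consequently, a document only needs to go to terms whose weight reaches the smallest θ/k registered there, and no matching filter is missed.

```python
import hashlib
import re
from bisect import bisect_left
from collections import Counter
from dataclasses import dataclass

RING_BITS = 32  # hypothetical size of the DHT identifier space


def ring_id(key: str) -> int:
    """Map a string onto the identifier ring with truncated SHA-1."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


def term_weights(text: str) -> dict:
    """Assumed weighting: occurrences of a term divided by total tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return {t: c / len(tokens) for t, c in counts.items()}


@dataclass(frozen=True)
class Filter:
    user: str
    keywords: frozenset  # subscribed keywords
    threshold: float     # deliver when score >= threshold


def score(f: Filter, weights: dict) -> float:
    """Assumed scoring: sum of the document's weights on the filter's keywords."""
    return sum(weights.get(k, 0.0) for k in f.keywords)


class ToyDHT:
    def __init__(self, node_names):
        self.ring = sorted(ring_id(n) for n in node_names)
        # term -> filters; conceptually this entry lives on home(term)
        self.index = {}

    def home(self, term: str) -> int:
        """Chord-style successor: first node id >= hash(term), wrapping around."""
        i = bisect_left(self.ring, ring_id(term))
        return self.ring[i % len(self.ring)]

    def subscribe(self, f: Filter) -> None:
        # Register the filter at the home node of every one of its keywords.
        for t in f.keywords:
            self.index.setdefault(t, []).append(f)

    def term_bound(self, term: str) -> float:
        """Smallest threshold / |keywords| among filters indexed under term."""
        filters = self.index.get(term)
        if not filters:
            return float("inf")  # no subscriber cares about this term
        return min(f.threshold / len(f.keywords) for f in filters)

    def publish(self, text: str, prune: bool):
        """Return (term messages sent, home nodes contacted, users notified)."""
        weights = term_weights(text)
        if prune:
            # Pigeonhole: score >= threshold over k keywords implies some
            # keyword has weight >= threshold / k, so no match is lost.
            targets = [t for t, w in weights.items() if w >= self.term_bound(t)]
        else:
            targets = list(weights)  # baseline: every distinct term
        nodes = {self.home(t) for t in targets}
        delivered = set()
        for t in targets:
            for f in self.index.get(t, []):
                if score(f, weights) >= f.threshold:
                    delivered.add(f.user)
        return len(targets), nodes, delivered


if __name__ == "__main__":
    dht = ToyDHT([f"node{i}" for i in range(8)])
    dht.subscribe(Filter("alice", frozenset({"dht", "overlay"}), 0.25))
    dht.subscribe(Filter("carol", frozenset({"keys", "route", "cache"}), 0.45))
    doc = "dht peers route keys; dht overlay stores keys on peers"
    for prune in (False, True):
        sent, nodes, users = dht.publish(doc, prune)
        print(f"prune={prune}: {sent} term messages, {len(nodes)} nodes, "
              f"delivered to {sorted(users)}")
```

On the sample document, which has 7 distinct terms, the baseline sends 7 term messages and the pruned variant sends 2, and both deliver only to alice. Terms that no filter mentions are never forwarded under pruning. A term that some filter mentions is forwarded only if its weight reaches that term's bound. The rule is lossless only under the additive scoring assumed here; the paper's own term-selection and in-path pruning techniques are more elaborate.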