Information filtering, also referred to as publish/subscribe, complements one-time searching: users subscribe to information sources and are notified whenever new documents of interest are published. In approximate information filtering, only selected information sources are monitored, namely those likely to publish documents relevant to the user's interests in the future. To achieve this, a subscriber exploits statistical metadata to identify promising publishers and indexes its continuous query only at those publishers. The statistics are maintained in a directory, usually on a per-keyword basis, and thus disregard possible correlations among keywords. With such coarse information, publisher selection may be poor, which degrades filtering performance and leads to the loss of interesting documents.

Based on this observation, this work extends query routing techniques from distributed information retrieval in peer-to-peer (P2P) networks and provides new algorithms that exploit the correlation among keywords in a filtering setting. We develop and evaluate two algorithms based on single-key and multi-key statistics, and use two different synopses (Hash Sketches and KMV synopses) to represent publishers compactly. Our experimental evaluation on two real-life corpora of web and blog data demonstrates the filtering effectiveness of both approaches and highlights their different trade-offs.
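To illustrate why a KMV synopsis can capture keyword correlations that per-keyword counts miss, here is a minimal Python sketch. It is not the paper's implementation. It builds a KMV ("k minimum values") synopsis over the document identifiers a publisher has for a keyword. From that synopsis it estimates the number of distinct documents and, by combining two synopses, the number of documents containing both keywords. The estimators used are the standard KMV ones: (k-1)/U_(k) for distinct counts, scaled by the fraction of shared values for intersections. The function names, the BLAKE2b-based hash, and the choice of k are assumptions made for illustration.

```python
import hashlib
import heapq


def unit_hash(item: str) -> float:
    """Deterministically map an item to a pseudo-random value in [0, 1)."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


def kmv_synopsis(items, k: int) -> list[float]:
    """Keep the k smallest distinct hash values of the items, sorted ascending."""
    return heapq.nsmallest(k, {unit_hash(x) for x in items})


def estimate_distinct(syn: list[float], k: int) -> float:
    """Basic KMV estimator (k-1)/U_(k); exact count if fewer than k values exist."""
    if len(syn) < k:
        return float(len(syn))
    return (k - 1) / syn[k - 1]


def estimate_intersection(syn_a: list[float], syn_b: list[float], k: int) -> float:
    """Estimate |A ∩ B| from two KMV synopses built with the same hash and k."""
    set_a, set_b = set(syn_a), set(syn_b)
    # The k smallest values of the union form a valid KMV synopsis of A ∪ B.
    union = heapq.nsmallest(k, set_a | set_b)
    if len(union) < k:
        # Both sets are small enough to be stored exactly in their synopses.
        return float(len(set_a & set_b))
    # Fraction of the union synopsis that occurs in both sets, scaled by |A ∪ B|.
    k_cap = sum(1 for h in union if h in set_a and h in set_b)
    return (k_cap / k) * (k - 1) / union[k - 1]


if __name__ == "__main__":
    k = 512
    docs_a = {f"doc{i}" for i in range(0, 6000)}      # hypothetical docs with keyword a
    docs_b = {f"doc{i}" for i in range(4000, 10000)}  # hypothetical docs with keyword b
    syn_a, syn_b = kmv_synopsis(docs_a, k), kmv_synopsis(docs_b, k)
    print(round(estimate_distinct(syn_a, k)))              # true value: 6000
    print(round(estimate_intersection(syn_a, syn_b, k)))   # true value: 2000
```

Each synopsis stores at most k numbers regardless of how many documents a publisher holds. A directory that keeps only per-keyword counts would have to assume that keywords are independent when scoring a multi-keyword continuous query. Per-keyword KMV synopses, by contrast, still permit a direct estimate of the co-occurrence.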