We consider a variant of the “string searching in database” problem in which the string database arrives as a data stream, and processing the data is at a premium while querying is not a runtime bottleneck. Specifically, the strings to be searched (let us call them the documents) have to be processed online very efficiently, meaning that the documents have to be added to some string-searching data structure one by one, in time proportional to their length. Of course, we want this data structure to be small, i.e. at most linear in space, and ideally to exhibit a tradeoff between storage/processing cost and accuracy. Given a query string, the data structure must return whether that string is contained in a document (the presence query), and must also be able to return a list of the documents that contain the query (the attribution query). We may require that the query be large enough and that only portions of it may match (pattern matching). In practice, it is acceptable for the data structure to return a superset of the answer, as long as no document from the answer is missing and there are only a few false positives; either the false positives can be filtered out (by actual verification, if the document texts are available in a repository), or a small number of false positives is acceptable for the application (e.g. network forensics, see below).
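To make the interface described above concrete, here is a minimal sketch of one possible realization. It is an illustration under stated assumptions, not the construction proposed in this work: each document gets its own Bloom filter over its fixed-length k-grams, and a query is attributed to every document whose filter contains all of the query's k-grams. The names `BloomFilter`, `StreamIndex`, `add_document`, `attribute`, and `present`, as well as the parameters `k`, `num_bits`, and `num_hashes`, are hypothetical. With `k` fixed, insertion takes time proportional to the document length; answers are a superset of the true answer (no false negatives for queries of at least `k` bytes), with false positives possible.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; sizes are illustrative assumptions, not tuned values."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Double hashing: derive num_hashes bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


class StreamIndex:
    """Hypothetical index: one Bloom filter of k-grams per streamed document."""

    def __init__(self, k=8):
        self.k = k
        self.filters = {}  # doc_id -> BloomFilter

    def _grams(self, text):
        # All contiguous k-byte windows of text (empty if text is shorter than k).
        return (text[i:i + self.k] for i in range(len(text) - self.k + 1))

    def add_document(self, doc_id, text):
        # Online insertion: one pass over the document, no access to earlier ones.
        bf = BloomFilter()
        for gram in self._grams(text):
            bf.add(gram)
        self.filters[doc_id] = bf

    def attribute(self, query):
        # Attribution query: every document whose filter holds all query k-grams.
        if len(query) < self.k:
            raise ValueError("query must be at least k bytes long")
        grams = list(self._grams(query))
        return [d for d, bf in self.filters.items() if all(g in bf for g in grams)]

    def present(self, query):
        # Presence query: is the string (possibly) contained in some document?
        return bool(self.attribute(query))
```

For example, after `idx = StreamIndex(k=8)` and `idx.add_document("doc1", b"the quick brown fox jumps over the lazy dog")`, the call `idx.attribute(b"brown fox jumps")` is guaranteed to include `"doc1"`. A document can also be reported spuriously, either through Bloom filter collisions or because all of the query's k-grams occur in it without being contiguous; such candidates would be filtered by verification against the stored text when it is available.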