String matching on the internet

Authors:
Hervé Brönnimann;Nasir Memon;Kulesh Shanmugasundaram
Affiliations:
Polytechnic University, NY;Polytechnic University, NY;Polytechnic University, NY
Venue:
CAAN'04 Proceedings of the First international conference on Combinatorial and Algorithmic Aspects of Networking
Year:
2004

Abstract

We consider a variant of the “string searching in database” problem where the string database comes on a data stream, and processing the data is at a premium but querying is not a runtime bottleneck. Specifically, the strings to be searched into (let's call them the documents) have to be processed online very efficiently, meaning the documents have to be added to some string searching data structure one by one in time proportional to their length. Of course, we desire this data structure to be small, i.e. at most linear space, and hopefully exhibit a tradeoff between storage/processing cost and accuracy. Upon some query string, the data structure must return whether that string is contained in a document (the presence query), and must also be able to return a list of the documents which contain the query (the attribution query). We may require that the query be large enough and that only portions of it may match (pattern matching). In practice, it is acceptable that the data structure return a superset of the answer, as long as no document from the answer is missing and there are only few false positives; either the false positives can be filtered (by actual verification if the document texts are available in a repository), or a small number of false positives are acceptable for the application (e.g. network forensics, see below).
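
To make the interface described in the abstract concrete, here is a minimal Python sketch of one way such a structure could behave. It is an illustration under stated assumptions, not the construction proposed in the paper: it uses a single Bloom filter keyed on (document id, n-gram) pairs, and the class name `DocumentSketch`, its parameters (`num_bits`, `num_hashes`, `gram`) and the method names `add`, `present` and `attribute` are all hypothetical. Documents are added one at a time in time roughly proportional to their length (for a fixed n-gram size), queries must be at least `gram` bytes long, and answers may contain false positives but never miss a document that really contains the query.

```python
import hashlib


class DocumentSketch:
    """Toy streaming index: one Bloom filter over (doc_id, n-gram) pairs.

    Hypothetical illustration only; all names and parameters are assumptions.
    """

    def __init__(self, num_bits=1 << 16, num_hashes=4, gram=8):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.gram = gram                      # minimum query length in bytes
        self.bits = bytearray(num_bits // 8)  # the Bloom filter bit array
        self.doc_ids = []                     # ids seen so far, in arrival order

    def _positions(self, doc_id, chunk):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            data = f"{i}:{doc_id}:".encode() + chunk
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, doc_id, text):
        """Insert every n-gram of one document (a single pass over text)."""
        self.doc_ids.append(doc_id)
        for start in range(len(text) - self.gram + 1):
            chunk = text[start:start + self.gram]
            for p in self._positions(doc_id, chunk):
                self.bits[p >> 3] |= 1 << (p & 7)

    def _maybe_contains(self, doc_id, query):
        # Every n-gram of the query must be recorded for this document.
        for start in range(len(query) - self.gram + 1):
            chunk = query[start:start + self.gram]
            for p in self._positions(doc_id, chunk):
                if not self.bits[p >> 3] & (1 << (p & 7)):
                    return False
        return True

    def attribute(self, query):
        """Attribution query: a superset of the documents containing query."""
        if len(query) < self.gram:
            raise ValueError("query shorter than the n-gram size")
        return [d for d in self.doc_ids if self._maybe_contains(d, query)]

    def present(self, query):
        """Presence query: True if some document may contain query."""
        return bool(self.attribute(query))


index = DocumentSketch()
index.add("doc-1", b"the quick brown fox jumps over the lazy dog")
index.add("doc-2", b"string matching on the internet")
print(index.present(b"quick brown fox"))
print(index.attribute(b"matching on the"))
```

In this sketch a document can be reported wrongly either because of Bloom filter collisions or because all n-grams of the query occur in the document without being contiguous; both cases are the kind of false positive that the abstract says can be filtered by verifying against the stored text. Enlarging `num_bits` lowers the collision rate at the cost of space, which is one simple instance of the storage versus accuracy tradeoff mentioned above.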