Regular expression (RE) matching has important applications in XML content distribution and network security. In this paper, we present the end-to-end design of a high-performance RE matching system. Our system combines the processing efficiency of deterministic finite automata (DFAs) with the space efficiency of non-deterministic finite automata (NFAs) to scale to hundreds of REs. In experiments with real-life RE data on data streams, we found that the bulk of the DFA transitions are concentrated around a few DFA states. We exploit this fact to cache only the frequent core of each DFA in memory, as opposed to the entire DFA (which may be exponential in size). Further, we cluster REs so that REs whose interactions cause an exponential increase in the number of states are assigned to separate groups; this improves cache hits by controlling the overall DFA size. To the best of our knowledge, ours is the first end-to-end system capable of matching REs at high speeds and in their full generality. Through a combination of RE grouping and static and dynamic caching, it performs RE matching at high speeds even when memory is limited. Through experiments with real-life data sets, we show that our RE matching system convincingly outperforms a state-of-the-art network intrusion detection tool with support for efficient RE matching.
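
To make the caching idea concrete, the following is a minimal sketch, not the system described in the paper. It assumes a hand-written NFA for the pattern `.*ab`, a hypothetical `CachedMatcher` class, and a plain LRU eviction policy; the abstract does not say which caching policy or data structures the system uses. DFA transitions are computed from the NFA only on a cache miss, so frequently visited transitions stay in a bounded cache while the full DFA is never materialized.

```python
from collections import OrderedDict

ANY = None  # wildcard edge label (assumed convention): matches every symbol


class CachedMatcher:
    """NFA simulation backed by a bounded LRU cache of DFA transitions."""

    def __init__(self, delta, start, accepting, capacity=64):
        self.delta = delta                  # NFA state -> {symbol or ANY: set of states}
        self.start = frozenset([start])     # a DFA state is a frozenset of NFA states
        self.accepting = frozenset(accepting)
        self.capacity = capacity
        self.cache = OrderedDict()          # (DFA state, symbol) -> DFA state
        self.hits = 0
        self.misses = 0

    def _nfa_step(self, states, symbol):
        # Slow path: advance every active NFA state by one symbol.
        nxt = set()
        for s in states:
            edges = self.delta.get(s, {})
            nxt |= edges.get(symbol, set())
            nxt |= edges.get(ANY, set())
        return frozenset(nxt)

    def _step(self, states, symbol):
        key = (states, symbol)
        if key in self.cache:
            # Fast path: a cached DFA transition is a single lookup.
            self.hits += 1
            self.cache.move_to_end(key)
            return self.cache[key]
        self.misses += 1
        nxt = self._nfa_step(states, symbol)
        self.cache[key] = nxt
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return nxt

    def matches(self, text):
        # Report a match as soon as any accepting NFA state becomes active.
        states = self.start
        for ch in text:
            states = self._step(states, ch)
            if states & self.accepting:
                return True
        return False


# Hand-built NFA for ".*ab": state 0 loops on any symbol, 0 -a-> 1, 1 -b-> 2.
DELTA = {0: {ANY: {0}, "a": {1}}, 1: {"b": {2}}}

matcher = CachedMatcher(DELTA, start=0, accepting={2}, capacity=2)
print(matcher.matches("xxab"), matcher.hits, matcher.misses)
```

The `capacity` parameter stands in for the memory budget: a small value forces more work onto the NFA path, while a large value approaches a fully built DFA. Grouping REs, as the abstract describes, would correspond to running one such matcher per group so that no single cache has to cover an exponentially large state space.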