q-Gram Matching Using Tree Models

Authors:
Prahlad Fogla;Wenke Lee
Affiliations:
-;IEEE Computer Society
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2006

Abstract

q-gram matching is used for approximate substring matching problems in a wide range of application areas, including intrusion detection. In this paper, we present a tree-based model to perform fast, linear-time q-gram matching. All q-grams present in the text are stored in a tree structure similar to a trie. We use a tree redundancy pruning algorithm to reduce the size of the tree without losing any information. We also use suffix links for fast q-gram search during query matching. We compare our work with the Rabin-Karp-based hash-table technique, commonly used for multiple q-gram search. We present results of experiments on system call sequence data used for intrusion detection.
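
The basic idea of storing all q-grams of a text in a trie-like structure and then checking the q-grams of a query against it can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the redundancy pruning and the suffix links described in the abstract, so each lookup costs O(q) rather than amortized constant time. The function names (`build_qgram_trie`, `contains_qgram`, `count_unseen_qgrams`) and the nested-dictionary representation are assumptions made for illustration only.

```python
def build_qgram_trie(sequence, q):
    """Insert every length-q window of `sequence` into a nested-dict trie.

    Hypothetical helper: a plain trie, without the pruning step or the
    suffix links that the paper uses.
    """
    root = {}
    for start in range(len(sequence) - q + 1):
        node = root
        for symbol in sequence[start:start + q]:
            # Create the child node on first sight, then descend into it.
            node = node.setdefault(symbol, {})
    return root


def contains_qgram(trie, gram):
    """Return True if `gram` (assumed to have length q) is stored in the trie."""
    node = trie
    for symbol in gram:
        node = node.get(symbol)
        if node is None:
            return False
    return True


def count_unseen_qgrams(trie, query, q):
    """Count the q-grams of `query` that never occurred in the indexed text."""
    return sum(
        not contains_qgram(trie, query[start:start + q])
        for start in range(len(query) - q + 1)
    )
```

In an intrusion-detection setting, `sequence` would be a trace of system call names recorded during normal operation and `query` a trace under inspection; a high count of unseen q-grams would then suggest anomalous behavior. Restarting the lookup from the root for each window is the cost that suffix links are meant to avoid.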