A fast string searching algorithm
Communications of the ACM
Efficient string matching: an aid to bibliographic search
Communications of the ACM
IEPAD: information extraction based on pattern discovery
Proceedings of the 10th international conference on World Wide Web
A String Matching Algorithm Fast on the Average
Proceedings of the 6th Colloquium, on Automata, Languages and Programming
Matching web site structure and content
Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters
Snort - Lightweight Intrusion Detection for Networks
LISA '99 Proceedings of the 13th USENIX conference on System administration
Detecting spam web pages through content analysis
Proceedings of the 15th international conference on World Wide Web
Finding advertising keywords on web pages
Proceedings of the 15th international conference on World Wide Web
SecuBat: a web vulnerability scanner
Proceedings of the 15th international conference on World Wide Web
On-line Approximate String Matching in Natural Language
Fundamenta Informaticae
A new suffix tree similarity measure for document clustering
Proceedings of the 16th international conference on World Wide Web
Mining contiguous sequential patterns from web logs
Proceedings of the 16th international conference on World Wide Web
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Orientation distance-based discriminative feature extraction for multi-class classification
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Domain-specific data sources keep growing rapidly, creating a sustained need for fast processing both in time-sensitive applications, such as medical record information extraction at the point of care and genetic feature extraction for personalized treatment, and in off-line knowledge discovery, such as the creation of evidence-based medicine. Because parallel multi-string matching is at the core of most data mining tasks in these applications, faster on-line matching over static and streaming data is needed to improve the overall efficiency of such knowledge discovery. Traditional information extraction and retrieval techniques do not handle this need efficiently. We therefore propose a Block Suffix Shifting-based approach that improves on state-of-the-art multi-string matching algorithms such as Aho-Corasick, Commentz-Walter, and Wu-Manber. Its strength is the ability to exploit the different block structures of domain-specific data for both off-line and on-line parallel matching. Experiments on several real-world datasets show that the approach yields significant performance improvements.
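For readers unfamiliar with multi-string matching, the following is a minimal sketch of the Aho-Corasick baseline named in the abstract. It is not the Block Suffix Shifting approach, whose details the abstract does not give. The function names `build_automaton` and `find_all` and the sample patterns are illustrative assumptions, and the sketch assumes non-empty patterns.

```python
from collections import deque


def build_automaton(patterns):
    """Build a keyword trie with failure links (Aho-Corasick baseline)."""
    goto = [{}]   # goto[s] maps a character to the next state; 0 is the root
    fail = [0]    # fail[s] is the state to fall back to on a mismatch
    out = [[]]    # out[s] lists the patterns that end at state s
    for pat in patterns:
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                goto.append({})
                fail.append(0)
                out.append([])
                nxt = len(goto) - 1
                goto[state][ch] = nxt
            state = nxt
        out[state].append(pat)

    # Breadth-first pass: shallower states are finalized before deeper ones.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


if __name__ == "__main__":
    # Hypothetical keyword set, chosen only for illustration.
    for start, pat in find_all("ushers", ["he", "she", "his", "hers"]):
        print(start, pat)
```

This baseline reads every character of the text exactly once. Shift-based algorithms such as Commentz-Walter and Wu-Manber instead try to skip portions of the text, and the proposed approach belongs to that shifting family.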