Estimating the selectivity of approximate string queries
ACM Transactions on Database Systems (TODS)
Annals of Mathematics and Artificial Intelligence
Digital pollution is emerging as an overwhelming threat to the Internet, whose ubiquitous connectivity in turn fosters widespread outbreaks of such contamination. A considerable amount of human effort and network resources is wasted, at little cost to the few polluters. To keep the contamination from spreading, classical string matching schemes and their variants can be used to detect these patterns for removal, and detection speed is crucial in this application. In this paper, we propose a novel pattern detection technique based on decision tree induction that seeks a significant improvement over the classical schemes. The tree is grown adaptively, according to the intrinsic properties of the pattern, to minimize the number of symbols in the data stream that need to be examined. This yields a contextually optimized order in which to inspect the symbols, as opposed to the fixed order followed by other schemes. A performance study indicates that our approach achieves a speed-up of a factor of five or more over the best competitors.
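The idea of inspecting symbols in an adaptive rather than fixed order can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the paper, whose details the abstract does not give. It assumes a small set of distinct, equal-length patterns and uses a simple greedy heuristic (inspect next the position at which the remaining candidate patterns differ most). The names `build_tree`, `classify`, and `scan` are hypothetical.

```python
from collections import defaultdict


def build_tree(patterns, unused=None):
    """Greedily build a decision tree over distinct, equal-length patterns."""
    if unused is None:
        unused = frozenset(range(len(patterns[0])))
    if not unused:
        # Leaf: every position has been checked, so the pattern is confirmed.
        return {"match": patterns[0]}
    # Assumed heuristic: pick the position with the most distinct symbols
    # among the remaining candidates (ties go to the lowest position).
    pos = max(sorted(unused), key=lambda p: len({pat[p] for pat in patterns}))
    groups = defaultdict(list)
    for pat in patterns:
        groups[pat[pos]].append(pat)
    rest = unused - {pos}
    children = {sym: build_tree(group, rest) for sym, group in groups.items()}
    return {"pos": pos, "children": children}


def classify(tree, text, start=0):
    """Return (matched pattern or None, number of symbols inspected)."""
    inspected = 0
    node = tree
    while "match" not in node:
        inspected += 1
        child = node["children"].get(text[start + node["pos"]])
        if child is None:
            # No candidate has this symbol here: reject without reading more.
            return None, inspected
        node = child
    return node["match"], inspected


def scan(tree, text, length):
    """Slide a window over text and collect (offset, pattern) matches."""
    hits = []
    for start in range(len(text) - length + 1):
        match, _ = classify(tree, text, start)
        if match is not None:
            hits.append((start, match))
    return hits
```

For the patterns `["spam", "scam", "slam"]`, the root of this tree inspects position 1, the only position at which the patterns differ. A window such as `"seam"` is therefore rejected after a single symbol, while a confirmed match still requires all four positions to be read.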