Real-time content analysis is typically a bottleneck in Web filtering. To accelerate filtering, this work presents a simple but effective early decision algorithm that analyzes only part of the Web content. The algorithm makes the filtering decision, either to block or to pass the content, as soon as it is confident with high probability that the content belongs to a banned or an allowed category. Experiments show that the algorithm needs to examine only around one-fourth of the Web content on average, while accuracy remains fairly good: 89% for banned content and 93% for allowed content. The algorithm can complement other Web filtering approaches, such as URL blocking, to filter Web content with high accuracy and efficiency. Text classification algorithms in other applications can also follow the early decision principle to speed up processing.
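
The early decision principle can be illustrated with a minimal sketch. The abstract does not specify the underlying classifier, so the code below assumes a naive-Bayes-style running log-odds score and a symmetric confidence threshold. The function name `early_decision`, the `log_odds` table, and the default `threshold` of 0.95 are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, Iterable, Tuple


def early_decision(
    tokens: Iterable[str],
    log_odds: Dict[str, float],
    threshold: float = 0.95,
    prior_banned: float = 0.5,
) -> Tuple[str, int]:
    """Scan tokens in order and stop as soon as one category is confident.

    log_odds maps a token to log(P(token | banned) / P(token | allowed)).
    This table is an assumption for the sketch; unknown tokens count as 0.
    Returns the decision ("block" or "pass") and the number of tokens read.
    """
    # Start from the prior log-odds of the banned category.
    score = math.log(prior_banned / (1.0 - prior_banned))
    # Log-odds value at which the posterior probability reaches the threshold.
    bound = math.log(threshold / (1.0 - threshold))

    examined = 0
    for token in tokens:
        examined += 1
        score += log_odds.get(token, 0.0)
        if score >= bound:
            return "block", examined  # confident the content is banned
        if score <= -bound:
            return "pass", examined  # confident the content is allowed

    # Content exhausted without reaching confidence: fall back to the sign
    # of the score, passing the content when the evidence is neutral.
    return ("block" if score > 0 else "pass"), examined
```

With a toy table such as `{"casino": 2.0, "news": -2.0}`, a page that opens with two occurrences of "casino" is blocked after two tokens, regardless of how much content follows. The saving reported in the abstract comes from this kind of early exit, although the confidence test actually used in the paper may differ from the one assumed here.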