Accelerating Web Content Filtering by the Early Decision Algorithm

Authors:
Po-Ching Lin; Ming-Dao Liu; Ying-Dar Lin; Yuan-Cheng Lai
Venue:
IEICE - Transactions on Information and Systems
Year:
2008

Abstract

Real-time content analysis is typically a bottleneck in Web filtering. To accelerate the filtering process, this work presents a simple but effective early decision algorithm that analyzes only part of the Web content. The algorithm makes the filtering decision, either to block or to pass the content, as soon as it is confident, with high probability, that the content belongs to a banned or an allowed category. Experiments show that the algorithm needs to examine only around one-fourth of the Web content on average, while accuracy remains fairly good: 89% for banned content and 93% for allowed content. The algorithm can complement other Web filtering approaches, such as URL blocking, to filter Web content with high accuracy and efficiency. Text classification algorithms in other applications can also follow the early decision principle to accelerate their processing.
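
The abstract does not say which classifier or confidence test the authors use, so the following is only a minimal sketch of the early decision idea under stated assumptions, not the paper's method. It assumes a naive Bayes style score: each token adds its log-likelihood ratio to a running log-odds value, and scanning stops once the implied posterior for either category passes a threshold. The names `early_decision`, `log_likelihoods`, `prior_banned`, `threshold`, and `min_tokens` are hypothetical.

```python
import math


def early_decision(tokens, log_likelihoods, prior_banned=0.5,
                   threshold=0.95, min_tokens=5):
    """Scan tokens in order and stop once either category is confident enough.

    ASSUMPTION: log_likelihoods maps a token to a pair
    (log P(token | banned), log P(token | allowed)); this model and the
    stopping rule are illustrative and are not taken from the paper.
    Returns (decision, tokens_examined), where decision is "block" or "pass".
    """
    # Running log-odds of "banned" versus "allowed", starting from the prior.
    log_odds = math.log(prior_banned) - math.log(1.0 - prior_banned)
    # A posterior >= threshold is equivalent to log-odds >= log(t / (1 - t)).
    bound = math.log(threshold) - math.log(1.0 - threshold)

    examined = 0
    for examined, token in enumerate(tokens, start=1):
        # Tokens unknown to the model contribute no evidence.
        log_banned, log_allowed = log_likelihoods.get(token, (0.0, 0.0))
        log_odds += log_banned - log_allowed

        # Require a minimum amount of evidence before deciding early.
        if examined >= min_tokens:
            if log_odds >= bound:
                return "block", examined
            if log_odds <= -bound:
                return "pass", examined

    # No early decision was reached: fall back to the sign of the final score.
    return ("block" if log_odds > 0 else "pass"), examined
```

In this sketch, `threshold` trades speed against accuracy: a lower value stops scanning sooner but risks more wrong decisions, while a higher value examines more of the page. `min_tokens` guards against deciding on the first few words alone.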