The World Wide Web (WWW) has become the largest information archive in the world because of its connectivity and scalability. Web pages, identified by URLs, are the basic units through which requested information is delivered to clients' PCs, and their number continues to grow explosively. A large portion of web pages contains undesirable content, such as pornography, crime, drugs, and terrorism, which makes viewer discretion necessary. The sheer number of undesirable web pages, however, makes blocking on a client PC difficult, because checking a requested URL against a large URL collection is time-consuming. We propose a neural network method for determining whether a requested URL is present in a large prohibited collection. The prohibited collection, containing 400,000 URLs, was obtained by submitting a number of keywords, e.g. ''porn'' or ''sex'', to several commercial search engines. Simulation results show superior performance in both memory requirement and speed compared with a database implementation on the same PC.
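The core problem described above is a fast, memory-efficient membership test: deciding whether a requested URL appears in a very large blocklist without a slow database lookup. As a minimal illustrative sketch (not the paper's neural-network method), a Bloom-filter-style bit array with several hash functions shows the kind of compact in-memory structure such a test requires; the class name, parameters, and URLs below are hypothetical:

```python
import hashlib

class UrlMembershipFilter:
    """Compact bit-array membership test for a large URL blocklist.

    Illustrative sketch only: uses k independent hashes over an m-bit
    array (a Bloom-filter-style structure). It may report rare false
    positives but never false negatives.
    """

    def __init__(self, m_bits=1 << 20, k_hashes=4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8)  # m bits of storage

    def _positions(self, url):
        # Derive k bit positions from salted SHA-256 digests of the URL.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{url}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, url):
        # Set all k bits for this URL.
        for p in self._positions(url):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, url):
        # URL is (probably) present only if all k bits are set.
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(url))

blocklist = UrlMembershipFilter()
blocklist.add("http://example-bad-site.test/")
print(blocklist.might_contain("http://example-bad-site.test/"))  # True
print(blocklist.might_contain("http://example-ok-site.test/"))   # False (with high probability)
```

With roughly 1 Mbit of storage this structure answers each query in a handful of hash evaluations, which conveys why an in-memory encoding of the blocklist can beat a disk-backed database lookup on the same PC.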