The World Wide Web (WWW) has become the largest information archive in the world because of its connectivity and scalability. Web pages, identified by URLs, are the basic units through which requested information is delivered to clients' PCs, and their number continues to grow explosively. A large portion of web pages contains undesirable content, such as pornography, crime, drugs, and terrorism, which makes viewer discretion necessary. The sheer number of undesirable web pages, however, makes blocking on a client PC difficult, because checking a requested URL against a large URL collection is time-consuming. We propose a neural network method for determining whether a requested URL is present in a large prohibited collection. The prohibited collection, containing 400,000 URLs, was obtained by submitting a number of keywords, e.g. ''porn'' or ''sex'', to several commercial search engines. Simulation results show superior performance in both memory requirement and speed compared with a database implementation on the same PC.
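The core problem described above is a fast, memory-efficient membership test: deciding whether a requested URL appears in a very large blocklist without a slow database lookup. As a minimal illustrative sketch (not the paper's neural-network method), a Bloom-filter-style bit array with several hash functions shows the kind of compact in-memory structure such a test requires; the class name, parameters, and URLs below are hypothetical:

```python
import hashlib

class UrlMembershipFilter:
    """Compact bit-array membership test for a large URL blocklist.

    Illustrative sketch only: uses k independent hashes over an m-bit
    array (a Bloom-filter-style structure). It may report rare false
    positives but never false negatives.
    """

    def __init__(self, m_bits=1 << 20, k_hashes=4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8)  # m bits of storage

    def _positions(self, url):
        # Derive k bit positions from salted SHA-256 digests of the URL.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{url}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, url):
        # Set all k bits for this URL.
        for p in self._positions(url):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, url):
        # URL is (probably) present only if all k bits are set.
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(url))

blocklist = UrlMembershipFilter()
blocklist.add("http://example-bad-site.test/")
print(blocklist.might_contain("http://example-bad-site.test/"))  # True
print(blocklist.might_contain("http://example-ok-site.test/"))   # False (with high probability)
```

With roughly 1 Mbit of storage this structure answers each query in a handful of hash evaluations, which conveys why an in-memory encoding of the blocklist can beat a disk-backed database lookup on the same PC.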