Blocking objectionable web content by leveraging multiple information sources

Authors:
Nitin Agarwal;Huan Liu;Jianping Zhang
Affiliations:
Arizona State University, Tempe, AZ;Arizona State University, Tempe, AZ;AOL, Inc., Dulles, VA
Venue:
ACM SIGKDD Explorations Newsletter
Year:
2006

Abstract

The World Wide Web has become an enormous archive of diverse content. The sheer volume of information on the web makes it challenging to deliver the right information to the right users. On one hand, this abundant information is freely accessible to all web users; on the other hand, much of it may be irrelevant or even harmful to some users. For example, control and filtering mechanisms are needed to prevent inappropriate or offensive material, such as pornographic websites, from reaching children. The ways in which websites are accessed are termed Access Scenarios. An Access Scenario can include using a search engine (e.g., an image search, whose results contain very little textual content), URL redirection to a website, or directly typing the URL of a (pornographic) website. In this paper we propose a framework that analyzes a website from several different aspects, or information sources, and generates a classification model that aims to classify such content accurately irrespective of the access scenario. Extensive experiments are performed to evaluate the resulting system, and the results illustrate the promise of the proposed approach.