Blocking objectionable web content by leveraging multiple information sources

  • Authors: Nitin Agarwal; Huan Liu; Jianping Zhang

  • Affiliations: Arizona State University, Tempe, AZ; Arizona State University, Tempe, AZ; AOL, Inc., Dulles, VA

  • Venue: ACM SIGKDD Explorations Newsletter
  • Year: 2006

Abstract

The World Wide Web has become a vast archive of diverse content. The sheer volume of information on the web makes it challenging to deliver the right information to the right users. On one hand, this abundant information is freely accessible to all web users; on the other hand, much of it may be irrelevant or even harmful to some users. For example, control and filtering mechanisms are needed to prevent inappropriate or offensive material, such as pornographic websites, from reaching children. The ways in which websites are accessed are termed Access Scenarios. An Access Scenario can include using a search engine (e.g., image search, whose results carry very little textual content), following a URL redirection to a website, or typing a (pornographic) website's URL directly. In this paper we propose a framework that analyzes a website from several different aspects, or information sources, and generates a classification model that aims to classify such content accurately irrespective of the access scenario. Extensive experiments are performed to evaluate the resulting system, and the results illustrate the promise of the proposed approach.