Objectionable content filtering by click-through data

  • Authors:
  • Lung-Hao Lee;Yen-Cheng Juan;Hsin-Hsi Chen;Yuen-Hsien Tseng

  • Affiliations:
  • National Taiwan University, Taipei, Taiwan ROC;National Taiwan University, Taipei, Taiwan ROC;National Taiwan University, Taipei, Taiwan ROC;National Taiwan Normal University, Taipei, Taiwan ROC

  • Venue:
  • Proceedings of the 22nd ACM international conference on information & knowledge management
  • Year:
  • 2013

Abstract

This paper explores users' browsing intents to predict the category of a user's next access during web surfing and applies the results to objectionable content filtering. A user's access trail, represented as a sequence of URLs, reveals contextual information about web browsing behavior. We extract behavioral features from each clicked URL (hostname, bag-of-words, gTLD, IP, and port) to develop a linear-chain CRF model for context-aware category prediction. Large-scale experiments show that our method achieves an accuracy of 0.9396 for objectionable access identification without requesting the corresponding page content. Error analysis indicates that the proposed model yields a low false positive rate of 0.0571. In real-life filtering simulations, the proposed model achieves a macro-averaged blocking rate of 0.9271 while maintaining a low macro-averaged over-blocking rate of 0.0575 when collaboratively filtering objectionable content as the dynamic web changes over time.
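
To make the per-URL features named in the abstract concrete, the following is a minimal Python sketch of how an access trail might be turned into a sequence of feature dictionaries, one per clicked URL, of the kind a linear-chain CRF toolkit could consume. It is not the authors' implementation. The function names, the feature key format, the use of the last hostname label as a stand-in for the gTLD, the reading of the "IP" feature as an "is the host a literal IP address" flag, and the default ports are all assumptions made for illustration.

```python
import ipaddress
import re
from urllib.parse import urlsplit


def url_features(url):
    """Return a feature dict for one clicked URL (hypothetical feature names)."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # Assumption: the "IP" feature is read as a flag that is set when the
    # host is a literal IP address rather than a domain name.
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False

    # Assumption: when no port is given, fall back to the scheme default.
    port = parts.port or {"http": 80, "https": 443}.get(parts.scheme, 0)

    # Crude stand-in for the gTLD feature: the last label of the hostname.
    tld = "" if is_ip else host.rsplit(".", 1)[-1]

    # Bag of words: lowercase alphabetic tokens from host, path, and query.
    text = " ".join([host, parts.path, parts.query]).lower()
    words = set(re.findall(r"[a-z]+", text))

    feats = {
        "host=" + host: 1.0,
        "tld=" + tld: 1.0,
        "port=" + str(port): 1.0,
        "is_ip": float(is_ip),
    }
    for w in words:
        feats["w=" + w] = 1.0
    return feats


def trail_features(trail):
    """Map an access trail (list of URLs, in click order) to a feature sequence."""
    return [url_features(u) for u in trail]
```

For example, `trail_features(["http://www.example.com/news/sports.html", "http://192.168.0.1:8080/video"])` yields two dictionaries; a sequence labeler would then assign one category label per position, using neighboring positions as context.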