Simple but Effective Porn Query Recognition by k-NN with Semantic Similarity Measure

  • Authors:
  • Shunkai Fu;Michel C. Desmarais;Bingfeng Pi;Ying Zhou;Weilei Wang;Gang Zou;Song Han;Xunrong Rao

  • Affiliations:
  • Roboo Inc.; Ecole Polytechnique de Montreal; Roboo Inc.; Roboo Inc.; Roboo Inc.; Roboo Inc.; Roboo Inc.; Roboo Inc.

  • Venue:
  • APWeb/WAIM '09 Proceedings of the Joint International Conferences on Advances in Data and Web Management
  • Year:
  • 2009

Abstract

Access to sexual content on commercial search engines must be subject to some restrictions. Rather than filtering porn content directly, we prefer to recognize porn queries and recommend appropriate alternatives, an approach with several potential advantages. However, recognizing such queries automatically is not trivial: in most cases their short length does not give a machine enough information to make a correct decision. In this paper, we propose a simple but effective solution for recognizing porn queries in a very large query log. Instead of merely checking whether a query contains sensitive words, which works in some cases but has obvious limitations, we go a step further by collecting and studying the semantic content of queries. Our experiments with real data demonstrate that a small training cost for a k-Nearest Neighbor (k-NN) classifier yields quite impressive classification performance, especially in recall.
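The abstract does not specify how the semantic content of a query is represented or which similarity measure the k-NN classifier uses. The following is a minimal sketch under stated assumptions, not the authors' method: each query is represented by a hypothetical "expanded context" string (for example, words gathered from its search results), turned into a whitespace-tokenized term-frequency vector, and compared to labeled training queries by cosine similarity. The training data, the `default` label, and all function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy training set: each entry pairs the *expanded* text of a
# query (assumed to come from, e.g., its search results) with a label.
TRAINING = [
    ("adult explicit video nsfw", "porn"),
    ("explicit adult site nsfw gallery", "porn"),
    ("adult nsfw clips explicit", "porn"),
    ("weather forecast rain tomorrow", "normal"),
    ("football match score league", "normal"),
    ("cheap flight ticket booking", "normal"),
]


def to_vector(text):
    """Term-frequency bag of words; tokenization is plain whitespace splitting."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query_context, training, k=3, default="normal"):
    """Label a query by majority vote among its k most similar training queries.

    Neighbours with zero similarity share no terms with the query, so they are
    discarded; if none remain, the hypothetical `default` label is returned.
    """
    qv = to_vector(query_context)
    scored = [(cosine(qv, to_vector(text)), label) for text, label in training]
    neighbours = [p for p in scored if p[0] > 0]
    neighbours.sort(key=lambda p: p[0], reverse=True)
    top = neighbours[:k]
    if not top:
        return default
    votes = Counter(label for _, label in top)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_classify("free explicit nsfw video", TRAINING))      # porn
    print(knn_classify("rain forecast for the weekend", TRAINING))  # normal
```

Dropping zero-similarity neighbours matters here: with a stable sort, unrelated training queries would otherwise fill the remaining k slots in list order and could outvote the one genuinely similar neighbour. Because the vote is driven by the expanded context rather than the raw query string, a short query with no sensitive word in it can still be labeled through the vocabulary of its context, which is the kind of gain over pure keyword matching the abstract describes.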