Hyperclique pattern based off-topic detection

  • Authors:
  • Tianming Hu; Qingui Xu; Huaqiang Yuan; Jiali Hou; Chao Qu

  • Affiliations:
  • Department of Computer Science, DongGuan University of Technology, DongGuan, China (all authors)

  • Venue:
  • APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
  • Year:
  • 2007

Abstract

This paper addresses the problem of detecting access to off-topic documents by exploiting user profiles. Existing methods usually store a few prototype off-topic documents as the profile and label their nearest neighbors in the test set as suspects. This rests on the common assumption that nearby documents belong to the same class. However, because high-dimensional space is inherently sparse, a document and its nearest neighbors may not belong to the same class. To address this, we develop a hyperclique pattern based off-topic detection method for selecting which documents to label. Hyperclique patterns capture joint similarity among a set of objects rather than the traditional pairwise similarity. As a result, objects from hypercliques are more reliable as seeds for classifying their neighbors. Experimental results on real-world document data show that our technique achieves better detection precision than existing methods.
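
The abstract does not define a hyperclique pattern. The sketch below uses the standard definition from the hyperclique literature: a set of items whose support meets a minimum threshold and whose h-confidence, the support of the whole set divided by the largest single-item support, meets a second threshold. Treating documents as items and words as transactions is an assumption made here for illustration, as are the names `support`, `h_confidence`, `hypercliques`, and the toy data `words`; this is not the authors' implementation, and it enumerates candidates by brute force.

```python
from itertools import combinations

def support(items, transactions):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def h_confidence(items, transactions):
    """Support of the whole set divided by the largest single-item support."""
    max_single = max(support([i], transactions) for i in items)
    return support(items, transactions) / max_single if max_single else 0.0

def hypercliques(transactions, size, min_supp, min_hconf):
    """Brute-force search for item sets of a given size passing both thresholds."""
    universe = sorted(set().union(*transactions))
    return [
        c for c in combinations(universe, size)
        if support(c, transactions) >= min_supp
        and h_confidence(c, transactions) >= min_hconf
    ]

# Assumed toy data: each "transaction" is one word, represented by the set
# of document ids that contain it.
words = [
    {"d1", "d2", "d3"},
    {"d1", "d2", "d3"},
    {"d1", "d2"},
    {"d3", "d4"},
    {"d4"},
]

if __name__ == "__main__":
    print(hypercliques(words, size=2, min_supp=0.2, min_hconf=0.9))
```

In this toy data, d1 and d2 occur in exactly the same words, so their h-confidence is 1.0 and they form the only pair that passes the 0.9 threshold. A high h-confidence means that the presence of any one member strongly implies the presence of all the others, which is the sense in which the similarity is joint rather than pairwise.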