Deep classifier: automatically categorizing search results into large-scale hierarchies

  • Authors:
  • Dikan Xing;Gui-Rong Xue;Qiang Yang;Yong Yu

  • Affiliations:
  • Shanghai Jiao Tong University, Shanghai, China;Shanghai Jiao Tong University, Shanghai, China;Hong Kong University of Science and Technology, Hong Kong, China;Shanghai Jiao Tong University, Shanghai, China

  • Venue:
  • WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Organizing Web search results into hierarchical categories facilitates users' browsing through Web search results, especially for ambiguous queries where the potential results are mixed together. Previous methods on search result classification are usually based on pre-training a classification model on some fixed and shallow hierarchical categories, where only the top-two-level categories of a Web taxonomy is used. Such classification methods may be too coarse for users to browse, since most search results would be classified into only two or three shallow categories. Instead, a deep hierarchical classifier must provide many more categories. However, the performance of such classifiers is usually limited because their classification effectiveness can deteriorate rapidly at the third or fourth level of a hierarchy. In this paper, we propose a novel algorithm known as Deep Classifier to classify the search results into detailed hierarchical categories with higher effectiveness than previous approaches. Given the search results in response to a query, the algorithm first prunes a wide-ranged hierarchy into a narrow one with the help of some Web directories. Different strategies are proposed to select the training data by utilizing the hierarchical structures. Finally, a discriminative naíve Bayesian classifier is developed to perform efficient and effective classification. As a result, the algorithm can provide more meaningful and specific class labels for search result browsing than shallow style of classification. We conduct experiments to show that the Deep Classifier can achieve significant improvement over state-of-the-art algorithms. In addition, with sufficient off-line preparation, the efficiency of the proposed algorithm is suitable for online application