Learning to Generate Labels for Organizing Search Results from a Domain-Specified Corpus

  • Authors:
  • Jing Zhao;Jing He

  • Affiliations:
  • Peking Univ., China;Peking Univ., China

  • Venue:
  • WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
  • Year:
  • 2006

Abstract

Organizing Web search results into labeled categories is a difficult but very useful task. The idea is to group the many results returned for each user query into well-labeled categories, so that users can browse them much more easily. In the past, clustering-based methods have been applied to the search-result organization problem, but it has been difficult to extract human-readable descriptions for the resulting clusters. An alternative is to first generate a series of labels from the search results and then assign documents to the relevant labels to form labeled categories. In this approach, a major task is generating the labels for the documents. In this paper, we propose a novel label generation method. First, we extract phrases from the search results as label candidates and use a binary classifier as our learning model to classify each candidate as either a useful or a meaningless label. The candidates classified as useful then form the final results. Because our method is applied to search results retrieved from a domain-specified corpus rather than a general corpus, the labels have some special features that can be used for classification. Experimental results show that the accuracy of our system is nearly 10% higher than that of label selection using the mutual information criterion, an unsupervised method for this problem.
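
The two-step pipeline described in the abstract (extract candidate phrases, then classify each one as useful or meaningless) could be sketched roughly as follows. The abstract does not state the extraction rule, the features, or the classifier, so everything here is an illustrative assumption rather than the authors' method: the word n-gram extractor, the stopword list, the three features (bias, relative document frequency, phrase length), and the perceptron are all hypothetical stand-ins.

```python
from collections import Counter

# Assumed stopword list; the abstract does not specify one.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "to", "on", "is"}


def extract_candidates(snippets, max_len=3):
    """Return a Counter mapping each candidate phrase to its document frequency.

    Candidates are word n-grams (n <= max_len) that neither start nor end
    with a stopword. This rule is an assumption made for illustration.
    """
    doc_freq = Counter()
    for text in snippets:
        tokens = text.lower().split()
        seen = set()  # count each phrase at most once per snippet
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                seen.add(" ".join(gram))
        doc_freq.update(seen)
    return doc_freq


def features(phrase, doc_freq, n_docs):
    """Hypothetical features: [bias, relative document frequency, length in words]."""
    return [1.0, doc_freq[phrase] / n_docs, float(len(phrase.split()))]


def train_perceptron(examples, epochs=20):
    """Train a perceptron on (feature_vector, label) pairs, label in {0, 1}.

    The perceptron is a stand-in for the unspecified binary classifier.
    """
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if sum(w * v for w, v in zip(weights, x)) > 0 else 0
            if pred != y:
                for j, v in enumerate(x):
                    weights[j] += (y - pred) * v
    return weights


def is_useful(weights, x):
    """Classify a candidate as useful (True) or meaningless (False)."""
    return sum(w * v for w, v in zip(weights, x)) > 0


if __name__ == "__main__":
    # Toy snippets and hand-assigned labels, invented for this sketch.
    snippets = [
        "neural network training",
        "training of neural network",
        "network security",
    ]
    doc_freq = extract_candidates(snippets)
    labeled = {"neural network": 1, "network security": 1, "training": 0}
    examples = [
        (features(p, doc_freq, len(snippets)), y) for p, y in labeled.items()
    ]
    weights = train_perceptron(examples)
    for phrase in doc_freq:
        x = features(phrase, doc_freq, len(snippets))
        print(phrase, "useful" if is_useful(weights, x) else "meaningless")
```

In a real system, the labeled candidates would come from human judgments, and the feature set would presumably include the domain-specific features the abstract alludes to; neither is described in enough detail here to reproduce.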