Towards the taxonomy-oriented categorization of yellow pages queries

  • Authors:
  • Zhisheng Li;Xiangye Xiao;Meng Wang;Chong Wang;Xufa Wang;Xing Xie

  • Affiliations:
  • National University of Singapore;Google China;Hefei University of Technology;Princeton University;University of Science and Technology of China;Microsoft Research Asia

  • Venue:
  • ACM Transactions on Internet Technology (TOIT)
  • Year:
  • 2008

Abstract

Yellow pages search is a popular service for finding businesses close to particular locations, and efficient yellow pages search is a rapidly evolving research area. The underlying data maintained in yellow pages search engines are typically labeled according to Standard Industrial Classification (SIC) categories, and users can search yellow pages by category according to their interests. Categorizing yellow pages queries into a subset of topical categories can help improve the search experience and result quality. However, yellow pages queries are usually short and ambiguous. In addition, a yellow pages query taxonomy is typically organized as a hierarchy with a fairly large number of categories. These characteristics make automatic yellow pages query categorization challenging. In this article, we propose a flexible yellow pages query categorization approach. The proposed technique is built on a TF-IDF similarity taxonomy matching scheme that provides more accurate query categorization than previous keyword-based matching schemes. To further improve categorization performance, we design several filtering schemes. Through extensive experimentation, we demonstrate encouraging results: we obtain F1 measures of about 0.5 and 0.3 for categorizing yellow pages queries into 19 coarse categories and 244 finer categories, respectively. We investigate the different components of the proposed approach and also demonstrate its superiority over a hierarchical support vector machine classifier.
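
The abstract does not spell out the matching scheme, so the following is only a minimal sketch of the general idea of TF-IDF similarity matching between a query and category descriptions, not the authors' implementation. The category names, their keyword descriptions, the smoothed IDF formula, and the `threshold` cutoff (standing in for a simple filtering step) are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical category descriptions; a real taxonomy would be far larger.
CATEGORIES = {
    "Eating Places": "restaurant pizza cafe diner food",
    "Auto Repair": "car auto repair garage mechanic brake",
    "Health Services": "doctor dentist clinic hospital",
}


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def build_idf(docs):
    """Compute a smoothed inverse document frequency for each term."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_vector(tokens, idf):
    """Map tokens to a sparse TF-IDF vector, ignoring unknown terms."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def categorize(query, categories, threshold=0.1):
    """Return (category, score) pairs at or above threshold, best first."""
    names = list(categories)
    docs = [tokenize(categories[name]) for name in names]
    idf = build_idf(docs)
    q = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q, tfidf_vector(d, idf)), name)
              for name, d in zip(names, docs)]
    scored.sort(reverse=True)
    # Assumed filtering step: drop categories with a low similarity score.
    return [(name, score) for score, name in scored if score >= threshold]


if __name__ == "__main__":
    print(categorize("pizza restaurant", CATEGORIES))
```

Because the function returns every category above the cutoff rather than a single label, an ambiguous short query can map to a subset of categories, which is the setting the abstract describes.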