Predicting primary categories of business listings for local search

Authors:
Changsung Kang;Jeehaeng Lee;Yi Chang
Affiliations:
Yahoo!, Sunnyvale, CA, USA;Yahoo!, Sunnyvale, CA, USA;Yahoo!, Sunnyvale, CA, USA
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Abstract

We consider the problem of identifying the primary categories of a business listing among the categories provided by the business owner. Owner-submitted category information cannot be fully trusted, since owners may deliberately add secondary or irrelevant categories to increase their recall in local search results, which makes category search very challenging for local search engines. Identifying the primary categories of a business is therefore a crucial problem in local search. The problem can be cast as multi-label classification over a large number of categories, but its scale makes conventional supervised text categorization approaches infeasible. We propose a large-scale classification framework that leverages multiple types of classification labels to produce a highly accurate classifier with fast training time, and we combine these complementary label sources to refine its predictions. Experimental results indicate that our framework achieves very high precision and recall and outperforms a centroid-based method.
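
To make the task concrete, the following is a minimal sketch of a generic centroid-based baseline of the kind the abstract compares against. It is not the authors' framework. The whitespace tokenizer, the raw term-count weighting, the 0.2 similarity threshold, the toy training listings, and every function name are illustrative assumptions not taken from the paper. Each category is represented by the average vector of listings labeled with it, and an owner-provided category is kept as primary only if the listing text is close enough to that category's centroid.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts, L2-normalized (assumed weighting)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()} if norm else {}


def build_centroids(labeled_listings):
    """Average the vectors of all listings labeled with each category."""
    sums = defaultdict(Counter)
    sizes = Counter()
    for text, category in labeled_listings:
        for term, weight in vectorize(text).items():
            sums[category][term] += weight
        sizes[category] += 1
    return {c: {t: w / sizes[c] for t, w in vec.items()}
            for c, vec in sums.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_primary(text, owner_categories, centroids, threshold=0.2):
    """Keep only owner-provided categories whose centroid matches the text."""
    vec = vectorize(text)
    scored = [(c, cosine(vec, centroids[c]))
              for c in owner_categories if c in centroids]
    scored.sort(key=lambda pair: -pair[1])
    return [c for c, score in scored if score >= threshold]


# Toy training data (hypothetical listings, one label each).
TRAIN = [
    ("wood fired pizza pasta italian kitchen", "Italian Restaurant"),
    ("family pizza pasta and wine", "Italian Restaurant"),
    ("espresso coffee pastries bakery", "Coffee Shop"),
    ("coffee roasters espresso bar", "Coffee Shop"),
    ("oil change brake repair auto service", "Auto Repair"),
]
centroids = build_centroids(TRAIN)

# The owner lists three categories; only one is supported by the text.
owner_categories = ["Italian Restaurant", "Coffee Shop", "Auto Repair"]
primary = predict_primary("luigi pizza and pasta house",
                          owner_categories, centroids)
print(primary)
```

Restricting the candidates to the owner-provided categories mirrors the problem statement, in which primary categories are selected among the submitted ones rather than predicted from the full taxonomy.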