Context-sensitive learning methods for text categorization
ACM Transactions on Information Systems (TOIS)
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Hyper-local, directions-based ranking of places
Proceedings of the VLDB Endowment
Learning to rank with multi-aspect relevance for vertical search
Proceedings of the fifth ACM international conference on Web search and data mining
A hierarchical Dirichlet model for taxonomy expansion for search engines
Proceedings of the 23rd international conference on World wide web
Hi-index | 0.00 |
We consider the problem of identifying primary categories of a business listing among the categories provided by the owner of the business. The category information submitted by business owners cannot be trusted with absolute certainty since they may purposefully add some secondary or irrelevant categories to increase recall in local search results, which makes category search very challenging for local search engines. Thus, identifying primary categories of a business is a crucial problem in local search. This problem can be cast as a multi-label classification problem with a large number of categories. However, the large scale of the problem makes it infeasible to use conventional supervised-learning-based text categorization approaches. We propose a large-scale classification framework that leverages multiple types of classification labels to produce a highly accurate classifier with fast training time. We effectively combine the complementary label sources to refine prediction. The experimental results indicate that our framework achieves very high precision and recall and outperforms a Centroid-based method.