Automated learning of decision rules for text categorization
ACM Transactions on Information Systems (TOIS)
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Hierarchical classification of Web content
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Exploiting Hierarchy in Text Categorization
Information Retrieval
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Hierarchically Classifying Documents Using Very Few Words
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Improving Text Classification by Shrinkage in a Hierarchy of Classes
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Finding a Maximum Density Subgraph
Finding a Maximum Density Subgraph
In Defense of One-Vs-All Classification
The Journal of Machine Learning Research
Large margin hierarchical classification
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Hierarchical document categorization with support vector machines
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Support vector machines classification with a very large-scale taxonomy
ACM SIGKDD Explorations Newsletter - Natural language processing and text mining
Deep classification in large-scale text hierarchies
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
LIBLINEAR: A Library for Large Linear Classification
The Journal of Machine Learning Research
Solving multiclass learning problems via error-correcting output codes
Journal of Artificial Intelligence Research
A survey of hierarchical classification across different application domains
Data Mining and Knowledge Discovery
A Study of Smoothing Algorithms for Item Categorization on e-Commerce Sites
ICMLA '10 Proceedings of the 2010 Ninth International Conference on Machine Learning and Applications
Item categorization in the e-commerce domain
Proceedings of the 20th ACM international conference on Information and knowledge management
Cost-sensitive learning for large-scale hierarchical classification
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
On segmentation of eCommerce queries
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Hi-index | 0.00 |
This paper studies the problem of leveraging computationally intensive classification algorithms for large scale text categorization problems. We propose a hierarchical approach which decomposes the classification problem into a coarse level task and a fine level task. A simple yet scalable classifier is applied to perform the coarse level classification while a more sophisticated model is used to separate classes at the fine level. However, instead of relying on a human-defined hierarchy to decompose the problem, we we use a graph algorithm to discover automatically groups of highly similar classes. As an illustrative example, we apply our approach to real-world industrial data from eBay, a major e-commerce site where the goal is to classify live items into a large taxonomy of categories. In such industrial setting, classification is very challenging due to the number of classes, the amount of training data, the size of the feature space and the real-world requirements on the response time. We demonstrate through extensive experimental evaluation that (1) the proposed hierarchical approach is superior to flat models, and (2) the data-driven extraction of latent groups works significantly better than the existing human-defined hierarchy.