The two key challenges in hierarchical classification are leveraging the hierarchical dependencies between class labels to improve performance and, at the same time, maintaining scalability across large hierarchies. In this paper we propose a regularization framework for large-scale hierarchical classification that addresses both problems. Specifically, we incorporate the hierarchical dependencies between class labels into the regularization structure of the parameters, thereby encouraging classes that are nearby in the hierarchy to share similar model parameters. Furthermore, we extend our approach to scenarios where the dependencies between class labels are encoded as a graph rather than a hierarchy. To enable large-scale training, we develop a parallel iterative optimization scheme that can handle datasets with hundreds of thousands of classes and millions of instances, and can learn terabytes of parameters. Our experiments show a consistent improvement over competing approaches and achieve state-of-the-art results on benchmark datasets.
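To make the idea of "classes nearby in the hierarchy share similar model parameters" concrete, here is a minimal sketch of one possible penalty of that kind. The abstract does not state the exact regularizer, so the squared-difference form, the function names `hierarchy_penalty` and `hierarchy_penalty_grad`, the `parent` array encoding, and the toy hierarchy are all assumptions made for illustration, not the paper's formulation.

```python
import numpy as np

def hierarchy_penalty(W, parent):
    """Assumed penalty: 0.5 * sum_n ||w_n - w_parent(n)||^2.

    W[n] is the parameter vector of node n; parent[n] is the index of its
    parent, or -1 for the root, which is compared against the zero vector.
    """
    total = 0.0
    for n, p in enumerate(parent):
        ref = W[p] if p >= 0 else np.zeros_like(W[n])
        diff = W[n] - ref
        total += 0.5 * float(diff @ diff)
    return total

def hierarchy_penalty_grad(W, parent):
    """Gradient of the assumed penalty with respect to every row of W."""
    G = np.zeros_like(W)
    for n, p in enumerate(parent):
        ref = W[p] if p >= 0 else np.zeros_like(W[n])
        diff = W[n] - ref
        G[n] += diff          # pulls node n toward its parent
        if p >= 0:
            G[p] -= diff      # pulls the parent toward node n
    return G

# Hypothetical toy hierarchy: node 0 is the root, nodes 1 and 2 are its
# children, and node 3 is a child of node 1.
parent = [-1, 0, 0, 1]
W = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [1.0, 0.0],
              [1.0, 2.0]])

penalty = hierarchy_penalty(W, parent)
grad = hierarchy_penalty_grad(W, parent)
```

Under this assumed form, the gradient for a node depends only on its parent and children. Replacing the `parent` array with an arbitrary edge list would give a graph-based variant in the same spirit as the extension the abstract mentions.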