Support vector machines classification with a very large-scale taxonomy

Authors:
Tie-Yan Liu; Yiming Yang; Hao Wan; Hua-Jun Zeng; Zheng Chen; Wei-Ying Ma
Affiliations:
Microsoft Research Asia, Beijing, P. R. China; Carnegie Mellon University, PA; Tsinghua University, Beijing, P. R. China; Microsoft Research Asia, Beijing, P. R. China; Microsoft Research Asia, Beijing, P. R. China; Microsoft Research Asia, Beijing, P. R. China
Venue:
ACM SIGKDD Explorations Newsletter - Natural language processing and text mining
Year:
2005

Abstract

Very large-scale classification taxonomies typically have hundreds of thousands of categories, deep hierarchies, and a skewed category distribution over documents. It is still an open question whether state-of-the-art technologies in automated text categorization can scale to (and perform well on) such large taxonomies. In this paper, we report the first evaluation of Support Vector Machines (SVMs) in web-page classification over the full taxonomy of the Yahoo! categories. Our contributions include: 1) a data analysis of the Yahoo! taxonomy; 2) the development of a scalable system for large-scale text categorization; 3) theoretical analysis and experimental evaluation of SVMs in hierarchical and non-hierarchical settings for classification; 4) an investigation of threshold tuning algorithms with respect to time complexity and their effect on the classification accuracy of SVMs. We found that, in terms of scalability, the hierarchical use of SVMs is efficient enough for very large-scale classification; in terms of effectiveness, however, the performance of SVMs over the Yahoo! Directory is still far from satisfactory, which indicates that more substantial investigation is needed.
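
The contrast between hierarchical and non-hierarchical use of classifiers can be illustrated with a small sketch. This is not the authors' system: the three-by-three taxonomy, the hand-set weights, and the names `Node`, `classify_flat`, and `classify_top_down` are all hypothetical, and a plain linear scorer stands in for a trained SVM. The sketch only shows why top-down routing can evaluate fewer classifiers per document than a flat one-classifier-per-leaf setup.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse bag-of-words: term -> weight


@dataclass
class Node:
    """One category in a toy taxonomy (hypothetical; weights are hand-set, not trained)."""
    name: str
    weights: Vector = field(default_factory=dict)
    bias: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def score(self, doc: Vector) -> float:
        # Linear decision value w.x + b, the form a linear SVM would produce.
        return sum(w * doc.get(t, 0.0) for t, w in self.weights.items()) + self.bias


def leaves_of(node: Node) -> List[Node]:
    """Collect all leaf categories under a node."""
    if not node.children:
        return [node]
    out: List[Node] = []
    for child in node.children:
        out.extend(leaves_of(child))
    return out


def classify_flat(leaves: List[Node], doc: Vector, threshold: float = 0.0) -> Tuple[List[str], int]:
    """Non-hierarchical setting: evaluate every leaf classifier."""
    labels, evaluations = [], 0
    for leaf in leaves:
        evaluations += 1
        if leaf.score(doc) > threshold:
            labels.append(leaf.name)
    return labels, evaluations


def classify_top_down(root: Node, doc: Vector, threshold: float = 0.0) -> Tuple[List[str], int]:
    """Hierarchical setting: descend only into subtrees whose classifier accepts the document."""
    labels, evaluations = [], 0
    frontier = list(root.children)
    while frontier:
        node = frontier.pop()
        evaluations += 1
        if node.score(doc) <= threshold:
            continue  # rejected: the whole subtree is pruned
        if node.children:
            frontier.extend(node.children)
        else:
            labels.append(node.name)
    return labels, evaluations


def build_toy_taxonomy() -> Node:
    """Three top-level categories with three leaves each (purely illustrative)."""
    def leaf(name: str, term: str) -> Node:
        return Node(name, {term: 1.0}, bias=-0.5)

    sports = Node("Sports", {"match": 1.0, "ball": 1.0}, -0.5,
                  [leaf("Soccer", "goal"), leaf("Tennis", "racket"), leaf("Golf", "putt")])
    science = Node("Science", {"atom": 1.0, "cell": 1.0}, -0.5,
                   [leaf("Physics", "atom"), leaf("Biology", "cell"), leaf("Chemistry", "acid")])
    arts = Node("Arts", {"song": 1.0, "film": 1.0}, -0.5,
                [leaf("Music", "song"), leaf("Film", "film"), leaf("Painting", "canvas")])
    return Node("Root", children=[sports, science, arts])


if __name__ == "__main__":
    root = build_toy_taxonomy()
    doc = {"goal": 1.0, "match": 1.0}
    print("flat:", classify_flat(leaves_of(root), doc))
    print("top-down:", classify_top_down(root, doc))
```

On this toy input both settings assign the same leaf, but the flat setting scores all nine leaves while the top-down setting scores three top-level nodes plus the three leaves of the accepted branch. The same pruning is also a source of error: a wrong rejection near the root cannot be recovered lower down, which is one reason the `threshold` parameter matters in practice.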