Very large-scale classification taxonomies typically have hundreds of thousands of categories, deep hierarchies, and skewed category distribution over documents. However, it is still an open question whether the state-of-the-art technologies in automated text categorization can scale to (and perform well on) such large taxonomies. In this paper, we report the first evaluation of Support Vector Machines (SVMs) in web-page classification over the full taxonomy of the Yahoo! categories. Our accomplishments include: 1) a data analysis on the Yahoo! taxonomy; 2) the development of a scalable system for large-scale text categorization; 3) theoretical analysis and experimental evaluation of SVMs in hierarchical and non-hierarchical settings for classification; 4) an investigation of threshold tuning algorithms with respect to time complexity and their effect on the classification accuracy of SVMs. We found that, in terms of scalability, the hierarchical use of SVMs is efficient enough for very large-scale classification; however, in terms of effectiveness, the performance of SVMs over the Yahoo! Directory is still far from satisfactory, which indicates that more substantial investigation is needed.
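
To make the scalability contrast between hierarchical and non-hierarchical classification concrete, the following is a minimal sketch, not the system described in the paper. It assumes a hypothetical 3-by-3 toy taxonomy, hand-written linear scorers standing in for trained SVMs, and a fixed decision threshold of 0.0; all names (TAXONOMY, WEIGHTS, classify_top_down, classify_flat) are illustrative. It counts how many classifiers are evaluated per document in each setting.

```python
from typing import Dict, List, Tuple

# Hypothetical toy taxonomy (parent -> children); leaves have no entry.
TAXONOMY: Dict[str, List[str]] = {
    "root": ["Science", "Sports", "Arts"],
    "Science": ["Physics", "Biology", "Chemistry"],
    "Sports": ["Soccer", "Tennis", "Golf"],
    "Arts": ["Music", "Film", "Dance"],
}

# Hypothetical linear scorers, one per category (stand-ins for trained SVMs).
WEIGHTS: Dict[str, Dict[str, float]] = {
    "Science": {"quantum": 1.0, "cell": 1.0, "acid": 1.0},
    "Sports": {"goal": 1.0, "racket": 1.0, "putt": 1.0},
    "Arts": {"song": 1.0, "movie": 1.0, "ballet": 1.0},
    "Physics": {"quantum": 1.0}, "Biology": {"cell": 1.0},
    "Chemistry": {"acid": 1.0}, "Soccer": {"goal": 1.0},
    "Tennis": {"racket": 1.0}, "Golf": {"putt": 1.0},
    "Music": {"song": 1.0}, "Film": {"movie": 1.0},
    "Dance": {"ballet": 1.0},
}


def score(category: str, doc: Dict[str, float]) -> float:
    """Dot product of the category's weights with the document's term counts."""
    weights = WEIGHTS[category]
    return sum(weights.get(term, 0.0) * count for term, count in doc.items())


def classify_top_down(doc: Dict[str, float],
                      threshold: float = 0.0) -> Tuple[List[str], int]:
    """Descend from the root, scoring only the children of the current node."""
    node, path, evaluations = "root", [], 0
    while node in TAXONOMY:
        scored = [(score(child, doc), child) for child in TAXONOMY[node]]
        evaluations += len(scored)
        best_score, best = max(scored)
        if best_score <= threshold:  # no child accepts the document: stop here
            break
        path.append(best)
        node = best
    return path, evaluations


def classify_flat(doc: Dict[str, float],
                  threshold: float = 0.0) -> Tuple[List[str], int]:
    """Score every leaf category independently, ignoring the hierarchy."""
    leaves = [c for c in WEIGHTS if c not in TAXONOMY]
    accepted = [c for c in leaves if score(c, doc) > threshold]
    return accepted, len(leaves)


if __name__ == "__main__":
    doc = {"quantum": 2.0, "cell": 1.0}
    print(classify_top_down(doc))
    print(classify_flat(doc))
```

In this toy setting the top-down pass evaluates 6 scorers per document against 9 for the flat pass, and the gap widens as the number of categories grows. The sketch says nothing about accuracy: a wrong decision at an upper level cannot be recovered lower down, which is one reason effectiveness has to be evaluated separately from efficiency.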