A re-examination of text categorization methods. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval.
Yahoo! as an ontology: using Yahoo! categories to describe documents. Proceedings of the eighth international conference on Information and knowledge management.
Bringing order to the Web: automatically categorizing search results. Proceedings of the SIGCHI conference on Human Factors in Computing Systems.
An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval.
Machine learning in automated text categorization. ACM Computing Surveys (CSUR).
Hierarchically Classifying Documents Using Very Few Words. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
A Comparative Study on Feature Selection in Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.
Improving Text Classification by Shrinkage in a Hierarchy of Classes. ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning.
Hierarchical Text Classification and Evaluation. ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining.
The VLDB Journal — The International Journal on Very Large Data Bases.
A scalability analysis of classifiers in text categorization. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval.
Augmenting Naive Bayes Classifiers with Statistical Language Models. Information Retrieval.
Hierarchical document categorization with support vector machines. Proceedings of the thirteenth ACM international conference on Information and knowledge management.
Improving web search results using affinity graph. Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval.
Support vector machines classification with a very large-scale taxonomy. ACM SIGKDD Explorations Newsletter - Natural language processing and text mining.
Robust classification of rare queries using web knowledge. SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval.
A semantic approach to contextual advertising. SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval.
Deep classification in large-scale text hierarchies. Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval.
A class-feature-centroid classifier for text categorization. Proceedings of the 18th international conference on World wide web.
ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval.
Topical categorization of search results based on a domain ontology. ICTIR'11 Proceedings of the Third international conference on Advances in information retrieval theory.
Utilizing global and path information with language modelling for hierarchical text classification. Journal of Information Science.
Compared to traditional text classification with a flat category set or a small hierarchy of categories, classifying web pages into a large-scale hierarchy such as the Open Directory Project (ODP) or the Yahoo! Directory is challenging. While a recently proposed "deep" classification method makes the problem tractable, it still suffers from low classification performance. A major problem is the lack of training data, which is unavoidable in such a huge hierarchy: the training pages associated with the category nodes are short, and their distributions are skewed. To alleviate this problem, we propose a new training data selection strategy and a naïve Bayes combination model, both of which utilize local as well as global information. A series of experiments on the ODP hierarchy, which contains more than 100,000 categories, shows that the proposed method of using both local and global information indeed helps avoid the training data sparseness problem and outperforms the state-of-the-art method.
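The abstract does not spell out how the naïve Bayes combination model blends local and global information. As a rough illustration only, the sketch below assumes one common way to do this in multinomial naïve Bayes: linearly interpolating each category's word distribution, estimated from that category's own (local) training pages, with a global distribution estimated from the pooled pages of all categories, in the spirit of shrinkage. The functions `train` and `classify`, the mixing weight `lam`, and the choice of pooled pages as the "global" model are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def word_probs(docs, vocab, alpha=1.0):
    """Laplace-smoothed multinomial estimate of P(w | docs) over a fixed vocabulary."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}


def train(local_docs, lam=0.7):
    """Hypothetical local/global naive Bayes combination.

    local_docs maps each category to its list of tokenized training pages.
    ASSUMPTION: the "global" model is estimated from the pooled pages of all
    categories; lam weights the local estimate against it.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = {w for docs in local_docs.values() for doc in docs for w in doc}
    all_docs = [doc for docs in local_docs.values() for doc in docs]
    global_p = word_probs(all_docs, vocab)
    model = {}
    for cat, docs in local_docs.items():
        local_p = word_probs(docs, vocab)
        # Convex combination of two distributions, so it still sums to 1.
        mixed = {w: lam * local_p[w] + (1 - lam) * global_p[w] for w in vocab}
        log_prior = math.log(len(docs) / len(all_docs))
        model[cat] = (log_prior, {w: math.log(p) for w, p in mixed.items()})
    return model


def classify(model, doc):
    """Return the category with the highest naive Bayes log score."""
    def score(cat):
        log_prior, log_p = model[cat]
        # Words never seen in training are skipped.
        return log_prior + sum(log_p[w] for w in doc if w in log_p)
    return max(model, key=score)


train_data = {
    "Sports/Soccer": [["goal", "match", "league"], ["striker", "goal"]],
    "Science/Physics": [["quantum", "particle"], ["energy", "particle", "field"]],
}
model = train(train_data)
print(classify(model, ["goal", "league"]))  # Sports/Soccer
```

With `lam` close to 1 the model trusts each category's own sparse pages; lowering it pulls rarely seen categories toward the global distribution, which is one plausible way to counter the short, skewed training pages the abstract describes.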