Traditional machine learning approaches to text classification often require labelled data for training classifiers. However, in large-scale classification involving thousands of categories, creating such labelled data is extremely expensive, since the data is typically labelled manually by humans. Motivated by this, we propose a novel approach to large-scale hierarchical text classification that does not require any labelled data. We explore a perspective in which the meaning of a category is defined not by human-labelled documents but by its description and, more importantly, its relationships with other categories (e.g. its ancestors and descendants). Specifically, we take advantage of ontological knowledge in all phases of the process: when retrieving pseudo-labelled documents, when iteratively training the category models, and when categorizing test documents. Our experiments, based on a taxonomy of 1131 categories that is widely adopted in the news industry as a standard for the NewsML framework, demonstrate the effectiveness of our approach in these phases both qualitatively and quantitatively. In particular, we emphasize that, using only the simple ontological knowledge defined in the category hierarchy, we could automatically build a large-scale hierarchical classifier with a reasonable performance of 67% in terms of the hierarchy-based F1 measure.
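
The abstract does not define the hierarchy-based F1 measure. As a rough illustration only, the sketch below assumes one common formulation: each true and predicted category is expanded with its ancestors, and precision and recall are computed over the expanded sets. The toy taxonomy `PARENT` and the helper names `with_ancestors` and `hierarchical_f1` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy: child -> parent (None marks a top-level category).
PARENT: Dict[str, Optional[str]] = {
    "sport": None,
    "soccer": "sport",
    "tennis": "sport",
    "economy": None,
    "markets": "economy",
}


def with_ancestors(category: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the category together with all of its ancestors."""
    result: Set[str] = set()
    node: Optional[str] = category
    while node is not None:
        result.add(node)
        node = parent[node]
    return result


def hierarchical_f1(
    true_labels: List[str],
    predicted_labels: List[str],
    parent: Dict[str, Optional[str]],
) -> float:
    """Micro-averaged F1 over ancestor-expanded label sets (assumed definition)."""
    overlap = n_pred = n_true = 0
    for t, p in zip(true_labels, predicted_labels):
        t_set = with_ancestors(t, parent)
        p_set = with_ancestors(p, parent)
        overlap += len(t_set & p_set)  # categories shared along both paths
        n_pred += len(p_set)
        n_true += len(t_set)
    if overlap == 0:
        return 0.0
    precision = overlap / n_pred
    recall = overlap / n_true
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # "tennis" instead of "soccer" still earns partial credit for "sport".
    print(hierarchical_f1(["soccer", "markets"], ["tennis", "markets"], PARENT))
```

Under this assumed definition, predicting a sibling of the correct category is penalized less than predicting a category in an unrelated branch, which is the usual motivation for a hierarchy-aware measure over flat F1.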