The proliferation of text documents on the web and within institutions makes it necessary to organize them conveniently so that information can be retrieved efficiently. Although text corpora are frequently organized into concept hierarchies or taxonomies, classifying documents into the hierarchy is expensive in terms of human effort. We present a novel and simple hierarchical Dirichlet generative model for text corpora and derive an efficient algorithm for estimating the model parameters and for the unsupervised classification of text documents into a given hierarchy. Because of the hierarchical Bayesian structure of the model, the class-conditional feature means are assumed to be inter-related. We show that the algorithm provides robust estimates of the classification parameters by performing smoothing, or regularization. We present experimental evidence on real web data that our algorithm achieves significant gains in accuracy over simpler models.
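
To make the idea of smoothing across a hierarchy concrete, the following is a minimal sketch and not the estimator derived in the paper. It assumes each class's word distribution is shrunk toward its parent's distribution, as a Dirichlet prior centred on the parent mean would suggest. The names `smooth_hierarchy`, `classify`, and the concentration parameter `sigma` are hypothetical and chosen only for illustration.

```python
import math


def smooth_hierarchy(counts, parent, sigma=1.0):
    """Estimate per-class word probabilities, shrinking each class toward its parent.

    counts: dict mapping class name -> list of word counts (same vocabulary order).
    parent: dict mapping class name -> parent class name, or None for the root.
    sigma:  assumed prior strength; larger values pull a class closer to its parent.
    """
    vocab_size = len(next(iter(counts.values())))
    theta = {}

    def estimate(node):
        if node in theta:
            return theta[node]
        p = parent[node]
        # The root has no parent, so it is shrunk toward a uniform distribution.
        prior = [1.0 / vocab_size] * vocab_size if p is None else estimate(p)
        total = sum(counts[node])
        # Posterior-mean style update: observed counts plus sigma pseudo-counts
        # distributed according to the parent's estimate.
        theta[node] = [
            (c + sigma * m) / (total + sigma) for c, m in zip(counts[node], prior)
        ]
        return theta[node]

    for node in counts:
        estimate(node)
    return theta


def classify(doc, theta, candidates):
    """Return the candidate class with the highest multinomial log-likelihood."""

    def loglik(node):
        return sum(c * math.log(t) for c, t in zip(doc, theta[node]) if c > 0)

    return max(candidates, key=loglik)


# Toy taxonomy (made-up counts): "politics" has no data and falls back to "root".
parent = {"root": None, "sports": "root", "politics": "root"}
counts = {"root": [2, 2, 2], "sports": [3, 0, 0], "politics": [0, 0, 0]}
theta = smooth_hierarchy(counts, parent, sigma=3.0)
label = classify([2, 0, 0], theta, ["sports", "politics"])
```

In this toy setting a class with no observed words inherits its parent's distribution, which is the kind of robustness that regularization is meant to provide when data per class is sparse.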