Boosting multi-label hierarchical text categorization

Authors:
Andrea Esuli;Tiziano Fagni;Fabrizio Sebastiani
Affiliations:
Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy 56124;Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy 56124;Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy 56124
Venue:
Information Retrieval
Year:
2008

Abstract

Hierarchical Text Categorization (HTC) is the task of generating (usually by means of supervised learning algorithms) text classifiers that operate on hierarchically structured classification schemes. Although most large classification schemes for text have a hierarchical structure, text classification researchers have so far focused mostly on algorithms for "flat" classification, i.e., algorithms that operate on non-hierarchical classification schemes. When applied to a hierarchical classification problem, these algorithms cannot take advantage of the information inherent in the class hierarchy, and may thus be suboptimal in terms of efficiency and/or effectiveness. In this paper we propose TreeBoost.MH, a multi-label HTC algorithm consisting of a hierarchical variant of AdaBoost.MH, a very well-known member of the family of "boosting" learning algorithms. TreeBoost.MH embodies several intuitions that had arisen before within HTC, e.g., that both feature selection and the selection of negative training examples should be performed "locally", i.e., by paying attention to the topology of the classification scheme. It also embodies the novel intuition that the weight distribution that boosting algorithms update at every boosting round should likewise be updated "locally". All these intuitions are embodied within TreeBoost.MH in an elegant and simple way, i.e., by defining TreeBoost.MH as a recursive algorithm that uses AdaBoost.MH as its base step and that recurs over the tree structure. We present the results of experiments with TreeBoost.MH on three HTC benchmarks, and analyze its computational cost.
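
The recursive structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Node`, `train_tree`, and `record_learner` are hypothetical, the base learner is a stub standing in for AdaBoost.MH, and "local" selection of negative examples is interpreted here (as an assumption) to mean that each internal node trains only on documents belonging to its own subtree. Local feature selection and local weight-distribution updates are not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Doc = Tuple[str, Set[str]]  # (document id, names of the categories it belongs to)


@dataclass
class Node:
    """A category in the tree (hypothetical structure for illustration)."""
    name: str
    children: List["Node"] = field(default_factory=list)


def subtree_labels(node: Node) -> Set[str]:
    """Names of the node and all of its descendants."""
    labels = {node.name}
    for child in node.children:
        labels |= subtree_labels(child)
    return labels


def train_tree(node: Node, docs: List[Doc], base_learner: Callable) -> Dict[str, object]:
    """Recurse over the tree, calling the base learner once per internal node.

    At each internal node the base learner sees only documents that belong to
    one of the node's children (assumed reading of "local" example selection).
    """
    if not node.children:
        return {}  # leaves have no children to discriminate among
    positives = {}
    for child in node.children:
        labels = subtree_labels(child)
        positives[child.name] = {doc_id for doc_id, cats in docs if cats & labels}
    local_ids = [d for d, _ in docs if any(d in p for p in positives.values())]
    models = {node.name: base_learner(local_ids, positives)}
    for child in node.children:
        child_docs = [doc for doc in docs if doc[0] in positives[child.name]]
        models.update(train_tree(child, child_docs, base_learner))
    return models


def record_learner(doc_ids: List[str], positives: Dict[str, Set[str]]) -> dict:
    """Stub base learner: records its training data instead of boosting."""
    return {"docs": list(doc_ids), "positives": positives}


# Toy hierarchy and multi-label documents (made-up data).
root = Node("root", [
    Node("sports", [Node("soccer"), Node("tennis")]),
    Node("science", [Node("physics")]),
])
docs = [
    ("d1", {"soccer"}),
    ("d2", {"tennis"}),
    ("d3", {"physics"}),
    ("d4", {"soccer", "physics"}),
]
models = train_tree(root, docs, record_learner)
```

In this toy run, the model trained at `sports` never sees `d3`, which belongs only to `science`; the `soccer` vs. `tennis` decision is learned from the documents under `sports` alone. A real system would replace `record_learner` with an actual multi-label boosting routine.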