OHSUMED: an interactive retrieval evaluation and new large test collection for research
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
The nature of statistical learning theory
The nature of statistical learning theory
Evaluating and optimizing autonomous text classification systems
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Training algorithms for linear text classifiers
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Using a generalized instance set for automatic text categorization
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Variations in relevance judgments and the measurement of retrieval effectiveness
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A new family of online algorithms for category ranking
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Hierarchical Text Categorization Using Neural Networks
Information Retrieval
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
The Kernel-Adatron Algorithm: A Fast and Simple Learning Procedure for Support Vector Machines
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
A scalability analysis of classifiers in text categorization
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Experiment with a hierarchical text categorization method on the WIPO-alpha patent collection
ISUMA '03 Proceedings of the 4th International Symposium on Uncertainty Modelling and Analysis
The Journal of Machine Learning Research
On the algorithmic implementation of multiclass kernel-based vector machines
The Journal of Machine Learning Research
An extensive empirical study of feature selection metrics for text classification
The Journal of Machine Learning Research
Support vector machine learning for interdependent and structured output spaces
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Hierarchical document categorization with support vector machines
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Comparing and aggregating rankings with ties
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
A Preference Model for Structured Supervised Learning Tasks
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
SIAM Journal on Discrete Mathematics
Support Vector Ordinal Regression
Neural Computation
Step Size Adaptation in Reproducing Kernel Hilbert Space
The Journal of Machine Learning Research
Kernel-Based Learning of Hierarchical Multilabel Classification Models
The Journal of Machine Learning Research
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Exploiting known taxonomies in learning overlapping concepts
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Hi-index | 0.00 |
In many applicative contexts in which textual documents are labelled with thematic categories, a distinction is made between the primary categories of a document, which represent the topics that are central to it, and its secondary categories, which represent topics that the document only touches upon. We contend that this distinction, so far neglected in text categorization research, is important and deserves to be explicitly tackled. The contribution of this paper is threefold. First, we propose an evaluation measure for this preferential text categorization task, whereby different kinds of misclassifications involving either primary or secondary categories have a different impact on effectiveness. Second, we establish several baseline results for this task on a well-known benchmark for patent classification in which the distinction between primary and secondary categories is present; these results are obtained by reformulating the preferential text categorization task in terms of well established classification problems, such as single and/or multi-label multiclass classification; state-of-the-art learning technology such as SVMs and kernel-based methods are used. Third, we improve on these results by using a recently proposed class of algorithms explicitly devised for learning from training data expressed in preferential form, i.e., in the form "for document d i , category c驴 is preferred to category c驴驴"; this allows us to distinguish between primary and secondary categories not only in the classification phase but also in the learning phase, thus differentiating their impact on the classifiers to be generated.