The present paper considers the problem of annotating bibliographical references with labels (classes), given training data of references that are already annotated. The problem is an instance of document categorization in which the documents are short and written in a wide variety of languages. The skewed distributions of title words and labels call for special care when choosing a machine learning approach. The paper describes how to induce Disjunctive Normal Form formulae (DNFs), which have several advantages over decision trees. The approach is evaluated on a large real-world collection of bibliographical references.
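
To make the notion of a DNF over title words concrete, the following is a minimal illustrative sketch. It is not the induction procedure of the paper, which the abstract does not specify. It only shows how an already-induced DNF could be applied to a reference title. The representation (a list of word sets), the naive whitespace tokenizer, the function names, and the example rule and label are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Set

# Hypothetical representation: a DNF is a list of conjunctions (the
# disjunction), and each conjunction is a set of words that must all
# occur in the title.
Conjunction = FrozenSet[str]
DNF = List[Conjunction]


def tokenize(title: str) -> Set[str]:
    """Lowercase the title and split on whitespace (deliberately naive)."""
    return set(title.lower().split())


def dnf_matches(dnf: DNF, title: str) -> bool:
    """Return True if at least one conjunction is fully contained in the title."""
    words = tokenize(title)
    return any(conjunction <= words for conjunction in dnf)


# Made-up rule for a made-up label:
# (swahili) OR (kiswahili) OR (grammar AND zanzibar)
swahili_rule: DNF = [
    frozenset({"swahili"}),
    frozenset({"kiswahili"}),
    frozenset({"grammar", "zanzibar"}),
]

if __name__ == "__main__":
    for title in ["A Grammar of the Zanzibar dialect", "A grammar of Hausa"]:
        print(title, "->", dnf_matches(swahili_rule, title))
```

In this sketch a title receives the label as soon as any single conjunction is satisfied, so each conjunction can be read on its own as a self-contained rule. An empty DNF matches no title.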