Automatic annotation of bibliographical references for descriptive language materials

Authors:
Harald Hammarström
Affiliations:
Max Planck Institute for Evolutionary Anthropology, Department of Linguistics, Leipzig, Germany
Venue:
CLEF'11: Proceedings of the Second International Conference on Multilingual and Multimodal Information Access Evaluation
Year:
2011

Abstract

This paper considers the problem of annotating bibliographical references with labels (classes), given training data of references that are already annotated. The problem is an instance of document categorization in which the documents are short and written in a wide variety of languages. The skewed distributions of title words and labels call for special care in choosing a machine learning approach. The paper describes how to induce Disjunctive Normal Form formulae (DNFs), which have several advantages over decision trees. The approach is evaluated on a large real-world collection of bibliographical references.
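
To make the idea of a DNF over title words concrete, the following is a minimal Python sketch, not the paper's induction procedure, which the abstract does not spell out. It assumes a reference is reduced to the set of lowercased words in its title, that a conjunction is a set of words that must all occur, and that a DNF is a list of such conjunctions. The names `tokenize`, `matches` and `induce_dnf` are hypothetical, the induction is a simple greedy covering restricted to single-word conjunctions, and the titles in the example are made up for illustration.

```python
from typing import Dict, FrozenSet, List, Set

# A term is a conjunction: every word in it must occur in the title.
# A DNF is a disjunction (list) of such terms.
Term = FrozenSet[str]
DNF = List[Term]


def tokenize(title: str) -> Set[str]:
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return set(title.lower().split())


def matches(dnf: DNF, title: str) -> bool:
    """True if at least one conjunction is fully contained in the title."""
    words = tokenize(title)
    return any(term <= words for term in dnf)


def induce_dnf(positives: List[str], negatives: List[str]) -> DNF:
    """Greedy covering with single-word terms (illustrative only).

    Repeatedly picks the word that covers the most still-uncovered
    positive titles and never occurs in any negative title.
    """
    neg_words: Set[str] = set().union(*(tokenize(t) for t in negatives))
    uncovered = [tokenize(t) for t in positives]
    dnf: DNF = []
    while uncovered:
        counts: Dict[str, int] = {}
        for words in uncovered:
            for w in words - neg_words:
                counts[w] = counts.get(w, 0) + 1
        if not counts:
            break  # remaining positives cannot be separated by one word
        # sorted() makes tie-breaking deterministic (alphabetical)
        best = max(sorted(counts), key=lambda w: counts[w])
        dnf.append(frozenset([best]))
        uncovered = [ws for ws in uncovered if best not in ws]
    return dnf


# Made-up titles for a hypothetical label "grammar", in two languages.
positives = ["a grammar of langa", "grammaire du langb"]
negatives = ["a dictionary of langc", "dictionnaire du langd"]
dnf = induce_dnf(positives, negatives)
```

On this toy data the induced formula is a disjunction of single-word tests, so `matches(dnf, "short grammar of lange")` holds while `matches(dnf, "dictionnaire du langd")` does not. A list of conjunctions like this can be read and edited one rule at a time, which is one common argument for rule sets over trees; whether that is among the advantages the paper has in mind is not stated in the abstract.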