Self-training and co-training applied to Spanish named entity recognition

Authors:
Zornitsa Kozareva;Boyan Bonev;Andres Montoyo
Affiliations:
Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain;Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain;Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain
Venue:
MICAI'05 Proceedings of the 4th Mexican International Conference on Advances in Artificial Intelligence
Year:
2005

Abstract

This paper discusses the use of unlabeled data for Spanish Named Entity Recognition. Two techniques are used: self-training for detecting the entities in the text, and co-training for classifying the entities already detected. We introduce a new co-training algorithm that applies voting techniques to decide which unlabeled example should be added to the training set at each iteration. We also propose a way to improve entity detection performance. A brief comparative study with existing co-training algorithms is presented.
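
The abstract does not spell out the voting procedure, so the following is only an illustrative sketch of the general idea of voting-based selection of unlabeled examples, not the authors' algorithm. Everything in it is an assumption made for illustration: the toy per-feature classifiers, the feature names ("word", "prev"), the min_votes threshold, and the function names are all hypothetical. Each classifier sees one feature ("view") of an example, and an unlabeled example is moved into the training set only when enough classifiers agree on its label.

```python
from collections import Counter


def vote(labels):
    """Return (winning_label, number_of_votes) for a non-empty list of labels."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count


def train_feature_classifier(labeled, view):
    """Toy classifier (assumption): map each value of one feature to its most frequent label."""
    counts = {}
    for features, label in labeled:
        counts.setdefault(features[view], Counter())[label] += 1
    table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # The classifier abstains (returns None) on feature values it has never seen.
    return lambda features: table.get(features[view])


def voting_bootstrap(labeled, unlabeled, views, min_votes, rounds=5):
    """Grow the labeled set with unlabeled examples on which the classifiers agree."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        # Retrain one classifier per view on the current labeled set.
        classifiers = [train_feature_classifier(labeled, v) for v in views]
        accepted, remaining = [], []
        for features in pool:
            predictions = [clf(features) for clf in classifiers]
            predictions = [p for p in predictions if p is not None]
            if predictions:
                label, count = vote(predictions)
                if count >= min_votes:
                    accepted.append((features, label))
                    continue
            remaining.append(features)
        if not accepted:
            break  # no example reached the agreement threshold
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool


# Hypothetical toy data: entity classification from the word itself and the previous word.
seed = [
    ({"word": "Madrid", "prev": "en"}, "LOC"),
    ({"word": "Juan", "prev": "don"}, "PER"),
]
candidates = [
    {"word": "Madrid", "prev": "en"},   # both views agree
    {"word": "Juan", "prev": "en"},     # views disagree
    {"word": "Lima", "prev": "de"},     # both views abstain
]
grown, leftover = voting_bootstrap(seed, candidates, views=["word", "prev"], min_votes=2)
```

In this toy run only the first candidate is added, because it is the only one on which both views agree; the other two stay in the unlabeled pool. Requiring agreement is one simple way to limit the noise that bootstrapping can inject into the training set.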