A support vector machine approach to Dutch part-of-speech tagging

Authors:
Mannes Poel;Luite Stegeman;Rieks op Den Akker
Affiliations:
Human Media Interaction, Dept. Computer Science, University of Twente, Enschede, The Netherlands;Human Media Interaction, Dept. Computer Science, University of Twente, Enschede, The Netherlands;Human Media Interaction, Dept. Computer Science, University of Twente, Enschede, The Netherlands
Venue:
IDA'07 Proceedings of the 7th international conference on Intelligent data analysis
Year:
2007


Abstract

Part-of-Speech tagging, the assignment of Parts-of-Speech to words in a given context of use, is a basic technique in many systems that handle natural language. This paper describes a method for supervised training of a Part-of-Speech tagger using a committee of Support Vector Machines on a large corpus of annotated transcriptions of spoken Dutch. Special attention is paid to decomposing the large data set into parts for common, uncommon and unknown words. This not only solves the space problems caused by the amount of data but also improves tagging time. The resulting tagger reaches an accuracy of 97.54%, and its tagging speed is reasonably good.
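
The decomposition into common, uncommon and unknown words can be illustrated with a minimal Python sketch. It only shows the routing step, that is, deciding which committee member would handle a word; it does not train any Support Vector Machine and does not show how a model for unknown words would be trained. The frequency threshold `COMMON_MIN_COUNT`, the function names, and the toy sentences with their tags are all assumptions made for illustration, since the abstract does not state the criteria the authors used.

```python
from collections import Counter

# Hypothetical threshold: the abstract does not say where "common" ends.
COMMON_MIN_COUNT = 3


def build_frequency_table(tagged_sentences):
    """Count how often each lower-cased word occurs in the training data."""
    return Counter(
        word.lower() for sentence in tagged_sentences for word, _tag in sentence
    )


def bucket_for(word, counts, common_min_count=COMMON_MIN_COUNT):
    """Return the name of the subset (and hence the model) for this word."""
    n = counts.get(word.lower(), 0)
    if n == 0:
        return "unknown"      # never seen in training
    if n >= common_min_count:
        return "common"       # frequent enough under the assumed threshold
    return "uncommon"         # seen, but rarely


def partition_training_data(tagged_sentences, counts):
    """Split training tokens into the common and uncommon subsets."""
    parts = {"common": [], "uncommon": []}
    for sentence in tagged_sentences:
        for word, tag in sentence:
            # Every training word has a count of at least 1,
            # so the bucket is never "unknown" here.
            parts[bucket_for(word, counts)].append((word, tag))
    return parts


# Toy data; the tags are illustrative only.
train = [
    [("de", "LID"), ("kat", "N"), ("slaapt", "WW")],
    [("de", "LID"), ("hond", "N"), ("slaapt", "WW")],
    [("de", "LID"), ("kat", "N"), ("eet", "WW")],
]
counts = build_frequency_table(train)
parts = partition_training_data(train, counts)
```

Under these assumptions, each subset can be used to train a smaller model, which is one way to read the abstract's claim that the decomposition reduces space requirements and tagging time: at tagging time only the model selected by `bucket_for` needs to be evaluated for a given word.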