Turkish constituent chunking with morphological and contextual features

Authors:
İlknur Durgar El-Kahlout;Ahmet Afşın Akın
Affiliations:
TÜBİTAK-BİLGEM, Gebze, KOCAELİ, Turkey;TÜBİTAK-BİLGEM, Gebze, KOCAELİ, Turkey
Venue:
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Year:
2013

Abstract

State-of-the-art phrase chunking focuses on English and achieves high accuracy with very basic word features such as the word itself and its POS tag. For morphologically rich languages like Turkish, these basic features are not sufficient. Moreover, phrase chunking may not be appropriate for such languages, and the term "chunk" should be redefined for them. In this paper, we present the first study on Turkish constituent chunking, using two different methods. In the first method, we extract chunks directly from the output of the Turkish dependency parser. In the second method, we experiment with a CRF-based chunker enhanced with morphological and contextual features, using annotated sentences from the Turkish dependency treebank. The experiments show that the CRF-based chunker augmented with the extra features outperforms both the baseline chunker with basic features and the dependency parser-based chunker. Overall, we developed a CRF-based Turkish chunker with an F-measure of 91.95 for verb chunks and 87.50 for general chunks.
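
As a rough illustration of what "basic", "morphological", and "contextual" features can look like for a CRF chunker, the following minimal Python sketch builds one feature dictionary per token. It is not the authors' feature set: the function names, the feature keys, the window size, and the toy tags ("Nom", "Acc", "Past") are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Each token is (surface form, POS tag, morphological tag).
# The tag values used below are illustrative, not the treebank's tag set.
Token = Tuple[str, str, str]


def token_features(sent: List[Token], i: int, window: int = 1) -> Dict[str, str]:
    """Build a feature dict for token i of a sentence (hypothetical helper)."""
    word, pos, morph = sent[i]
    # Basic features (word and POS) plus one morphological feature.
    feats = {"word": word, "pos": pos, "morph": morph}
    # Contextual features: POS and morphology of neighbours within the window.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        if 0 <= j < len(sent):
            feats[f"pos[{off:+d}]"] = sent[j][1]
            feats[f"morph[{off:+d}]"] = sent[j][2]
        else:
            # Mark positions that fall outside the sentence.
            feats[f"pad[{off:+d}]"] = "1"
    return feats


def sentence_features(sent: List[Token], window: int = 1) -> List[Dict[str, str]]:
    """Feature dicts for every token in the sentence."""
    return [token_features(sent, i, window) for i in range(len(sent))]


# Toy sentence: "Ali kitabı okudu" ("Ali read the book").
example: List[Token] = [
    ("Ali", "Noun", "Nom"),
    ("kitabı", "Noun", "Acc"),
    ("okudu", "Verb", "Past"),
]

if __name__ == "__main__":
    for feats in sentence_features(example):
        print(feats)
```

A list of such dictionaries per sentence, paired with one chunk label per token, is the kind of input that common CRF toolkits accept; which toolkit and which exact features the paper used is not stated in the abstract.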