A Practical Chunker for Unrestricted Text

Authors:
E. Stamatatos, Nikos Fakotakis, George K. Kokkinakis
Venue:
NLP '00 Proceedings of the Second International Conference on Natural Language Processing
Year:
2000

Abstract

In this paper we present a practical approach to text chunking for unrestricted Modern Greek text that is based on multiple-pass parsing. Two versions of this chunker are proposed: one based on a large lexicon and one based on minimal resources. In the latter case, morphological analysis is performed using only two small lexicons: one of closed-class words and one of common suffixes of Modern Greek words. We give comparative performance results on a corpus of unrestricted text and show that very good results can be obtained without the large and complicated resources. Moreover, the considerable time cost introduced by the large lexicon indicates that the minimal-resources chunker is the best solution for a practical application that requires rapid response and can tolerate less than perfect parsing results.
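The minimal-resources idea (tag each word from a closed-class lexicon, fall back to suffix matching, then group tags into chunks over several passes) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the toy lexicon entries, the tag names (ART, NOUN, VERB, ...), the three passes (noun phrases, then prepositional phrases, then verb groups), and the helper names `normalize`, `tag`, and `chunk` are hypothetical and are not the authors' actual lexicons, tagset, or rules.

```python
from __future__ import annotations

import unicodedata


def normalize(word: str) -> str:
    """Lowercase and strip Greek accents (tonos, dialytika) so lookups ignore diacritics."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Hypothetical closed-class lexicon (toy subset, not the authors' resource).
CLOSED_CLASS = {normalize(w): t for w, t in {
    "ο": "ART", "η": "ART", "το": "ART", "οι": "ART", "τα": "ART",
    "τον": "ART", "την": "ART", "του": "ART", "της": "ART",
    "και": "CONJ", "σε": "PREP", "από": "PREP", "με": "PREP", "για": "PREP",
}.items()}

# Hypothetical suffix lexicon (unaccented); matched longest suffix first.
SUFFIXES = {"ουν": "VERB", "ει": "VERB", "ικος": "ADJ", "ικη": "ADJ", "ικο": "ADJ",
            "ης": "NOUN", "ος": "NOUN", "ση": "NOUN", "ο": "NOUN", "α": "NOUN"}
SUFFIX_ORDER = sorted(SUFFIXES, key=len, reverse=True)


def tag(word: str) -> str:
    """Closed-class lookup first, then the longest matching suffix, else UNK."""
    w = normalize(word)
    if w in CLOSED_CLASS:
        return CLOSED_CLASS[w]
    for suffix in SUFFIX_ORDER:
        if w.endswith(suffix) and len(w) > len(suffix):
            return SUFFIXES[suffix]
    return "UNK"


def np_pass(items: list[tuple[str, list[str]]]) -> list[tuple[str, list[str]]]:
    """Pass 1: group ART? ADJ* NOUN into a noun phrase (NP)."""
    out, i = [], 0
    while i < len(items):
        j = i
        if items[j][0] == "ART":
            j += 1
        while j < len(items) and items[j][0] == "ADJ":
            j += 1
        if j < len(items) and items[j][0] == "NOUN":
            out.append(("NP", [w for _, ws in items[i:j + 1] for w in ws]))
            i = j + 1
        else:
            out.append(items[i])
            i += 1
    return out


def pp_pass(items: list[tuple[str, list[str]]]) -> list[tuple[str, list[str]]]:
    """Pass 2: attach a preposition to the NP built in pass 1, giving a PP."""
    out, i = [], 0
    while i < len(items):
        if items[i][0] == "PREP" and i + 1 < len(items) and items[i + 1][0] == "NP":
            out.append(("PP", items[i][1] + items[i + 1][1]))
            i += 2
        else:
            out.append(items[i])
            i += 1
    return out


def vp_pass(items: list[tuple[str, list[str]]]) -> list[tuple[str, list[str]]]:
    """Pass 3: merge consecutive verbs into a verb group (VP)."""
    out: list[tuple[str, list[str]]] = []
    for label, words in items:
        if label == "VERB":
            if out and out[-1][0] == "VP":
                out[-1] = ("VP", out[-1][1] + words)
            else:
                out.append(("VP", list(words)))
        else:
            out.append((label, words))
    return out


PASSES = (np_pass, pp_pass, vp_pass)


def chunk(sentence: str) -> list[tuple[str, str]]:
    """Tag each whitespace token, then apply each pass to the previous pass's output."""
    items = [(tag(w), [w]) for w in sentence.split()]
    for apply_pass in PASSES:
        items = apply_pass(items)
    return [(label, " ".join(words)) for label, words in items]


print(chunk("ο φοιτητής διαβάζει το βιβλίο για την εξέταση"))
# [('NP', 'ο φοιτητής'), ('VP', 'διαβάζει'), ('NP', 'το βιβλίο'), ('PP', 'για την εξέταση')]
```

Because each pass only rewrites the output of the previous one, later passes can rely on structure built earlier (the PP pass expects NPs to exist already), and a new pass can be added by appending a function to `PASSES`. A real suffix lexicon would also have to cope with the heavy ambiguity of Greek endings, which this toy table ignores.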