Restricted representation of phrase structure grammar for building a tree annotated corpus of Korean

Authors:
Kong Joo Lee;Gil Chang Kim;Jae-Hoon Kim;Young S. Han
Affiliations:
Korea Advanced Institute of Science and Technology, Taejon, Korea;Korea Advanced Institute of Science and Technology, Taejon, Korea;Electronics and Telecommunications Research Institute, Taejon, Korea;Suwon University, Suwon, Korea
Venue:
Natural Language Engineering
Year:
1997

Abstract

In this paper, we introduce a method for representing phrase structure grammars in order to build a large annotated corpus of Korean syntactic trees. Korean differs from English in word order and word composition. Our study found these differences significant enough to require meaningful changes to the tree annotation scheme for Korean relative to the schemes used for English. A tree annotation scheme defines the grammar formalism to be assumed, the categories to be used, and the rules for determining the correct parse where parse construction leaves issues unsettled. Korean is partially free in word order, and essential components of a sentence, such as subjects and objects, can be omitted with greater freedom than in English. We propose a restricted representation of phrase structure grammar to handle these characteristics of Korean more efficiently. An extensive experiment shows that the proposed representation improves both parsing time and grammar size. We also describe Teb, a software environment set up with the goal of building a tree annotated corpus of Korean containing more than one million units.