Building a large Chinese corpus annotated with semantic dependency

Authors:
Li Mingqin;Li Juanzi;Dong Zhendong;Wang Zuoying;Lu Dajin
Affiliations:
Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Chinese Academy of Sciences, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China
Venue:
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Year:
2003

Citing 4
Cited 5

Building a large annotated corpus of English: the Penn Treebank

Computational Linguistics - Special issue on using large corpora: II
The Berkeley FrameNet Project

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
A Chinese corpus for linguistic research

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 4
Dependency-based syntactic analysis of Chinese and annotation of parsed corpus

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics

The Penn Chinese TreeBank: Phrase structure annotation of a large corpus

Natural Language Engineering
Chinese semantic dependency analysis: Construction of a treebank and its use in classification

ACM Transactions on Speech and Language Processing (TSLP)
Integration of Multiple Classifiers for Chinese Semantic Dependency Analysis

Electronic Notes in Theoretical Computer Science (ENTCS)
A Chinese corpus with word sense annotation

ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead
SemEval-2012 task 5: Chinese semantic dependency parsing

SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation

Abstract

At present, most corpora are annotated mainly with syntactic knowledge. In this paper, we attempt to build a large corpus annotated with semantic knowledge using dependency grammar. We believe that words are the basic units of semantics, and that the structure and meaning of a sentence consist mainly of a series of semantic dependencies between individual words. A corpus on the scale of 1,000,000 words, annotated with semantic dependencies, has been built. Compared with syntactic knowledge, semantic knowledge is more difficult to annotate because the ambiguity problem is more serious. We describe a strategy for improving annotation consistency and define congruence as a measure of the consistency of the tagged corpus. Finally, we compare our corpus with other well-known corpora.