This work aims to develop a representative treebank for the South Asian language Urdu. Urdu is a comparatively under-resourced language, and a reliable treebank would significantly advance the state of the art in Urdu language processing. For the URDU.KON-TB treebank described here, a POS tagset, a syntactic tagset and a functional tagset are proposed. The treebank is built on an existing corpus of 19 million words of Urdu. Part-of-speech (POS) tagging and annotation of a selected set of sentences from different sub-domains of this corpus are being carried out manually, and the work completed to date is presented here. The hierarchical annotation scheme we adopted combines a phrase structure (PS) with a hybrid dependency structure (HDS).