We describe a method for disambiguating Chinese commas, a task central to Chinese sentence segmentation. We treat Chinese sentence segmentation as the detection of loosely coordinated clauses separated by commas. Trained and tested on data derived from the Chinese Treebank, our model achieves an overall classification accuracy of close to 90%, which translates to an F1 score of 70% for detecting commas that signal sentence boundaries.
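The gap between overall accuracy and F1 on the boundary class is what one would expect when sentence-boundary commas are the minority class. The following is a minimal sketch of how the two metrics relate. The confusion-matrix counts are hypothetical, chosen only to illustrate the arithmetic, and are not the counts from the reported experiments.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all commas classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (NOT from the paper): 1000 commas, of which
# 200 truly mark a sentence boundary (the positive class).
tp, fp, fn, tn = 140, 60, 60, 740

acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)

print(f"accuracy = {acc:.2f}")  # accuracy = 0.88
print(f"F1       = {f1:.2f}")   # F1       = 0.70
```

Because the many correctly rejected non-boundary commas (`tn`) count toward accuracy but not toward F1, a classifier can score high on the former while remaining noticeably weaker at finding the boundary commas themselves.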