Wrappers for feature subset selection
Artificial Intelligence - Special issue on relevance
BoosTexter: A Boosting-based System for Text Categorization
Machine Learning - Special issue on information retrieval
A critique and improvement of an evaluation metric for text segmentation
Computational Linguistics
TextTiling: segmenting text into multi-paragraph subtopic passages
Computational Linguistics
TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Advances in domain independent linear text segmentation
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Discourse segmentation of multi-party conversation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Variation of entropy and parse trees of sentences as a function of the sentence number
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
Broad coverage paragraph segmentation across languages and domains
ACM Transactions on Speech and Language Processing (TSLP)
Beyond the pipeline: discrete optimization in NLP
CONLL '05 Proceedings of the Ninth Conference on Computational Natural Language Learning
A paragraph boundary detection system
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
Text segmentation by clustering cohesion
CIARP'10 Proceedings of the 15th Iberoamerican congress conference on Progress in pattern recognition, image analysis, computer vision, and applications
Applying collocation segmentation to the ACL anthology reference corpus
ACL '12 Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries
In this paper, we propose a machine-learning approach to paragraph boundary identification that uses linguistically motivated features. We investigate the relation between paragraph boundaries and discourse cues, pronominalization, and information structure. We test our algorithm on German data and report improvements over three baselines, including a reimplementation of Sporleder & Lapata's (2006) work on paragraph segmentation. An analysis of the features' contributions suggests an interpretation of what paragraph boundaries indicate and what they depend on.