We present a sentence compression module used in a machine-assisted subtitling application developed in the European e-content project e-title. Our approach to compression and the system architecture are motivated by the commercial and multilingual nature of the project: the module must output reasonable compressions, and new strategies, genres, and languages must be easy to add. The compression module currently supports Catalan and English. It uses the Constraint Grammar engine both for linguistic preprocessing and for the linguistically motivated compression rules, which provides a homogeneous format throughout the compression process. The compression rules were implemented based on a corpus of automatically aligned pairs of films for both languages. For both languages, we performed an automatic quantitative evaluation of the compression using the aligned corpus and a qualitative manual evaluation of grammaticality and informativeness.