TextTiling: segmenting text into multi-paragraph subtopic passages
Computational Linguistics
In this paper, we present an analysis based on linguistic and typographic features that allows for the identification of titles in web documents. We focus in particular on procedural texts. Identifying titles is a difficult task because the ways of encoding them are very diverse. A number of titles are also incomplete because they depend on context; we therefore propose a way to retrieve the missing elements, in particular predicates, so that titles are fully intelligible.