Identifying and expanding titles in web texts

  • Authors:
  • Clémentine Adam;Estelle Delpech;Patrick Saint-Dizier

  • Affiliations:
  • IRIT-UPS, Toulouse, France;IRIT-UPS, Toulouse, France;IRIT-CNRS, Toulouse, France

  • Venue:
  • Proceedings of the eighth ACM symposium on Document engineering
  • Year:
  • 2008

Abstract

In this paper, we present an analysis based on linguistic and typographic features that allows for the identification of titles in web documents. We focus in particular on procedural texts. Identifying titles is a difficult task because the ways of encoding them are very diverse. A number of titles are also incomplete because they depend on their context; we therefore propose a way to retrieve the missing elements, in particular predicates, so that titles are fully intelligible.
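
To make the idea of combining typographic and linguistic cues concrete, here is a minimal Python sketch. It is not the authors' method, which the abstract does not detail. The chosen cues (heading and bold tags as typographic evidence, a short length and no closing punctuation as linguistic evidence), the 10-word threshold, and all names such as `TitleCandidateParser` and `find_titles` are assumptions made for illustration only.

```python
from html.parser import HTMLParser

# Assumed typographic cues: explicit heading tags and bold emphasis.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
EMPHASIS_TAGS = {"b", "strong"}


class TitleCandidateParser(HTMLParser):
    """Collects the text of heading and bold elements as title candidates."""

    def __init__(self):
        super().__init__()
        self._stack = []      # currently open tags of interest
        self._buffer = []     # text seen inside the outermost open tag
        self.candidates = []  # (tag, text) pairs, in document order

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS or tag in EMPHASIS_TAGS:
            if not self._stack:
                self._buffer = []
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()
            if not self._stack:
                # Normalize whitespace before storing the candidate.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.candidates.append((tag, text))

    def handle_data(self, data):
        if self._stack:
            self._buffer.append(data)


def looks_like_title(text, max_words=10):
    """Assumed linguistic cue: short, and not ending like a sentence."""
    words = text.split()
    return 0 < len(words) <= max_words and not text.endswith((".", ",", ";"))


def find_titles(html):
    """Return candidate titles: all headings, plus title-like bold spans."""
    parser = TitleCandidateParser()
    parser.feed(html)
    parser.close()
    return [
        text
        for tag, text in parser.candidates
        if tag in HEADING_TAGS or looks_like_title(text)
    ]
```

For example, given `<h2>Preparing the dough</h2><p>Mix the flour. <b>Do not overmix.</b></p><p><b>Baking</b></p>`, `find_titles` keeps the heading and the bold "Baking" but rejects the bold warning, which ends with a period. Real web pages encode titles in far more varied ways, which is the difficulty the abstract points to.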