Development, implementation and testing of a discourse model for newspaper texts

  • Authors:
  • Elizabeth D. Liddy; Kenneth A. McVearry; Woojin Paik; Edmund Yu; Mary McKenna

  • Affiliations:
  • Syracuse University, Syracuse, NY; Coherent Research, Inc., East Syracuse, NY; Syracuse University, Syracuse, NY; Syracuse University, Syracuse, NY; Syracuse University, Syracuse, NY

  • Venue:
  • HLT '93 Proceedings of the workshop on Human Language Technology
  • Year:
  • 1993

Abstract

Texts of a particular type exhibit a discernible, predictable schema. These schemata can be delineated and, as such, provide models of their respective text types that are useful for automatically structuring texts. We have developed a Text Structurer module that recognizes text-level structure; it is used within a larger information retrieval system to delineate the discourse-level organization of each document's contents. This allows the document components that are more likely to contain the type of information suggested by the user's query to be selected for higher weighting. We chose newspaper text as the first text type to implement. Several iterations of manually coding a randomly chosen sample of newspaper articles enabled us to develop a newspaper text model. This process suggested that our intellectual decomposition of texts relied on six types of linguistic information, which were incorporated into the Text Structurer module. Evaluation of the module's results led to a revision of the underlying text model and of the Text Structurer itself.
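
The component-weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the components of the newspaper text model or say how weights are computed, so the labels (`MAIN_EVENT`, `BACKGROUND`), the `Component` class, the `score_document` function, and the `boost` factor are all hypothetical. The sketch assumes that a document has already been split into labeled components and that the query has been mapped to a set of preferred labels.

```python
from dataclasses import dataclass


@dataclass
class Component:
    label: str  # hypothetical discourse label, not taken from the paper
    text: str


def score_document(components, query_terms, preferred_labels, boost=2.0):
    """Count query-term matches per component and multiply the count
    by `boost` when the component's label is one the query prefers.
    The boost value is an illustrative assumption."""
    terms = {t.lower() for t in query_terms}
    score = 0.0
    for comp in components:
        # Naive whitespace tokenization with punctuation stripped.
        tokens = [tok.strip(".,;:!?\"'") for tok in comp.text.lower().split()]
        matches = sum(1 for tok in tokens if tok in terms)
        weight = boost if comp.label in preferred_labels else 1.0
        score += weight * matches
    return score


# A toy document with two hypothetical components.
doc = [
    Component("MAIN_EVENT", "The council approved the new budget."),
    Component("BACKGROUND", "Last year the budget was rejected twice."),
]

# The same term match counts for more when it falls in a preferred component.
in_preferred = score_document(doc, ["rejected"], {"BACKGROUND"})
not_preferred = score_document(doc, ["rejected"], {"MAIN_EVENT"})
```

Here the single match on "rejected" contributes more when the query prefers the component that contains it. A real system would replace the term counting with its own retrieval scoring and derive the preferred labels from the query.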