Summarization of noisy documents: a pilot study

  • Authors:
  • Hongyan Jing; Daniel Lopresti; Chilin Shih

  • Affiliations:
  • IBM T.J. Watson Research Center, Yorktown Heights, NY; Hopewell, NJ; Berkeley Heights, NJ

  • Venue:
  • HLT-NAACL-DUC '03 Proceedings of the HLT-NAACL 03 on Text summarization workshop - Volume 5
  • Year:
  • 2003

Abstract

We investigate the problem of summarizing text documents that contain errors as a result of optical character recognition. Each stage in the process is tested, the error effects analyzed, and possible solutions suggested. Our experimental results show that current approaches, which are developed to deal with clean text, suffer significant degradation even with slight increases in the noise level of a document. We conclude by proposing possible ways of improving the performance of noisy document summarization.