Pyramidal Digest: An Efficient Model for Abstracting Text Databases

  • Authors: Wesley T. Chuang; Douglas Stott Parker, Jr.
  • Venue: DEXA '01: Proceedings of the 12th International Conference on Database and Expert Systems Applications
  • Year: 2001

Abstract

We present a novel model for automated composite text digests, the Pyramidal Digest. The model integrates traditional text summarization and text classification: the digest not only serves as a "summary" but can also classify text segments of any given size and answer queries relative to a context. "Pyramidal" refers to the fact that the digest is created along at least three dimensions: scope, granularity, and scale. The Pyramidal Digest is defined recursively as a structure of extracted and abstracted features that are obtained gradually, from specific to general and from large to small text segment sizes, through a combination of shallow parsing and machine learning algorithms. Three threads of learning take place: learning of characteristic relations, rhetorical relations, and lexical relations. Our model provides a principle for efficiently digesting large quantities of text: progressive learning digests text by abstracting its significant features. The approach scales, with complexity bounded by O(n log n), where n is the size of the text. It offers a standard, systematic way of collecting as many semantic features as shallow parsing can reach, and it enables readers to query beyond keyword matches.
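
The abstract does not say how the O(n log n) bound arises, so the following is only a minimal sketch of one way a recursive, large-to-small digest could stay within it; it is not the authors' algorithm. It assumes the text is halved at each level and that feature extraction on a segment costs time linear in that segment's length, so each of the roughly log n levels does O(n) work. The names `DigestNode`, `top_terms`, and `build_digest` are hypothetical, and the frequency-based extractor is a stand-in for the shallow parsing and learning steps the paper describes.

```python
import heapq
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DigestNode:
    """One node of a toy digest: a token span plus its extracted features."""
    start: int
    end: int
    features: list
    children: list = field(default_factory=list)


def top_terms(tokens, k=3):
    """Stand-in feature extractor (assumption): the k most frequent tokens.

    Ties are broken alphabetically so the result is deterministic.
    For fixed k this is linear in len(tokens).
    """
    counts = Counter(tokens)
    best = heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in best]


def build_digest(tokens, start=0, end=None, min_size=4, k=3):
    """Recursively digest tokens[start:end], from large segments to small ones.

    min_size must be at least 1 so that the recursion terminates.
    """
    if end is None:
        end = len(tokens)
    node = DigestNode(start, end, top_terms(tokens[start:end], k))
    if end - start > min_size:
        mid = (start + end) // 2
        node.children = [
            build_digest(tokens, start, mid, min_size, k),
            build_digest(tokens, mid, end, min_size, k),
        ]
    return node


def features_at(node, position):
    """Collect features from the root down to the smallest segment holding position."""
    path = [node.features]
    for child in node.children:
        if child.start <= position < child.end:
            path.extend(features_at(child, position))
    return path
```

Calling `build_digest(text.split())` yields a tree whose root summarizes the whole text and whose leaves summarize small segments; `features_at(root, i)` then returns the features from general to specific for the token at index `i`, a rough analogue of describing a segment relative to its context.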