Pyramidal Digest: An Efficient Model for Abstracting Text Databases

Authors:
Wesley T. Chuang; Douglas Stott Parker, Jr.
Venue:
DEXA '01 Proceedings of the 12th International Conference on Database and Expert Systems Applications
Year:
2001

Abstract

We present a novel model of automated composite text digest, the Pyramidal Digest. The model integrates traditional text summarization and text classification in that the digest not only serves as a "summary" but is also able to classify text segments of any given size, and answer queries relative to a context. "Pyramidal" refers to the fact that the digest is created in at least three dimensions: scope, granularity, and scale. The Pyramidal Digest is defined recursively as a structure of extracted and abstracted features that are obtained gradually -- from specific to general, and from large to small text segment size -- through a combination of shallow parsing and machine learning algorithms. There are three noticeable threads of learning taking place: learning of characteristic relations, rhetorical relations, and lexical relations. Our model provides a principle for efficiently digesting large quantities of text: progressive learning can digest text by abstracting its significant features. This approach scales, with complexity bounded by O(n log n), where n is the size of the text. It offers a standard and systematic way of collecting as many semantic features as possible that are reachable by shallow parsing. It enables readers to query beyond keyword matches.
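
The abstract gives no algorithmic detail, so the following is only a minimal sketch of how a recursively defined, multi-level digest could stay within an O(n log n) bound. It is not the authors' method. Everything in it is an assumption made for illustration: the names `DigestNode`, `build_digest`, and `path_features` are hypothetical, binary splitting stands in for whatever segmentation the paper uses, and plain term frequency stands in for the learned characteristic, rhetorical, and lexical features. Each node does work linear in its segment (for a fixed `k`), and the tree has about log n levels.

```python
import heapq
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DigestNode:
    """One level of the illustrative digest (hypothetical structure)."""
    start: int                      # index of the first token covered
    end: int                        # one past the last token covered
    features: list                  # top-k terms for this segment
    children: list = field(default_factory=list)


def build_digest(tokens, start=0, end=None, k=3, min_size=4):
    """Recursively digest tokens[start:end]; returns (node, term_counts).

    Assumption: segments are halved until they hold at most `min_size`
    tokens, and term frequency is the only "feature" extracted.
    """
    if end is None:
        end = len(tokens)
    if end - start <= max(min_size, 1):
        # Leaf: count terms directly in the smallest segment.
        counts = Counter(tokens[start:end])
        children = []
    else:
        mid = (start + end) // 2
        left, left_counts = build_digest(tokens, start, mid, k, min_size)
        right, right_counts = build_digest(tokens, mid, end, k, min_size)
        # Abstract upward: merge child counts instead of rescanning the text.
        counts = left_counts + right_counts
        children = [left, right]
    # Top-k selection is O(m log k) for m distinct terms; ties break alphabetically.
    top = heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return DigestNode(start, end, [term for term, _ in top], children), counts


def path_features(node, position):
    """Features covering `position`, ordered from widest to narrowest segment."""
    out = [node.features]
    while node.children:
        node = next(c for c in node.children if c.start <= position < c.end)
        out.append(node.features)
    return out


if __name__ == "__main__":
    toks = "a a b c d d d e".split()
    root, _ = build_digest(toks, k=1, min_size=2)
    print(path_features(root, 2))
```

Reading the path from the root down mirrors the "from general to specific" idea in the abstract: the same token position is described by features of progressively smaller segments.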