Classifying texts using relevancy signatures

Authors:
Ellen Riloff;Wendy Lehnert
Affiliations:
Department of Computer Science, University of Massachusetts, Amherst, MA;Department of Computer Science, University of Massachusetts, Amherst, MA
Venue:
AAAI'92 Proceedings of the tenth national conference on Artificial intelligence
Year:
1992

Abstract

Text processing for complex domains such as terrorism is complicated by the difficulty of reliably distinguishing relevant from irrelevant texts. We have discovered a simple and effective filter, the Relevancy Signatures Algorithm, and demonstrated its performance in the domain of terrorist event descriptions. The Relevancy Signatures Algorithm is based on the natural language processing technique of selective concept extraction, and relies on text representations that reflect predictable patterns of linguistic context. This paper describes text classification experiments conducted in the domain of terrorism using the MUC-3 text corpus. A customized dictionary of about 6,000 words provides the lexical knowledge base needed to discriminate relevant texts, and the CIRCUS sentence analyzer generates relevancy signatures as an effortless side effect of its normal sentence analysis. Although we suspect that the training base available to us from the MUC-3 corpus may not be large enough to provide optimal training, we were nevertheless able to attain relevancy discriminations for significant levels of recall (ranging from 11% to 47%) with 100% precision in half of our test runs.
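
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a signature-based relevancy filter of this kind might work, not the authors' implementation. It assumes that a signature is a (word, concept) pair, that a signature is trusted when it occurs often enough in training and almost always in relevant texts, and that a text is classified as relevant if it contains at least one trusted signature. The function names, the two thresholds (`min_reliability`, `min_count`), the `$...$` concept labels, and the toy data are all hypothetical.

```python
from collections import Counter

# Assumption: a signature pairs a trigger word with the concept it activated.
Signature = tuple[str, str]


def train(texts):
    """Count, per signature, how many training texts and how many relevant
    training texts contain it. `texts` yields (signatures, is_relevant)."""
    total, relevant = Counter(), Counter()
    for signatures, is_relevant in texts:
        for sig in signatures:
            total[sig] += 1
            if is_relevant:
                relevant[sig] += 1
    return total, relevant


def select_signatures(total, relevant, min_reliability, min_count):
    """Keep signatures seen in at least `min_count` texts whose fraction of
    relevant occurrences is at least `min_reliability` (both hypothetical)."""
    return {
        sig
        for sig, n in total.items()
        if n >= min_count and relevant[sig] / n >= min_reliability
    }


def classify(signatures, trusted):
    """A text is called relevant if it contains any trusted signature."""
    return not trusted.isdisjoint(signatures)


def precision_recall(predictions, labels):
    """Precision and recall of boolean predictions against boolean labels."""
    tp = sum(p and y for p, y in zip(predictions, labels))
    fp = sum(p and not y for p, y in zip(predictions, labels))
    fn = sum(y and not p for p, y in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy training set (invented for illustration only).
BOMBED = ("bombed", "$bombing$")
DEAD = ("dead", "$murder$")
ATTACK = ("attack", "$attack$")
training = [
    ({BOMBED, DEAD}, True),
    ({BOMBED}, True),
    ({DEAD}, False),
    ({ATTACK}, False),
]

total, relevant = train(training)
trusted = select_signatures(total, relevant, min_reliability=0.9, min_count=2)

test_texts = [{BOMBED}, {DEAD}, {ATTACK}]
test_labels = [True, True, False]
predictions = [classify(sigs, trusted) for sigs in test_texts]
precision, recall = precision_recall(predictions, test_labels)
```

Under these assumptions, a strict reliability threshold favors precision over recall: the filter only fires on signatures that were almost never seen in irrelevant training texts, so it misses relevant texts that contain no such signature. That trade-off is consistent with the pattern the abstract reports (100% precision at partial recall), though the actual thresholds and training procedure are not given here.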