Title Generation Using a Training Corpus

Authors:
Rong Jin;Alexander G. Hauptmann
Venue:
CICLing '01 Proceedings of the Second International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2001

Abstract

This paper discusses fundamental issues involved in word selection for title generation. We review several methods for title generation, namely extractive summarization and two versions of a Naïve Bayesian approach, and compare their performance using the F1 metric. In addition, we introduce a novel approach to title generation using the k-nearest neighbor (KNN) algorithm. Both the KNN method and a limited-vocabulary Naïve Bayesian method outperform the other evaluated methods, each reaching an F1 score of around 20%. Since KNN produces complete and legible titles, we conclude that KNN is a very promising method for title generation, provided good content overlap exists between the training corpus and the test documents.
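
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that F1 is computed over word overlap between a generated title and the reference title, and that the nearest-neighbor method reuses the title of the most similar training document (k = 1, with cosine similarity over raw term counts). The function names, the whitespace tokenizer, and the toy corpus are all hypothetical; the abstract does not specify any of these details.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def title_f1(generated, reference):
    """Word-overlap F1 between a generated title and a reference title."""
    gen = Counter(tokenize(generated))
    ref = Counter(tokenize(reference))
    overlap = sum((gen & ref).values())  # words shared by both titles
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_title(document, corpus):
    """Return the title of the training document most similar to `document`.

    `corpus` is a list of (document_text, title) pairs; this is the
    k = 1 case of a nearest-neighbor lookup (an assumption of this sketch).
    """
    query = Counter(tokenize(document))
    best = max(corpus, key=lambda pair: cosine(query, Counter(tokenize(pair[0]))))
    return best[1]


# Hypothetical toy corpus, for illustration only.
corpus = [
    ("stocks fell sharply on wall street today", "Stocks Fall on Wall Street"),
    ("the team won the championship game last night", "Team Wins Championship"),
]
predicted = nearest_title("wall street stocks dropped again today", corpus)
score = title_f1(predicted, "Wall Street Stocks Tumble")
```

Because a nearest-neighbor lookup of this kind returns a title written by a human for a similar document, the output is always a complete, readable title, but it is only useful when the training corpus contains documents whose content overlaps with the test document.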