Discovering "title-like" terms

Authors:
Carly W. Y. Wong;Robert W. P. Luk;Edwards K. S. Ho
Affiliations:
Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong;Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong;Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Venue:
Information Processing and Management: an International Journal
Year:
2005


Abstract

This paper examines the feasibility of discovering "title-like" terms in a document using a decision tree classifier. The premise is that title terms and title-like terms should behave similarly in the document, where this behavior is characterized by a set of distributional and linguistic features. A classifier trained in a balanced manner on the behavior of title terms, using 25,000 titles of Reuters articles, can then discover other terms with similar behavior. On 5,000 unseen titles, the recall of title terms was 83%, comparable to that of manual identification of title terms. The precision of finding title terms is low (32%) because terms that are title-like but do not appear in the title are counted as errors, although they should have been identified as well. Seven subjects were therefore asked to rate, on a scale of 1 to 5, whether each identified term is a topical/thematic/title term. If a rating of 2.5 is used as the threshold for judging a term to be "title-like", then the mean precision increases to 58%, or the headline/title is expanded with twice the average number of terms. Since this precision (58%) is similar to the mean precision of manually identified title terms averaged across different subjects, we conclude that the discovery of title-like terms using classifiers is a promising approach.
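
The two evaluations described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy terms, and the toy ratings are hypothetical, and it assumes that each term's ratings are averaged across subjects and that a mean rating at or above the 2.5 threshold counts as "title-like" (the abstract does not say whether the comparison is strict).

```python
def precision_recall(predicted_terms, title_terms):
    """Precision and recall of predicted terms against the actual title terms."""
    predicted = set(predicted_terms)
    actual = set(title_terms)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def rated_precision(predicted_terms, mean_ratings, threshold=2.5):
    """Fraction of predicted terms whose mean subject rating meets the threshold.

    Assumption: unrated terms are treated as not title-like, and the
    comparison is inclusive (>=); both choices are illustrative only.
    """
    predicted = set(predicted_terms)
    if not predicted:
        return 0.0
    accepted = [t for t in predicted if mean_ratings.get(t, 0.0) >= threshold]
    return len(accepted) / len(predicted)


# Hypothetical toy data, not taken from the paper.
predicted = ["oil", "prices", "opec", "said", "market"]
title = ["oil", "prices", "rise"]
ratings = {"oil": 4.5, "prices": 4.0, "opec": 3.5, "said": 1.2, "market": 2.5}

p, r = precision_recall(predicted, title)
p_rated = rated_precision(predicted, ratings)
```

In this toy case, precision against the title alone is lower than the rating-based precision, because "opec" and "market" are absent from the title yet rated as title-like. This mirrors the direction of the gap the abstract reports between 32% and 58%, without reproducing those figures.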