This paper examines the feasibility of discovering "title-like" terms in a document using a decision tree classifier. The premise is that title terms and title-like terms should behave similarly in the document, where this behavior is characterized by a set of distributional and linguistic features. By training the classifier in a balanced manner on the behavior of title terms, using 25,000 titles of Reuters articles, other terms with similar behavior can also be discovered. On 5,000 unseen titles, the recall of title terms was 83%, similar to that of manual identification of title terms. The precision of finding title terms is low (32%) because the classifier also identifies terms that are not in the title but are title-like, and these count against precision. Seven subjects were asked to rate, on a scale of 1 to 5, whether each identified term is a topical/thematic/title term. If a rating of 2.5 is used as the threshold for judging a term to be "title-like", the mean precision increases to 58%; in other words, the headline/title is expanded with twice the average number of terms. Since this precision (58%) is similar to the mean precision of manually identified title terms averaged across subjects, we conclude that discovering title-like terms using classifiers is a promising approach.
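The two evaluations described above (precision and recall against the actual title terms, then precision against terms judged title-like by raters) can be illustrated with a minimal sketch. The function names, the toy terms, and the ratings are invented for illustration and are not taken from the paper; treating a mean rating of at least 2.5 as "title-like" is also an assumption, since the abstract does not say whether the threshold is inclusive.

```python
from statistics import mean


def precision_recall(predicted, relevant):
    """Return (precision, recall) of a predicted term set against a relevant term set."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def title_like_terms(ratings, threshold=2.5):
    """Terms whose mean subject rating reaches the threshold (inclusive: an assumption)."""
    return {term for term, scores in ratings.items() if mean(scores) >= threshold}


# Toy data, invented for illustration only.
title_terms = {"oil", "prices", "opec"}
predicted_terms = {"oil", "prices", "crude", "barrel", "output", "meeting"}
ratings = {  # one list of 1-to-5 ratings per predicted term
    "oil": [5, 5, 4],
    "prices": [4, 5, 4],
    "crude": [4, 3, 4],
    "barrel": [3, 2, 3],
    "output": [2, 2, 3],
    "meeting": [1, 2, 1],
}

# Strict evaluation: only terms that appear in the title count as correct.
strict_precision, strict_recall = precision_recall(predicted_terms, title_terms)

# Relaxed evaluation: terms rated title-like by the subjects count as correct.
judged = title_like_terms(ratings)
relaxed_precision, _ = precision_recall(predicted_terms, judged)

print(f"strict precision={strict_precision:.2f}, recall={strict_recall:.2f}")
print(f"relaxed precision={relaxed_precision:.2f}")
```

In this toy case the relaxed precision is higher than the strict one because "crude" and "barrel" are absent from the title yet rated as title-like, which mirrors the effect the abstract reports.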