Automatic titling of text documents is an essential task for several applications (automatic heading of e-mails, summarization, and so forth). This paper describes a system that facilitates information retrieval in a set of textual documents by addressing automatic titling and subtitling. Here, automatic titling means producing titles that are both informative and catchy. We propose two approaches based on NLP, text mining, and Web mining techniques. The first (POSTIT) extracts relevant noun phrases from texts as candidate titles; an original method combining statistical criteria with the positions of noun phrases in the text helps collect informative titles and subtitles. The second (NOMIT) builds on several assumptions derived from POSTIT and aims to generate titles that are both informative and catchy. Both approaches are applied to a corpus of news articles and evaluated on two criteria: informativeness and catchiness.
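The abstract does not say which statistical criteria POSTIT uses or how position is weighted, so the following is only a minimal sketch of the general idea of ranking candidate noun phrases by a frequency score combined with a position score. Everything in it is an assumption made for illustration: the function names `tokenize` and `score_candidates`, the choice of mean relative term frequency as the statistical criterion, and the linear position weight. It also assumes the candidate noun phrases have already been extracted by some other component.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_candidates(text, candidates):
    """Rank candidate phrases, best first, as (phrase, score) pairs.

    Hypothetical scoring, not the paper's: mean relative frequency of the
    phrase's words, multiplied by a weight that favours phrases whose
    first occurrence is near the start of the text.
    """
    tokens = tokenize(text)
    freq = Counter(tokens)
    total = len(tokens) or 1
    lowered = text.lower()
    scored = []
    for phrase in candidates:
        words = tokenize(phrase)
        pos = lowered.find(phrase.lower())
        if not words or pos < 0:
            continue  # skip empty phrases and phrases absent from the text
        # Statistical criterion (assumed): mean relative term frequency.
        stat = sum(freq[w] for w in words) / (len(words) * total)
        # Position criterion (assumed): 1.0 at the start, decreasing linearly.
        position_weight = 1.0 - pos / len(lowered)
        scored.append((phrase, stat * position_weight))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    article = (
        "Solar power plants expand. The new solar power plants will "
        "supply the city. Officials praised the plan."
    )
    for phrase, score in score_candidates(
        article, ["solar power plants", "the city", "Officials"]
    ):
        print(f"{score:.4f}  {phrase}")
```

A multiplicative combination is used here only because it keeps the sketch short; a weighted sum or a learned model would fit the same description equally well.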