Cross-language headline generation for Hindi

Authors:
Bonnie Dorr;David Zajic;Richard Schwartz
Affiliations:
University of Maryland, College Park, MD;University of Maryland, College Park, MD;BBN Technologies, Columbia, MD
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2003

Abstract

This paper presents new approaches to headline generation for English newspaper texts, with an eye toward producing document surrogates for document selection in cross-language information retrieval. Document selection is difficult because the user must judge relevance from (often poor) translations of retrieved documents. To facilitate the decision-making process, we need translations that can be assessed rapidly and accurately; our approach is to provide an English headline for the non-English document. We describe two approaches to headline generation and their application to the recent DARPA TIDES-2003 Surprise Language Exercise for Hindi. For comparison, we also implemented an alternative method for surrogate generation: a system that produces topic lists for (Hindi) articles. We present the results of a series of experiments comparing these approaches. We demonstrate in both automatic and human evaluations that our linguistically motivated approach outperforms two other surrogate-generation methods: a statistical system and a topic discovery system.