Investigating content selection for language generation using machine learning

Authors:
Colin Kelly;Ann Copestake;Nikiforos Karamanis
Affiliations:
University of Cambridge, Cambridge, UK;University of Cambridge, Cambridge, UK;Trinity College Dublin, Ireland
Venue:
ENLG '09 Proceedings of the 12th European Workshop on Natural Language Generation
Year:
2009

Citing 6
Cited 6

Revision-based generation of natural language summaries providing historical background: corpus-based analysis, design, implementation and evaluation

Building natural language generation systems

BoosTexter: A Boosting-based System for Text Categorization
Machine Learning - Special issue on information retrieval

An Evaluation of Statistical Approaches to Text Categorization
Information Retrieval

Statistical acquisition of content selection rules for natural language generation
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing

Collective content selection for concept-to-text generation
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

FootbOWL: using a generic ontology of football competition for planning match summaries
ESWC'11 Proceedings of the 8th extended semantic web conference on The semantic web: research and applications - Volume Part I

Content selection from an ontology-based knowledge base for the generation of football summaries
ENLG '11 Proceedings of the 13th European Workshop on Natural Language Generation

Detecting interesting event sequences for sports reporting
ENLG '11 Proceedings of the 13th European Workshop on Natural Language Generation

Perspective-oriented generation of football match summaries: Old tasks, new challenges
ACM Transactions on Speech and Language Processing (TSLP)

Content selection from semantic web data
INLG '12 Proceedings of the Seventh International Natural Language Generation Conference

Generating natural language descriptions from OWL ontologies: the NaturalOWL system
Journal of Artificial Intelligence Research

Abstract

The content selection component of a natural language generation system decides which information should be communicated in its output. We use information from reports on the game of cricket. We first describe a simple factoid-to-text alignment algorithm, then treat content selection as a collective classification problem and demonstrate that simple 'grouping' of statistics at various levels of granularity yields substantially improved results over a probabilistic baseline. We additionally show that holding back specific types of input data and linking database structures with commonality both further increase performance.
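
As a rough illustration of what a factoid-to-text alignment step might look like, the sketch below labels a database factoid as selected when its value occurs in the same report sentence as its subject. This is not the authors' algorithm: the matching rule, the function name align_factoids, the (subject, attribute, value) tuple layout, and the sample data are all assumptions made for the example. The resulting 0/1 labels are the kind of training signal a content selection classifier could then learn from.

```python
import re


def align_factoids(factoids, report):
    """Return one 0/1 label per factoid (hypothetical matching rule).

    A factoid is labelled 1 when its value appears as a whole token in a
    sentence that also mentions its subject, and 0 otherwise.
    """
    # Naive sentence split on terminal punctuation; an assumption that is
    # good enough for this illustration.
    sentences = [s.lower() for s in re.split(r"(?<=[.!?])\s+", report)]
    labels = []
    for subject, _attribute, value in factoids:
        # Word boundaries stop 4 from matching inside 40 or 14.
        value_pattern = re.compile(r"\b" + re.escape(str(value)) + r"\b")
        mentioned = any(
            subject.lower() in sentence and value_pattern.search(sentence)
            for sentence in sentences
        )
        labels.append(1 if mentioned else 0)
    return labels


# Invented sample data: (subject, attribute, value) triples and a short report.
factoids = [
    ("Smith", "runs", 87),
    ("Smith", "balls_faced", 120),
    ("Jones", "wickets", 4),
]
report = "Smith made 87 as the home side took control. Jones finished with 4 wickets."

labels = align_factoids(factoids, report)
print(labels)  # [1, 0, 1]
```

Here the balls-faced figure is never mentioned in the report, so it is labelled as not selected, while the two reported statistics are labelled as selected.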