Natural language generation for sponsored-search advertisements

Authors:
Kevin Bartz;Cory Barr;Adil Aijaz
Affiliations:
Department of Statistics, Harvard University, Cambridge, MA, USA;Yahoo!, Inc., Burbank, CA, USA;Yahoo!, Inc., Burbank, CA, USA
Venue:
Proceedings of the 9th ACM conference on Electronic commerce
Year:
2008

Abstract

In sponsored search, advertisers bid on phrases representative of offered products or services. For large advertisers, these phrases often come from quasi-algorithmically generated lists of thousands of terms that are prone to poor linguistic construction. A bidded term by itself is usually unsuitable for direct insertion into an ad copy template; it must be rephrased and capitalized properly to fit the template, possibly with additional language to avoid semantic ambiguity. We develop a natural language generation system to automate these steps, preparing a list of terms for insertion into an ad template. For each input term, our system first finds a proper word ordering by mining a corpus of Web search query logs. Next, it determines whether the term is ambiguous and, if semantics dictate, attaches a clarifying modifier culled from query logs. Finally, it applies proper capitalization by analyzing pages from Web search engine results. Each step yields a plausible set of displayable forms, from which a machine-learned model selects the best. The models are trained and tested on a large set of human-labeled data. The overall system significantly outperforms baseline systems that use simple heuristics.
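
The reordering and capitalization steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy query counts, and the toy page texts are all hypothetical, and a simple frequency vote stands in for the machine-learned model that the paper uses to choose among candidate forms. The ambiguity-resolution step is omitted.

```python
from collections import Counter
from itertools import permutations


def best_ordering(term, query_counts):
    """Pick the word ordering of `term` seen most often in a query log.

    `query_counts` maps lowercase query strings to frequencies (toy data
    here). Frequency is an assumed stand-in for the paper's learned model.
    """
    words = term.split()
    # Candidate set: every permutation of the term's words. This grows
    # factorially, so it is only sensible for short terms.
    candidates = sorted({" ".join(p) for p in permutations(words)})
    # Prefer higher log frequency; on ties, prefer the original ordering.
    return max(candidates, key=lambda c: (query_counts.get(c, 0), c == term))


def best_capitalization(phrase, page_texts):
    """Pick the most common surface form of `phrase` found in page text.

    `page_texts` is an assumed list of strings standing in for pages
    returned by a Web search; matching is case-insensitive (ASCII text).
    """
    target = phrase.lower()
    forms = Counter()
    for text in page_texts:
        lowered = text.lower()
        start = lowered.find(target)
        while start != -1:
            forms[text[start:start + len(target)]] += 1
            start = lowered.find(target, start + 1)
    if not forms:
        # Fallback heuristic when the phrase never appears: title case.
        return phrase.title()
    return forms.most_common(1)[0][0]


def display_form(term, query_counts, page_texts):
    """Chain the two steps: reorder first, then capitalize."""
    return best_capitalization(best_ordering(term, query_counts), page_texts)


# Toy inputs, invented purely for illustration.
toy_query_counts = {"hotels new york": 120, "new york hotels": 950}
toy_pages = [
    "Find New York Hotels today",
    "Cheap New York Hotels and more",
    "new york hotels",
]
result = display_form("hotels new york", toy_query_counts, toy_pages)
```

With these toy inputs, the bidded term "hotels new york" is reordered to "new york hotels" because that ordering has the higher count in the toy log, and then capitalized as "New York Hotels" because that surface form appears twice in the toy pages against once for the lowercase form.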