Real life, real users, and real needs: a study and analysis of user queries on the web
Information Processing and Management: an International Journal
Word reordering and a dynamic programming beam search algorithm for statistical machine translation
Computational Linguistics
Generating natural language summaries from multiple on-line sources
Computational Linguistics - Special issue on natural language generation
Learning features that predict cue usage
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Using machine learning techniques to build a comma checker for Basque
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
The linguistic structure of English web-search queries
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Automated snippet generation for online advertising
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Hi-index | 0.00 |
In sponsored search, advertisers bid on phrases representative of offered products or services. For large advertisers, these phrases often come from quasi-algorithmically generated lists of thousands of terms prone to poor linguistic construction. A bidded term by itself is usually unsuitable for direct insertion into an ad copy template; it must be rephrased and capitalized properly to fit the template, possibly with additional language to avoid semantic ambiguity. We develop a natural language generation system to automate these steps, preparing a list of terms for insertion into an ad template. For each input term, our system first finds a proper word ordering by mining a corpus of Web search query logs. Next it determines whether the term is ambiguous and--if semantics dictate--attaches a clarifying modifier culled from query logs. Finally, it applies proper capitalization by analyzing pages from Web search engine results. Each step yields a plausible set of displayable forms from which a machine-learned model selects the best. The models are trained and tested on a large set of human-labeled data. The overall system significantly outperforms baseline systems that use simple heuristics.