Relevance-index size tradeoff in contextual advertising

Authors:
Pavan Kumar GM;Krishna P. Leela;Mehul Parsana;Sachin Garg
Affiliations:
Yahoo! Labs, Bangalore, India;Yahoo! Labs, Bangalore, India;Yahoo! Labs, Bangalore, India;Yahoo! Labs, Bangalore, India
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Abstract

In contextual advertising, textual ads relevant to the content of a webpage are embedded in the page. Content keywords are extracted offline by crawling webpages and are then stored in an index for fast serving. Given a page, ad selection involves an index lookup, computing the similarity between the keywords of the page and those of candidate ads, and returning the top-k scoring ads. This approach trades relevance against index size: better relevance can be achieved if the index size is unlimited. However, an unlimited index is impractical because of the large number of pages on the Web and stringent requirements on serving latency. Moreover, page visits on the Web follow a power-law distribution in which a significant proportion of pages, called tail pages, are visited infrequently. Indexing tail pages individually is inefficient because they are accessed so rarely. We propose a novel mechanism that mitigates both problems within the same framework. The basic idea is to index the same keyword vector for a set of similar pages. The scheme involves learning a website-specific hierarchy from the (page, URL) pairs of the website. Keywords are then populated on the nodes via a bottom-up traversal of the hierarchy. We evaluate our approach on a human-labeled dataset, where it achieves higher nDCG than a recent approach even though its index is 7 times smaller.
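
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract says the hierarchy is learned from (page, URL) pairs, whereas this sketch simply assumes that URL path segments define the hierarchy. The aggregation rule (summing child keyword weights and keeping the heaviest few), the cosine scoring, and all names (`Node`, `build_hierarchy`, `populate_bottom_up`, `build_index`, `lookup`, `top_k_ads`, `max_keywords`, `max_depth`) and the sample data are likewise assumptions made for illustration.

```python
import heapq
import math
from collections import Counter
from urllib.parse import urlparse


class Node:
    """One node of the (assumed) URL-path hierarchy."""
    def __init__(self):
        self.children = {}
        self.keywords = Counter()


def path_segments(url):
    return [s for s in urlparse(url).path.split("/") if s]


def build_hierarchy(pages):
    """Insert each page's keyword weights at the node for its URL path."""
    root = Node()
    for url, keywords in pages.items():
        node = root
        for seg in path_segments(url):
            node = node.children.setdefault(seg, Node())
        node.keywords.update(keywords)
    return root


def populate_bottom_up(node, max_keywords=3):
    """Post-order traversal: fold child vectors into the parent, then truncate."""
    for child in node.children.values():
        populate_bottom_up(child, max_keywords)
        node.keywords.update(child.keywords)
    node.keywords = Counter(dict(node.keywords.most_common(max_keywords)))


def build_index(root, max_depth):
    """Index only nodes down to max_depth; deeper pages share an ancestor's vector."""
    index, stack = {}, [((), root)]
    while stack:
        prefix, node = stack.pop()
        index[prefix] = node.keywords
        if len(prefix) < max_depth:
            for seg, child in node.children.items():
                stack.append((prefix + (seg,), child))
    return index


def lookup(index, url, max_depth):
    """Return the keyword vector of the deepest indexed ancestor of url."""
    segs = tuple(path_segments(url))[:max_depth]
    while segs not in index:  # the root key () is always present
        segs = segs[:-1]
    return index[segs]


def cosine(a, b):
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def top_k_ads(page_keywords, ads, k):
    return heapq.nlargest(k, ads, key=lambda ad: cosine(page_keywords, ads[ad]))


# Hypothetical crawl output and ad inventory.
pages = {
    "http://example.com/sports/cricket/a.html": {"cricket": 3, "bat": 1},
    "http://example.com/sports/cricket/b.html": {"cricket": 2, "wicket": 2},
    "http://example.com/travel/goa.html": {"beach": 3, "hotel": 1},
}
ads = {
    "ad_bats": {"cricket": 1, "bat": 1},
    "ad_hotels": {"hotel": 1, "beach": 1},
    "ad_shoes": {"shoes": 1},
}

root = build_hierarchy(pages)
populate_bottom_up(root)
index = build_index(root, max_depth=1)

# A page that was never crawled still resolves through its ancestor node.
page_kw = lookup(index, "http://example.com/sports/cricket/unseen.html", max_depth=1)
best = top_k_ads(page_kw, ads, k=1)
```

In this sketch `max_depth` is the knob that trades index size against relevance: a shallower cut stores fewer keyword vectors, but each vector summarizes a broader and less homogeneous set of pages.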