In contextual advertising, textual ads relevant to a webpage's content are embedded in the page. Content keywords are extracted offline by crawling webpages and are stored in an index for fast serving. Given a page, ad selection involves an index lookup, computing the similarity between the page's keywords and those of candidate ads, and returning the top-k scoring ads. This approach entails a tradeoff between relevance and index size: better relevance can be achieved if there is no limit on the index size. However, an unlimited index is impractical given the large number of pages on the Web and the stringent requirements on serving latency. Moreover, page visits on the Web follow a power-law distribution in which a significant proportion of pages, the so-called tail pages, are visited infrequently; indexing tail pages is inefficient given how rarely they are accessed. We propose a novel mechanism that mitigates both problems in the same framework. The basic idea is to index a single keyword vector for a set of similar pages. The scheme first learns a website-specific hierarchy from (page, URL) pairs of the website; keywords are then populated on the nodes of the hierarchy via a bottom-up traversal. On a human-labeled dataset, our approach achieves higher nDCG than a recent approach even though its index is seven times smaller.
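The mechanism described above can be illustrated with a minimal sketch. It assumes the website hierarchy is modeled as a trie over URL path segments and that a node's keyword vector is the set of most frequent keywords among its descendants; the names `Node`, `insert`, `propagate`, and `lookup`, and the frequency-based selection rule, are illustrative choices, not the paper's exact method.

```python
from collections import Counter
from urllib.parse import urlparse

class Node:
    """One node of the website hierarchy (a URL-path trie)."""
    def __init__(self):
        self.children = {}          # path segment -> Node
        self.keywords = Counter()   # keyword -> weight

def insert(root, url, keywords):
    # Offline: walk/extend the trie along the URL's path segments
    # and attach the crawled page's keywords at the leaf.
    node = root
    for seg in urlparse(url).path.strip("/").split("/"):
        node = node.children.setdefault(seg, Node())
    node.keywords.update(keywords)

def propagate(node, top_k=5):
    # Bottom-up traversal: each node keeps the most frequent keywords
    # among its descendants, so one vector stands in for many pages.
    for child in node.children.values():
        propagate(child, top_k)
        node.keywords.update(child.keywords)
    node.keywords = Counter(dict(node.keywords.most_common(top_k)))

def lookup(root, url):
    # Serving time: walk as deep as the trie allows; an unseen tail
    # page falls back to its nearest indexed ancestor's keywords.
    node = best = root
    for seg in urlparse(url).path.strip("/").split("/"):
        if seg not in node.children:
            break
        node = node.children[seg]
        best = node
    return list(best.keywords)
```

Because a tail page that was never crawled still shares a URL prefix with its sibling pages, `lookup` returns the keyword vector of the deepest matching ancestor, which is how one indexed vector can serve a whole set of similar pages.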