Learning from multi-topic web documents for contextual advertisement

Authors:
Yi Zhang;Arun C. Surendran;John C. Platt;Mukund Narasimhan
Affiliations:
Microsoft adCenter Labs, Redmond, WA, USA;Microsoft adCenter Labs, Redmond, WA, USA;Microsoft Research, Redmond, WA, USA;Microsoft Live Search, Redmond, WA, USA
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008

Abstract

Contextual advertising on web pages has become very popular recently, and it poses its own set of text mining challenges. Advertisers often wish to target (or avoid) specific content that may appear in only a small part of a page. Learning for these targeting tasks is difficult because most training pages are multi-topic and require expensive human labeling at the sub-document level for accurate training. In this paper we investigate ways to learn sub-document classifiers when only page-level labels are available; these labels indicate only whether the relevant content exists somewhere in the given page. We propose applying multiple-instance learning to this task to improve the effectiveness of traditional methods. We apply sub-document classification to two different problems in contextual advertising. The first is "sensitive content detection", where the advertiser wants to avoid content relating to war, violence, pornography, etc., even if it occurs in only a small part of a page. The second involves opinion mining from review sites: the advertiser wants to detect and avoid negative opinions about their product when positive, negative, and neutral sentiments co-exist on a page. In both scenarios we present experimental results showing that the proposed system obtains good block-level labeling without any block-level annotation and improves the performance of traditional learning methods.
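
The multiple-instance setting described in the abstract can be illustrated with a small sketch. This is not the authors' system: it is a generic max-pooling formulation, assuming each page is a bag of text blocks, a page is positive if at least one block is positive, and a logistic model is updated only on the highest-scoring block of each page. The toy pages, the whitespace tokenizer, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

# Toy data (made up for illustration): each page is a list of text blocks,
# and each label says only whether the page contains sensitive content.
TOY_PAGES = [
    ["weather is sunny today", "war violence attack reported", "sports team wins game"],
    ["stock market rises", "violence and war continue"],
    ["weather is cloudy today", "sports team loses game"],
    ["stock market falls", "new recipe for dinner"],
]
TOY_LABELS = [1, 1, 0, 0]


def featurize(block):
    """Bag-of-words counts for one block (naive whitespace tokenizer)."""
    counts = defaultdict(float)
    for token in block.lower().split():
        counts[token] += 1.0
    return counts


def score(weights, bias, feats):
    """Linear score of one block."""
    return bias + sum(weights.get(t, 0.0) * v for t, v in feats.items())


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_mil(pages, labels, epochs=50, lr=0.5):
    """Train with page-level labels only.

    Assumption: the page probability is the maximum block probability, so
    each gradient step touches only the currently highest-scoring block.
    """
    weights = defaultdict(float)
    bias = 0.0
    bags = [[featurize(b) for b in page] for page in pages]
    for _ in range(epochs):
        for bag, y in zip(bags, labels):
            best = max(bag, key=lambda f: score(weights, bias, f))
            grad = sigmoid(score(weights, bias, best)) - y  # logistic-loss gradient
            for t, v in best.items():
                weights[t] -= lr * grad * v
            bias -= lr * grad
    return weights, bias


def label_blocks(weights, bias, page):
    """Block-level probabilities for one page, obtained without block labels."""
    return [sigmoid(score(weights, bias, featurize(b))) for b in page]


def page_probability(weights, bias, page):
    """Page-level probability under the max-pooling assumption."""
    return max(label_blocks(weights, bias, page))
```

Calling `train_mil(TOY_PAGES, TOY_LABELS)` and then `label_blocks` on a page returns one probability per block, which is the kind of block-level labeling the abstract says can be obtained from page-level supervision. Updating only the top-scoring block is one of several possible multiple-instance formulations; others, such as noisy-or pooling, would fit the same interface.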