Learning document aboutness from implicit user feedback and document structure

Authors:
Deepa Paranjpe
Affiliations:
Yahoo! Labs, Sunnyvale, USA
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009


Abstract

Capturing the "aboutness" of documents has been a key research focus throughout the history of automated textual information processing. In this work, we represent aboutness using the words and phrases that best reflect the central topics of a document. We present a machine learning approach that learns to score and rank the words and phrases in a document according to their relevance to that document. We use implicit user feedback available in search engine click logs to characterize the user-perceived notion of term relevance. Using a small set of manually labeled data, we show that the surrogate training data derived from click logs correlates well with the manual labels. This eliminates the need to create training data by hand, which for this task is both expensive and fundamentally difficult. Further, our learning model uses a diverse set of features that capitalize heavily on the structural and visual properties of web documents. In our extensive experiments, we pay particular attention to tail web pages and show that our approach, although trained mainly on head web pages, generalizes and performs well on all kinds of documents. Across several evaluation methods using manually generated summaries and term relevance judgments, our system shows a 25% improvement over other aboutness solutions.
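The abstract does not say exactly how click logs are turned into term-relevance labels or how their agreement with manual labels is measured. The following is a minimal sketch under our own assumptions, not the author's method. It assumes a hypothetical log of `(query, url, clicks)` tuples and scores each query term for a clicked URL by its click-weighted frequency, normalized per URL. It then compares those surrogate scores with a small hypothetical set of manual judgments using Spearman rank correlation. The function names `click_term_labels`, `average_ranks`, and `spearman`, the log format, and all sample data are illustrative.

```python
from collections import Counter, defaultdict


def click_term_labels(click_log):
    """Hypothetical surrogate labels: click-weighted query-term scores per URL.

    click_log: iterable of (query, url, clicks) tuples (assumed format).
    Returns {url: {term: score}}, with scores summing to 1 for each URL.
    """
    counts = defaultdict(Counter)
    for query, url, clicks in click_log:
        # Count each distinct term once per query, weighted by its clicks.
        for term in set(query.lower().split()):
            counts[url][term] += clicks
    labels = {}
    for url, ctr in counts.items():
        total = sum(ctr.values())
        labels[url] = {t: c / total for t, c in ctr.items()}
    return labels


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of average ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical click log and manual 1-3 relevance grades for one page.
    log = [
        ("machu picchu tours", "example.com/peru", 30),
        ("machu picchu", "example.com/peru", 50),
        ("peru travel", "example.com/peru", 10),
    ]
    manual = {"machu": 3, "picchu": 3, "tours": 2, "peru": 2, "travel": 1}
    labels = click_term_labels(log)["example.com/peru"]
    terms = sorted(manual)
    rho = spearman([labels[t] for t in terms], [manual[t] for t in terms])
    print(round(rho, 3))  # prints 0.917
```

Rank correlation is used here rather than comparing raw scores. Click-derived weights and manual grades are on different scales, and for ranking terms only their relative order matters. A production system would also need to handle position bias in clicks and tokenize phrases as well as single words.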