Capturing the "aboutness" of documents has been a key research focus throughout the history of automated textual information processing. In this work, we represent aboutness using the words and phrases that best reflect the central topics of a document. We present a machine learning approach that learns to score and rank the words and phrases in a document according to their relevance to that document. We use the implicit user feedback available in search engine click logs to characterize the user-perceived notion of term relevance. Using a small set of manually generated training data, we show that surrogate training data derived from click logs correlates well with the manual data, which eliminates the need to create training data by hand, a process that is both expensive and fundamentally difficult for such a task. Further, our learning model uses a diverse set of features that capitalize heavily on the structural and visual properties of web documents. In our extensive experiments, we pay particular attention to tail web pages and show that our approach, trained mainly on head web pages, generalizes and performs well on all kinds of documents. Across several evaluation methods using manually generated summaries and term relevance judgments, our system shows a 25% improvement over other aboutness solutions.