Keeping keywords fresh: a BM25 variation for personalized keyword extraction

Authors:
Margarita Karkali;Vassilis Plachouras;Constantinos Stefanatos;Michalis Vazirgiannis
Affiliations:
Athens University of Economics and Business;Presans;Athens University of Economics and Business;Athens University of Economics and Business
Venue:
Proceedings of the 2nd Temporal Web Analytics Workshop
Year:
2012

Abstract

Keyword extraction from web pages is essential to various text mining tasks, including contextual advertising, selecting recommendations, user profiling and personalization. In contextual advertising, for example, extracted keywords are used to match advertisements to the web page a user is currently browsing. Most keyword extraction methods rely mainly on the content of a single web page and ignore the user's browsing history, so they can repeatedly lead to the same advertisements or recommendations. In this work we propose a new feature scoring algorithm for extracting terms from web pages that, assuming a recent browsing history is available for each user, takes into account the freshness of keywords in the current page as a means of capturing the user's shifting interests. We propose BM25H, a variant of the BM25 scoring function implemented on the client side, which takes the user's browsing history into account and suggests keywords that are relevant to the currently browsed page but also fresh with respect to the user's recent browsing history. In this way, for each web page we obtain a set of keywords representing the user's interests as they shift over time. BM25H avoids repeating keywords that may simply be domain-specific stop-words, or that would result in matching the same ads or similar recommendations. Our experimental results show that BM25H achieves more than 70% precision at 20 extracted keywords (based on blind human evaluation) and outperforms our baselines (the TF and BM25 scoring functions), while succeeding in keeping the extracted keywords fresh with respect to the user's recent history.
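The abstract does not give the BM25H formula. The sketch below assumes one plausible reading of "takes into account the user browsing history": standard BM25 term weighting in which the document-frequency statistics come from the user's recent pages rather than a global collection, so terms the user has seen repeatedly get a low IDF and fresh terms rise to the top. The function names (`bm25_history_scores`, `top_keywords`), the defaults k1 = 1.2 and b = 0.75, and the way the average document length is computed are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

# Common BM25 defaults; assumed here, not taken from the paper.
K1 = 1.2
B = 0.75


def bm25_history_scores(page_terms, history, k1=K1, b=B):
    """Score each term of the current page with BM25, using document
    frequencies from the user's recent history (hypothetical BM25H reading)."""
    n = len(history)
    # Document frequency of each term over the pages in the history window.
    df = Counter()
    for doc in history:
        df.update(set(doc))
    # Assumption: average length over the history plus the current page.
    docs = history + [page_terms]
    avgdl = sum(len(d) for d in docs) / len(docs)
    tf = Counter(page_terms)
    dl = len(page_terms)
    scores = {}
    for term, f in tf.items():
        # RSJ-style IDF with +1 inside the log so weights stay positive;
        # terms frequent in the history receive a small IDF.
        idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
        norm = f + k1 * (1 - b + b * dl / avgdl)
        scores[term] = idf * f * (k1 + 1) / norm
    return scores


def top_keywords(page_terms, history, k=20):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    scores = bm25_history_scores(page_terms, history)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


history = [["election", "vote", "poll"], ["election", "debate"]]
page = ["election", "election", "climate", "summit"]
print(top_keywords(page, history, k=3))  # ['climate', 'summit', 'election']
```

Although "election" occurs twice on the current page, it appears in every page of the history, so its history-based IDF pushes it below the fresh terms "climate" and "summit". Under this assumption, a plain BM25 or TF baseline computed without the history would instead rank "election" first.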