Keyword extraction from web pages is essential to various text mining tasks, including contextual advertising, recommendation selection, user profiling, and personalization. In contextual advertising, for example, extracted keywords are used to match advertisements with the web page a user is currently browsing. Most keyword extraction methods rely on the content of a single web page and ignore the user's browsing history, which can lead to the same advertisements or recommendations being shown repeatedly. In this work we propose a new feature scoring algorithm for extracting terms from web pages that, given each user's recent browsing history, takes into account the freshness of keywords in the current page as a way of capturing the user's shifting interests. We propose BM25H, a variant of the BM25 scoring function implemented on the client side, that takes the user's browsing history into account and suggests keywords that are relevant to the currently browsed page and also fresh with respect to the user's recent browsing history. In this way, for each web page we obtain a set of keywords representing the user's time-shifting interests. BM25H avoids repeating keywords that may simply be domain-specific stop-words or that may result in matching the same ads or similar recommendations. Our experimental results show that BM25H achieves a precision of more than 70% at 20 extracted keywords (based on blind human evaluation) and outperforms our baselines (the TF and BM25 scoring functions), while keeping the extracted keywords fresh relative to the user's recent history.
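The abstract does not state the BM25H formula, so the following is only a rough sketch of the general idea rather than the authors' method. It assumes that freshness can be approximated by computing the standard BM25 inverse document frequency over the user's recent browsing history, so that terms appearing in many recently visited pages are down-weighted. The function names `tokenize` and `fresh_keywords`, the tokenizer, and the parameter values `k1=1.2` and `b=0.75` are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fresh_keywords(current_page, history, top_k=20, k1=1.2, b=0.75):
    """Rank terms of the current page with a BM25-style weight.

    Assumption: the corpus is the recent browsing history plus the current
    page, so a term seen in many recent pages gets a low IDF and is treated
    as less fresh. This is NOT the published BM25H definition.
    """
    cur_tokens = tokenize(current_page)
    docs = [tokenize(page) for page in history] + [cur_tokens]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of pages (history + current) containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    tf = Counter(cur_tokens)
    scores = {}
    for term, f in tf.items():
        # Non-negative BM25 IDF variant.
        idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        # Saturating term-frequency component with length normalization.
        tf_part = f * (k1 + 1) / (f + k1 * (1 - b + b * len(cur_tokens) / avg_len))
        scores[term] = idf * tf_part

    return sorted(scores, key=scores.get, reverse=True)[:top_k]


history = ["football match score football", "football league table"]
keywords = fresh_keywords("football transfer rumours transfer", history)
```

In this toy example, "football" occurs in every page of the history and therefore ranks last among the current page's terms, while "transfer", which is both frequent on the current page and absent from the history, ranks first.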