Personalized search systems have evolved to use heterogeneous features, including document hyperlinks, category labels from various taxonomies, and social tags, in addition to the free text of the documents. Consequently, classifiers, PageRank algorithms, and collaborative filtering methods are often used as intermediate steps in such personalized retrieval systems. Thorough comparative evaluation of these complex systems has been difficult because publicly available datasets that provide such diverse feature sets are lacking. To remedy the situation, we have created CiteData, a new dataset for benchmark evaluation of personalized search performance, which will be made publicly accessible. CiteData is a collection of academic articles extracted from the CiteULike and CiteSeer repositories, with rich feature sets such as authors, author affiliations, topic labels, social tags, and citation information. We further supplement it with personalized queries and relevance judgments obtained from volunteer users. This paper starts with a discussion of the design criteria and characteristics of the CiteData dataset in comparison with current benchmark datasets, followed by a set of task-oriented empirical evaluations of popular algorithms in statistical classification, collaborative filtering, and link analysis as intermediate steps for personalized search. Our results show that personalized approaches achieve a significant performance improvement over unpersonalized approaches. We also observe that a meta personalized search engine that leverages information from multiple feature sources performs better than algorithms that use only one of the constituent feature sources.