Knowing users' views and demographic traits offers great potential for personalizing web search results and related services such as query suggestion and query completion. Such signals, however, are often available only for a small fraction of search users, namely those who log in with their social network account and allow its use for personalization of search results. In this paper, we offer a solution to this problem by showing how user demographic traits such as age and gender, and even political and religious views, can be efficiently and accurately inferred from search query histories. We proceed in three steps. First, we train predictive models on the publicly available myPersonality dataset, which contains users' Facebook Likes and their demographic information. Second, we match Facebook Likes with search queries using Open Directory Project categories. Finally, we apply the models trained on Facebook Likes to large-scale query logs of a commercial search engine while explicitly taking into account the difference between the trait distributions in the two datasets. We find that the area under the ROC curve (AUC) for classifying age and gender is 77% and 84%, respectively, for predictions based on Facebook Likes, and degrades only to 74% and 80% for predictions based on search queries. On a US state-by-state basis, we find a Pearson correlation of 0.72 between the predicted scores for political views and Gallup data, and of 0.54 between the predicted scores for affiliation with Judaism and data from the US Religious Landscape Survey. We conclude that it is indeed feasible to infer important demographic data of users from their query history based on labelled Likes data, and we believe that this approach could provide valuable information for personalization and monetization even in the absence of demographic data.
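The abstract does not say how the difference between the trait distributions of the two datasets is taken into account. As an illustration only, the sketch below shows one standard way to do this for a binary trait: rescaling a predicted probability by the ratio of class priors. The function name, the parameter names, and all numbers are hypothetical and are not taken from the paper.

```python
def adjust_for_prior_shift(p_source, prior_source, prior_target):
    """Re-weight a binary posterior when the class prior changes.

    p_source     -- probability of the positive class from a model trained
                    on the source data (here: the Likes dataset)
    prior_source -- fraction of positives in the source data
    prior_target -- assumed fraction of positives in the target data
                    (here: the search-log population)
    """
    # Scale each class by the ratio of target prior to source prior,
    # then renormalize so the two class probabilities sum to one.
    pos = p_source * prior_target / prior_source
    neg = (1.0 - p_source) * (1.0 - prior_target) / (1.0 - prior_source)
    return pos / (pos + neg)


# Hypothetical numbers: the model was trained on data with 60% positives
# and scores a user at 0.70; the target population is assumed to have 50%.
adjusted = adjust_for_prior_shift(0.70, prior_source=0.60, prior_target=0.50)
print(round(adjusted, 4))
```

Because the positive class is over-represented in the hypothetical training data, the adjusted score is lower than the raw score. When the two priors are equal, the function returns the input probability unchanged.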