#nowplaying Madonna: a large-scale evaluation on estimating similarities between music artists and between movies from microblogs

Authors:
Markus Schedl
Affiliations:
Department of Computational Perception, Johannes Kepler University, Linz, Austria 4040
Venue:
Information Retrieval
Year:
2012

Abstract

Term weighting techniques such as $$TF\cdot IDF$$ or BM25 have been used extensively in a wide range of text-based information retrieval tasks. Their use for modeling term profiles of named entities, and for the subsequent calculation of similarities between these entities, has been studied to a much smaller extent. The recent trend of microblogging has made available massive amounts of information about almost every topic around the world. Microblogs therefore represent a valuable source for text-based named entity modeling. In this paper, we present a systematic and comprehensive evaluation of different term weighting measures, normalization techniques, query schemes, index term sets, and similarity functions for the task of inferring similarities between named entities, based on data extracted from microblog posts. We analyze several thousand combinations of choices for the above-mentioned dimensions, which influence the similarity calculation process, and we investigate how they affect the quality of the similarity estimates. Evaluation is performed using three real-world data sets: two collections of microblogs related to music artists and one related to movies. For the music collections, we present results of genre classification experiments, using genre information from allmusic.com as the benchmark. For the movie collection, we present results of multi-class classification experiments, using categories from IMDb as the benchmark. We show that microblogs can indeed be exploited to model named entity similarity with remarkable accuracy, provided the correct settings for the analyzed aspects are used. We further compare the results to those obtained when using Web pages as the data source.
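
To make the general pipeline concrete, the following is a minimal sketch of one possible configuration: each named entity is represented by a term profile built from the posts that mention it, terms are weighted with a TF·IDF variant, and entities are compared with cosine similarity. The specific formulation (logarithmic term frequency, logarithmic inverse document frequency), the function names `tfidf_profiles` and `cosine`, and the toy posts are assumptions made for illustration; they are not taken from the paper, which evaluates many alternative weighting, normalization, and similarity choices.

```python
import math
from collections import Counter


def tfidf_profiles(docs):
    """Build one TF-IDF term profile per entity.

    docs maps an entity name to the list of tokens from all posts
    retrieved for that entity (an assumed, simplified input format).
    """
    n = len(docs)
    tf = {entity: Counter(tokens) for entity, tokens in docs.items()}

    # Document frequency: in how many entity profiles each term occurs.
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())

    profiles = {}
    for entity, counts in tf.items():
        # Assumed variant: (1 + ln tf) * ln(n / df).
        profiles[entity] = {
            term: (1.0 + math.log(c)) * math.log(n / df[term])
            for term, c in counts.items()
        }
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy data, standing in for tokenized microblog posts.
posts = {
    "Madonna": "pop dance queen pop".split(),
    "Lady Gaga": "pop dance fashion".split(),
    "Metallica": "metal thrash guitar".split(),
}

profiles = tfidf_profiles(posts)
sim_pop = cosine(profiles["Madonna"], profiles["Lady Gaga"])
sim_metal = cosine(profiles["Madonna"], profiles["Metallica"])
print("Madonna vs. Lady Gaga:", sim_pop)
print("Madonna vs. Metallica:", sim_metal)
```

In this toy setting the two pop artists share weighted terms and therefore obtain a positive similarity, whereas the profiles with no terms in common obtain a similarity of zero. Swapping the weighting line or the `cosine` function for another formulation corresponds to changing one dimension of the kind of evaluation grid the abstract describes.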