Term weighting techniques such as $TF\cdot IDF$ and BM25 have been used extensively in a wide range of text-based information retrieval tasks. Their use for modeling term profiles of named entities, and for subsequently computing similarities between these entities, has been studied to a much smaller extent. The recent trend of microblogging has made available massive amounts of information about almost every topic around the world. Microblogs therefore represent a valuable source for text-based named entity modeling. In this paper, we present a systematic and comprehensive evaluation of different term weighting measures, normalization techniques, query schemes, index term sets, and similarity functions for the task of inferring similarities between named entities, based on data extracted from microblog posts. We analyze several thousand combinations of choices for these dimensions, all of which influence the similarity calculation process, and we investigate how they affect the quality of the similarity estimates. Evaluation is performed using three real-world data sets: two collections of microblogs related to music artists and one related to movies. For the music collections, we present results of genre classification experiments, using genre information from allmusic.com as the benchmark. For the movie collection, we present results of multi-class classification experiments, using categories from IMDb as the benchmark. We show that microblogs can indeed be exploited to model named entity similarity with remarkable accuracy, provided the correct settings for the analyzed aspects are used. We further compare the results to those obtained when using Web pages as the data source.
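To make the pipeline described above concrete, the following is a minimal sketch of one possible combination of choices: each entity's posts are concatenated into a single virtual document, terms are weighted with a logarithmic $TF\cdot IDF$ variant, and entities are compared with cosine similarity. This particular combination, the whitespace tokenization, the function names `tfidf_profiles` and `cosine`, and the toy posts are all assumptions made for illustration; they are not the settings or data evaluated in the paper.

```python
import math
from collections import Counter


def tfidf_profiles(entity_posts):
    """Build one TF-IDF term profile per entity.

    entity_posts maps an entity name to a list of post strings.
    All posts of an entity are treated as one virtual document
    (an illustrative assumption, not necessarily the paper's setup).
    """
    # Term frequencies per entity, using naive lowercase whitespace tokenization.
    term_counts = {
        entity: Counter(" ".join(posts).lower().split())
        for entity, posts in entity_posts.items()
    }
    n_entities = len(term_counts)

    # Document frequency: in how many entity documents does each term occur?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    # Logarithmic TF times logarithmic IDF (one of many possible variants).
    return {
        entity: {
            term: (1.0 + math.log(tf)) * math.log(n_entities / doc_freq[term])
            for term, tf in counts.items()
        }
        for entity, counts in term_counts.items()
    }


def cosine(profile_a, profile_b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * profile_b[t] for t, w in profile_a.items() if t in profile_b)
    norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
    norm_b = math.sqrt(sum(w * w for w in profile_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy data: three artists with two made-up posts each.
posts = {
    "artist_a": ["loud metal riffs tonight", "metal guitar solo"],
    "artist_b": ["heavy metal guitar", "loud riffs"],
    "artist_c": ["smooth jazz piano", "jazz trio tonight"],
}
profiles = tfidf_profiles(posts)
sim_ab = cosine(profiles["artist_a"], profiles["artist_b"])
sim_ac = cosine(profiles["artist_a"], profiles["artist_c"])
print(f"sim(a, b) = {sim_ab:.3f}, sim(a, c) = {sim_ac:.3f}")
```

In this toy setup the two artists that share several genre-related terms receive a higher similarity than the pair that shares only one incidental term. Swapping the weighting formula, the normalization, or the similarity function changes the estimates, which is the kind of design space the evaluation explores.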