Partitioning and ranking tagged data sources

Authors:
Milad Eftekhar; Nick Koudas
Affiliations:
Department of Computer Science, University of Toronto; Department of Computer Science, University of Toronto
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Abstract

Online forms of expression such as social networks, micro-blogging, blogs, and rich content sharing platforms have proliferated in the last few years. This proliferation has contributed to the vast explosion in online data sharing we are experiencing today. One unique aspect of online data sharing is the tags that content generators manually insert to facilitate content description and discovery (e.g., hashtags in tweets). In this paper we focus on these tags: we study and propose algorithms that use tags to automatically organize and categorize this vast collection of socially contributed and tagged information. In particular, we take a holistic approach to organizing such tags and propose algorithms to partition as well as rank this information collection. Our partitioning algorithms aim to segment the entire collection of tags (and the associated content) into a specified number of partitions under specific problem constraints. In contrast, our ranking algorithms aim to quickly identify a few partitions for suitably defined ranking functions. We present a detailed experimental study utilizing the full Twitter firehose (the set of all tweets in the Twitter service) that attests to the practical utility and effectiveness of our overall approach. We also present a detailed qualitative study of our results.