RanKloud: scalable multimedia and social media retrieval and analysis in the cloud

Authors:
K. Selçuk Candan
Affiliations:
Arizona State University, Tempe, AZ, USA
Venue:
Proceedings of the 9th workshop on Large-scale and distributed informational retrieval
Year:
2011

Citing 6
Cited 0

RanKloud: a scalable ranked query processing framework on hadoop
Proceedings of the 14th International Conference on Extending Database Technology

RanKloud: Scalable Multimedia Data Processing in Server Clusters
IEEE MultiMedia

SCENT: Scalable compressed monitoring of evolving multirelational social networks
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP) - Special section on ACM multimedia 2010 best paper candidates, and issue on social media

Fast metadata-driven multiresolution tensor decomposition
Proceedings of the 20th ACM international conference on Information and knowledge management

Approximate tensor decomposition within a tensor-relational algebraic framework
Proceedings of the 20th ACM international conference on Information and knowledge management

On context-aware co-clustering with metadata support
Journal of Intelligent Information Systems

Abstract

Today, multimedia data are produced in massive quantities, thanks to a diverse spectrum of applications including entertainment, surveillance, e-commerce, web, and social media. In particular, social media data have three challenging characteristics: data sizes are enormous, data are often multi-faceted, and data are dynamic. Tensors (multi-dimensional arrays) are widely used for representing such high-order, multi-dimensional data. Consequently, a system dealing with social media data needs to scale with the tensor volume as well as with the number and diversity of the data facets. This necessitates highly parallelizable, and in many cases cloud-based, frameworks for scalable processing and efficient analysis of large media and social media collections. Most multimedia applications share a few core operations, including integration/fusion, classification, clustering, graph analysis, near-neighbor search, and similarity search. When performed naively, however, these core operations are often very costly, because the number of objects and object features that need to be considered can be prohibitive. Reducing this cost requires eliminating redundant work. Thus, for next-generation cloud-based massive media processing and analysis systems to have transformative impact, the fundamental principles that govern their design must include an awareness of the utility of data and features to a particular analysis task. Recently, the observation that a significant class of data processing applications, though not all, can be expressed in terms of a small set of primitives that are, in many cases, easy to parallelize has led to frameworks such as MapReduce, which have been successfully applied in data processing, mining, and information retrieval domains. Yet, in many other domains (including many aggregation and join tasks that are hard to parallelize), these frameworks significantly lag behind traditional solutions.
In particular, many multimedia and social media analysis tasks fall into the category of applications that pose significant challenges. In this talk, I will present an overview of recent developments in the area of scalable multimedia and social media retrieval and analysis in the cloud and our own efforts [1, 2, 3, 4, 5, 6] to build a scalable data processing middleware, called RanKloud, designed specifically for the needs and requirements of multimedia and social media analysis applications. RanKloud avoids waste by intelligently partitioning the data and allocating the partitions to available resources so as to minimize data replication and indexing overheads and to prune superfluous low-utility processing. It also includes a tensor-based relational data model to support the complete lifecycle (from collection to analysis) of the data, involving various integration and other manipulation steps. RanKloud also addresses the computational cost of various multi-dimensional data analysis operations, including decomposition and structural change detection, by (a) leveraging a priori background knowledge (or metadata) about one or more domain dimensions and (b) extending compressed sensing (CS) to tensor data to encode the observed tensor streams in the form of compact descriptors. RanKloud will extend the scope of cloud-based systems to the delivery of efficient and large-scale analysis over data with variable utility and, thus, will enable new and efficient applications, tools, and systems for multimedia and social media retrieval and analysis.
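
To make the idea of pruning low-utility processing concrete, the following is a minimal Python sketch of a generic upper-bound pruning scheme for a top-k query over partitioned data. It is not RanKloud's actual algorithm, which the abstract does not specify. The function name `top_k_with_pruning`, the partition names, and the scores are hypothetical, and the per-partition upper bounds are computed directly here for brevity, whereas a real system would presumably obtain them from precomputed statistics.

```python
import heapq
from typing import Dict, List, Tuple


def top_k_with_pruning(
    partitions: Dict[str, List[Tuple[str, float]]], k: int
) -> Tuple[List[Tuple[float, str]], List[str]]:
    """Return the k best (score, object_id) pairs and the partitions scanned.

    Illustrative only: each partition is summarized by an upper bound on
    the scores it contains, and partitions that cannot contribute to the
    top-k result are skipped without being scanned.
    """
    # Assumption: the bound is the maximum score in the partition. It is
    # computed inline here; in practice it would come from stored statistics.
    bounds = {
        name: max((score for _, score in items), default=float("-inf"))
        for name, items in partitions.items()
    }

    top: List[Tuple[float, str]] = []  # min-heap holding the current top-k
    scanned: List[str] = []

    # Visit the most promising partitions first.
    for name in sorted(bounds, key=lambda n: bounds[n], reverse=True):
        # Once k results are held and the weakest of them is at least as
        # good as the best score this partition could offer, stop: every
        # remaining partition has an even lower bound.
        if len(top) == k and top[0][0] >= bounds[name]:
            break
        scanned.append(name)
        for obj_id, score in partitions[name]:
            if len(top) < k:
                heapq.heappush(top, (score, obj_id))
            elif score > top[0][0]:
                heapq.heapreplace(top, (score, obj_id))

    return sorted(top, reverse=True), scanned


# Hypothetical relevance scores spread over three partitions.
example_partitions = {
    "p0": [("a", 0.95), ("b", 0.90), ("c", 0.40)],
    "p1": [("d", 0.85), ("e", 0.30)],
    "p2": [("f", 0.20), ("g", 0.10)],
}
best, visited = top_k_with_pruning(example_partitions, k=2)
```

In this example only partition `p0` is scanned, because its two best scores already exceed the upper bounds of `p1` and `p2`. The savings depend on how well the partitioning separates high-utility from low-utility objects, which is why the abstract ties pruning to intelligent data partitioning.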