This tutorial describes techniques essential for searching the large multimedia databases that are now common on the Internet. Commercial music catalogues contain up to 10 million songs, and online photo services such as Flickr store over 300 million images. How can we find the music, videos, or images we want? How can we organize such large collections: find duplicates, create links between similar documents, and extract and annotate semantic structures from complex audiovisual documents? Conventional methods for handling large data sets, such as hashing, get us part of the way, but they cannot be applied directly to similarity-based matching and retrieval in audiovisual document collections. Conversely, several elaborate methods from multimedia retrieval are available for semantic document analysis, but they generally do not scale to large data sets. Handling large multimedia databases therefore requires new classes of algorithms that combine the best of both worlds: large-data methods and semantic analysis. Innovative methods such as locality-sensitive hashing, which are based on randomized probes, are the new workhorses. This tutorial covers methods for multimedia retrieval on large document collections. Starting with audio retrieval, we describe both the theory (i.e., randomized algorithms for hashing) and the implementation details (how do you store hash values for millions of songs?). A special focus is on how to combine large-data methods with semantically meaningful descriptors to enable efficient similarity-based retrieval. Besides audio, the tutorial also covers image, 3D motion, and video retrieval.
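To make the idea of locality-sensitive hashing concrete, the following is a minimal sketch of one well-known variant, random-hyperplane hashing for cosine similarity. It is not taken from the tutorial itself: the function names, the four-dimensional descriptor vectors, and the in-memory dictionary used as the hash table are all illustrative assumptions. Each descriptor is reduced to a short bit key, and descriptors pointing in similar directions are likely to share a key and therefore a bucket.

```python
import random
from collections import defaultdict

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_key(vector, hyperplanes):
    """Pack one bit per hyperplane: 1 if the vector lies on its non-negative side."""
    key = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key

def build_index(items, hyperplanes):
    """Group item names into buckets by hash key (hypothetical in-memory store)."""
    index = defaultdict(list)
    for name, vector in items.items():
        index[lsh_key(vector, hyperplanes)].append(name)
    return index

def query(index, vector, hyperplanes):
    """Return the candidate items stored in the bucket the query falls into."""
    return index.get(lsh_key(vector, hyperplanes), [])

# Hypothetical descriptors; real audio or image features would be much larger.
descriptors = {
    "song_a": [0.9, 0.1, 0.3, 0.2],
    "song_b": [-0.5, 0.7, -0.1, 0.4],
}
planes = make_hyperplanes(num_bits=8, dim=4, seed=1)
index = build_index(descriptors, planes)

# A louder copy of song_a (same direction, twice the magnitude) has the same key.
candidates = query(index, [1.8, 0.2, 0.6, 0.4], planes)
print(candidates)
```

A lookup touches only one bucket rather than the whole collection, which is what lets this kind of scheme scale. In practice several independent tables are typically combined so that near neighbours separated by a single hyperplane are still found, and the candidates are then re-ranked with an exact similarity measure.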