Towards optimal bag-of-features for object categorization and semantic video retrieval

Authors:
Yu-Gang Jiang; Chong-Wah Ngo; Jun Yang
Affiliations:
City University of Hong Kong, Kowloon, Hong Kong; City University of Hong Kong, Kowloon, Hong Kong; Carnegie Mellon University, Pittsburgh, PA
Venue:
Proceedings of the 6th ACM international conference on Image and video retrieval
Year:
2007

Abstract

Bag-of-features (BoF) representations derived from local keypoints have recently shown promise for object and scene classification. Whether BoF can meet the reliability and scalability demands of visual classification nevertheless remains uncertain, because its performance depends on many implementation choices. In this paper, we evaluate the factors that govern BoF performance: the choice of keypoint detector, kernel, vocabulary size, and weighting scheme. We offer practical insights into how to optimize performance by choosing a good keypoint detector and kernel. For the weighting scheme, we propose a novel soft-weighting method to assess the significance of a visual word to an image. We show experimentally that the proposed soft-weighting scheme consistently outperforms other popular weighting methods. On both the PASCAL-2005 and TRECVID-2006 datasets, our BoF setting achieves performance competitive with state-of-the-art techniques. We also show that BoF is highly complementary to global features: incorporating BoF with color and texture features yields a reported improvement of 50% on the TRECVID-2006 dataset.
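
The abstract does not state the soft-weighting formula, so the following Python sketch only illustrates the general idea under stated assumptions. Instead of assigning each keypoint to its single nearest visual word, it lets each keypoint contribute to its `top_n` nearest words, with the contribution halved at each rank and scaled by a similarity score. The function name `soft_weight_histogram`, the default `top_n=4`, the halving factor, and the similarity `1 / (1 + distance)` are all hypothetical choices made for illustration, not details confirmed by this record.

```python
import numpy as np


def soft_weight_histogram(descriptors, vocabulary, top_n=4):
    """Illustrative soft-weighted bag-of-features histogram.

    Assumptions (not taken from the abstract): Euclidean distance,
    similarity = 1 / (1 + distance), and a weight of 1 / 2**rank for
    the rank-th nearest visual word (rank 0 is the nearest).
    """
    vocabulary = np.asarray(vocabulary, dtype=float)
    n_words, dim = vocabulary.shape
    descriptors = np.asarray(descriptors, dtype=float).reshape(-1, dim)
    top_n = min(top_n, n_words)

    # Pairwise distances, shape (num_keypoints, num_words).
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)

    # Assumed similarity: 1 at distance zero, decreasing with distance.
    sims = 1.0 / (1.0 + dists)

    # For each keypoint, the indices of its top_n nearest words, nearest first.
    nearest = np.argsort(dists, axis=1)[:, :top_n]

    hist = np.zeros(n_words)
    rows = np.arange(descriptors.shape[0])
    for rank in range(top_n):
        words = nearest[:, rank]
        weights = sims[rows, words] / (2 ** rank)
        # np.add.at accumulates correctly when several keypoints hit one word.
        np.add.at(hist, words, weights)
    return hist


# Toy example: three 2-D visual words and a single keypoint at the origin.
vocab = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
keypoints = [[0.0, 0.0]]
hist = soft_weight_histogram(keypoints, vocab, top_n=2)
```

With `top_n=1` the sketch reduces to hard assignment weighted by similarity, which makes it easy to compare against a conventional term-frequency histogram on the same vocabulary.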