Appearance-based video clustering in 2D locality preserving projection subspace

  • Authors:
  • Li-Qun Xu;Bin Luo

  • Affiliations:
  • BT Group Research & Venturing, Ipswich, UK;Anhui University, Hefei, Anhui, China

  • Venue:
  • Proceedings of the 6th ACM international conference on Image and video retrieval
  • Year:
  • 2007

Abstract

In this paper we introduce an effective and unified approach to creating quality video abstractions. The research was motivated by a recently developed subspace learning method called 2D-LPP, or two-dimensional Locality Preserving Projection, which has proved effective for dimensionality reduction and sufficiently discriminating for 'appearance-based' image recognition. By exploiting the temporal constraints (sequential correlations and contextual content) inherent in a video, as opposed to a random collection of static images, and by using two 2D-LPPs in tandem, an image in the original, extremely high $(m \times n)$-dimensional pixel-based image space $I^{m \times n}$ is transformed into a point in the compact $(d \times d)$-dimensional feature subspace $f^{d \times d}$, with $d \ll m$ and $d \ll n$. This feature subspace has the desired property that visually similar images in $I^{m \times n}$ stay close in $f^{d \times d}$, and the intrinsic geometry and local structure of the original data are preserved. The feature subspace then lends itself easily to a conventional data clustering technique that identifies suitably scattered but temporally connected clusters. If necessary, a global visual colour descriptor can also be used, so that the distance metric in clustering incorporates both global and local characteristics. From the clusters that satisfy some cluster-validity constraints and user requirements (e.g., the number of clusters, most stable or most dynamic content, etc.), a summary storyboard of the video is created for content browsing and search purposes; it comprises the pertinent video frames whose features are closest to the centroid of each cluster. Experiments on various videos show that the summarisation results are very encouraging when compared with manually acquired 'ground truth'.
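The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made purely for illustration: the projection matrices `left` and `right` are random orthonormal placeholders standing in for learned 2D-LPP projections, the "conventional data clustering technique" is taken to be plain k-means, and the temporal-connectivity constraint, the colour descriptor, and the cluster-validity checks are omitted. All function names are hypothetical.

```python
import numpy as np

def project_frames(frames, left, right):
    """Map each (m x n) frame X to a (d x d) feature Y = left.T @ X @ right.

    frames: (T, m, n); left: (m, d); right: (n, d).
    In the paper the projections come from 2D-LPP; here they are placeholders.
    """
    return np.einsum("md,tmn,ne->tde", left, frames, right)

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (assumed stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Recompute labels so they agree with the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

def select_keyframes(points, labels, centroids):
    """Return, per non-empty cluster, the index of the frame nearest its centroid."""
    keyframes = []
    for j, centre in enumerate(centroids):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        nearest = idx[np.linalg.norm(points[idx] - centre, axis=1).argmin()]
        keyframes.append(int(nearest))
    return sorted(keyframes)

# Synthetic "video": three shots of 20 frames, each a base image plus small noise.
rng = np.random.default_rng(42)
m, n, d = 24, 32, 4
shots = [rng.standard_normal((m, n)) for _ in range(3)]
frames = np.stack([s + 0.05 * rng.standard_normal((m, n))
                   for s in shots for _ in range(20)])

# Placeholder projections with orthonormal columns (NOT learned 2D-LPP bases).
left = np.linalg.qr(rng.standard_normal((m, d)))[0]
right = np.linalg.qr(rng.standard_normal((n, d)))[0]

features = project_frames(frames, left, right)   # shape (60, d, d)
points = features.reshape(len(frames), -1)       # flatten to (60, d*d)
labels, centroids = kmeans(points, k=3)
print("storyboard frame indices:", select_keyframes(points, labels, centroids))
```

The sketch keeps the three stages separate (projection, clustering, keyframe selection) so that any one of them could be swapped for the corresponding step of the actual method without touching the others.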