GPCA: an efficient dimension reduction scheme for image compression and retrieval

  • Authors:
  • Jieping Ye;Ravi Janardan;Qi Li

  • Affiliations:
  • University of Minnesota, Minneapolis, MN;University of Minnesota, Minneapolis, MN;University of Delaware, Newark, DE

  • Venue:
  • Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recent years have witnessed a dramatic increase in the quantity of image data collected, due to advances in fields such as medical imaging, reconnaissance, surveillance, astronomy, multimedia etc. With this increase has come the need to be able to store, transmit, and query large volumes of image data efficiently. A common operation on image databases is the retrieval of all images that are similar to a query image. For this, the images in the database are often represented as vectors in a high-dimensional space and a query is answered by retrieving all image vectors that are proximal to the query image in this space, under a suitable similarity metric. To overcome problems associated with high dimensionality, such as high storage and retrieval times, a dimension reduction step is usually applied to the vectors to concentrate relevant information in a small number of dimensions. Principal Component Analysis (PCA) is a well-known dimension reduction scheme. However, since it works with vectorized representations of images, PCA does not take into account the spatial locality of pixels in images. In this paper, a new dimension reduction scheme, called Generalized Principal Component Analysis (GPCA), is presented. This scheme works directly with images in their native state, as two-dimensional matrices, by projecting the images to a vector space that is the tensor product of two lower-dimensional vector spaces. Experiments on databases of face images show that, for the same amount of storage, GPCA is superior to PCA in terms of quality of the compressed images, query precision, and computational cost.