Efficient word retrieval by means of SOM clustering and PCA

  • Authors:
  • Simone Marinai;Stefano Faini;Emanuele Marino;Giovanni Soda

  • Affiliations:
  • Dipartimento di Sistemi e Informatica, Università di Firenze, Firenze, Italy (all authors)

  • Venue:
  • DAS'06: Proceedings of the 7th International Conference on Document Analysis Systems
  • Year:
  • 2006

Abstract

We propose an approach for efficient word retrieval from printed documents belonging to Digital Libraries. The approach combines word image clustering, based on Self-Organizing Maps (SOM), with Principal Component Analysis (PCA). Combining these methods allows us to efficiently retrieve matching words from large document collections without directly comparing the query word with each indexed word.
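
The abstract does not spell out the pipeline, so the following is only a minimal sketch of how SOM clustering and PCA might be combined for retrieval. It assumes each word image has already been reduced to a fixed-length feature vector (random vectors stand in for them here), and every name and parameter in it (`train_som`, `build_index`, `retrieve`, the 3x3 map, 4 principal components) is hypothetical rather than taken from the paper. The idea illustrated is that a query is compared only against the words in its own SOM cluster, in a PCA-reduced space, instead of against the whole collection.

```python
import numpy as np

def train_som(data, rows=3, cols=3, epochs=5, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM; returns prototypes of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of each map unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise prototypes from randomly chosen training vectors.
    weights = data[rng.choice(len(data), rows * cols, replace=True)].astype(float)
    steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = t / steps
            lr = lr0 * (1.0 - frac)                # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.1    # shrinking neighbourhood
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            t += 1
    return weights

def bmu_of(weights, x):
    """Index of the best-matching unit (closest prototype) for vector x."""
    return int(np.argmin(((weights - x) ** 2).sum(axis=1)))

def build_index(data, weights, n_components=4):
    """Group vectors by SOM unit and fit a separate PCA basis per cluster."""
    members_by_unit = {}
    for i, x in enumerate(data):
        members_by_unit.setdefault(bmu_of(weights, x), []).append(i)
    clusters = {}
    for unit, ids in members_by_unit.items():
        members = data[ids]
        mean = members.mean(axis=0)
        # PCA via SVD of the centred cluster members.
        _, _, vt = np.linalg.svd(members - mean, full_matrices=False)
        basis = vt[:n_components]
        clusters[unit] = (np.array(ids), mean, basis, (members - mean) @ basis.T)
    return clusters

def retrieve(query, weights, clusters, top_k=5):
    """Rank only the words in the query's cluster, in that cluster's PCA space."""
    unit = bmu_of(weights, query)
    if unit not in clusters:
        return []
    ids, mean, basis, projected = clusters[unit]
    q = (query - mean) @ basis.T
    order = np.argsort(((projected - q) ** 2).sum(axis=1))[:top_k]
    return ids[order].tolist()

# Stand-in data: 200 random 16-dimensional "word image" feature vectors.
rng = np.random.default_rng(42)
data = rng.normal(size=(200, 16))
weights = train_som(data)
clusters = build_index(data, weights)
hits = retrieve(data[7], weights, clusters)   # query with an indexed word
print(hits)
```

In this sketch the query cost depends on the number of map units plus the size of one cluster rather than on the size of the whole index, which is the kind of saving the abstract describes. A real system would also need to decide how to handle queries that fall near cluster boundaries, for example by searching neighbouring map units as well.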