Feature Extraction for Document Image Segmentation by pLSA Model

Authors:
Takuma Yamaguchi;Minoru Maruyama
Venue:
DAS '08 Proceedings of the 2008 The Eighth IAPR International Workshop on Document Analysis Systems
Year:
2008

Cited by:
Multi modal semantic indexing for image retrieval. Proceedings of the ACM International Conference on Image and Video Retrieval.
Understanding Digital Documents Using Gestalt Properties of Isothetic Components. International Journal of Digital Library Systems.

Abstract

In this paper, we propose a method for document image segmentation based on the probabilistic latent semantic analysis (pLSA) model. The pLSA model was originally developed for topic discovery in text analysis using a "bag-of-words" document representation, and it is also useful for image analysis through a "bag-of-visual-words" image representation. The performance of the method depends on the visual vocabulary, which is generated by extracting features from the document image. We compare several feature extraction and description methods and examine how each affects segmentation performance. Our experiments show that the pLSA-based method makes accurate content-based document segmentation possible.
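The pipeline the abstract describes (quantize local features into visual words, count them per region, fit pLSA, then read off latent topics) can be illustrated with a minimal sketch. It assumes each image region is treated as a pLSA "document", uses toy 2-D descriptors and a fixed hand-picked vocabulary (in practice a vocabulary is commonly built by clustering descriptors, e.g. with k-means), and labels each region by its most probable topic; that decision rule and the helper names `quantize`, `histogram`, `plsa` and `label_regions` are hypothetical and are not taken from the paper.

```python
import random


def normalize(row):
    """Scale a non-negative list to sum to 1 (uniform if it sums to 0)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def quantize(descriptors, vocabulary):
    """Map each local descriptor to its nearest visual word (squared Euclidean distance)."""
    return [
        min(range(len(vocabulary)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, vocabulary[k])))
        for desc in descriptors
    ]


def histogram(word_ids, n_words):
    """Bag-of-visual-words count vector for one image region."""
    counts = [0] * n_words
    for k in word_ids:
        counts[k] += 1
    return counts


def plsa(counts, n_topics, n_iter=200, seed=0):
    """Fit pLSA by EM on a region-by-word count matrix.

    counts[d][w] = number of times visual word w occurs in region d.
    Returns (p_z_d, p_w_z) with p_z_d[d][z] = P(z|d) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialization of both conditional distributions.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                for z in range(n_topics):
                    resp = n * joint[z] / total
                    exp_w_z[z][w] += resp
                    exp_z_d[d][z] += resp
        # M-step: renormalize the expected counts into probabilities.
        p_w_z = [normalize(row) for row in exp_w_z]
        p_z_d = [normalize(row) for row in exp_z_d]
    return p_z_d, p_w_z


def label_regions(p_z_d):
    """Assign each region the latent topic with the highest P(z|d) (assumed rule)."""
    return [max(range(len(row)), key=row.__getitem__) for row in p_z_d]


# Toy usage: hypothetical 2-D descriptors for two kinds of region.
vocabulary = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
region_a = [[0.1, 0.0], [0.0, 0.9], [0.2, 0.1], [0.1, 1.1]]
region_b = [[5.0, 5.1], [4.9, 6.0], [5.1, 4.9], [5.0, 5.9]]
regions = [region_a, region_b, region_a, region_b]

counts = [histogram(quantize(r, vocabulary), len(vocabulary)) for r in regions]
p_z_d, p_w_z = plsa(counts, n_topics=2)
labels = label_regions(p_z_d)
print(labels)
```

With two latent topics, regions built from the same kind of descriptors should end up sharing a label. Which topic index each group receives depends on the random initialization, so only the grouping, not the specific label values, is meaningful.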