In this paper, we propose a method for document image segmentation based on the probabilistic latent semantic analysis (pLSA) model. pLSA was originally developed for topic discovery in text analysis, where each document is represented as a "bag of words". The model can also be applied to image analysis by representing an image as a "bag of visual words". The performance of the method depends on the visual vocabulary generated by feature extraction from the document image. We compare several feature extraction and description methods and examine how they relate to segmentation performance. Our experiments show that the pLSA-based method makes accurate content-based document segmentation possible.
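
To make the pLSA step concrete, the following is a minimal sketch of fitting the model with the standard EM updates on a count matrix. It is an illustration under stated assumptions, not the implementation used in the paper: we assume each "document" is a local region of the page image and each column counts one visual word (a quantized local descriptor). The function names `plsa` and `log_likelihood`, the parameters, and the toy data are all hypothetical.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a (regions x visual words) count matrix.

    Returns P(z|d) with shape (regions, topics) and
    P(w|z) with shape (topics, words). Hypothetical helper.
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random initialization; every row is normalized to sum to 1.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]  # (docs, topics, words)
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z

def log_likelihood(counts, p_z_d, p_w_z):
    """Data log-likelihood: sum over (d, w) of n(d,w) * log P(w|d)."""
    p_w_d = p_z_d @ p_w_z
    return float((np.asarray(counts, dtype=float) * np.log(p_w_d + 1e-12)).sum())

# Toy data (made up): four regions, six visual words, two kinds of content.
counts = np.array([
    [5, 4, 6, 0, 0, 0],
    [6, 5, 4, 0, 0, 0],
    [0, 0, 0, 4, 6, 5],
    [0, 0, 0, 5, 5, 6],
])
p_z_d, p_w_z = plsa(counts, n_topics=2)
region_labels = p_z_d.argmax(axis=1)  # one latent-topic label per region
```

Under these assumptions, a segmentation can be read off by assigning each region the topic with the highest P(z|d), as in `region_labels`. How regions are defined and how topics are mapped to content classes such as text or figures depends on the feature extraction and vocabulary choices that the paper compares.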