Information Retrieval from Documents: A Survey
Information Retrieval
Feature string-based intelligent information retrieval from Tamil document images
International Journal of Computer Applications in Technology
The impact of OCR accuracy and feature transformation on automatic text classification
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
The vector-space model offers a simple and robust model for information retrieval. In this model, both the similarities between queries and documents and the similarities among the documents themselves are important. Document similarities can be used to generate links that lead users from one document to related ones. Studies have shown that the vector-space model is robust for OCR-processed documents when manually constructed queries are used. However, it is not clear whether the model, when used for hypertext construction, is equally robust to the data corruption introduced by OCR engines. In this paper, we examine the performance of automatic hypertext construction based on the vector-space model using three measures: the number of overtakings within the rankings, the accumulated distance of each document's position within the rankings, and a comparison of recall-precision graphs.
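The following is a minimal sketch of how the three comparison measures could be computed. It is illustrative only and is not the method used in the paper. It assumes raw term-frequency vectors, whitespace tokenisation and cosine similarity. For a source document, every other document is ranked by its similarity to the source, and the top of that ranking would become the hypertext links. The abstract does not define "overtakings" or "accumulated distance" precisely, so the functions here implement one plausible reading of each: overtakings are document pairs whose relative order flips between the clean-text ranking and the OCR-text ranking, and accumulated distance is the sum of each document's absolute position shift. All function names, the toy corpora and the relevance judgment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def vectorize(text):
    """Term-frequency vector (assumption: lowercase, whitespace split, no idf)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_related(docs, source_id):
    """Rank all other documents by similarity to docs[source_id]; ties broken by id."""
    vecs = {d: vectorize(t) for d, t in docs.items()}
    src = vecs[source_id]
    others = [d for d in docs if d != source_id]
    return sorted(others, key=lambda d: (-cosine(src, vecs[d]), d))


def overtakings(ranking_a, ranking_b):
    """Hypothetical reading: pairs whose relative order differs between rankings."""
    pos_b = {d: i for i, d in enumerate(ranking_b)}
    # combinations() preserves ranking_a's order, so x precedes y in ranking_a.
    return sum(1 for x, y in combinations(ranking_a, 2) if pos_b[x] > pos_b[y])


def accumulated_distance(ranking_a, ranking_b):
    """Hypothetical reading: sum of absolute position shifts of each document."""
    pos_b = {d: i for i, d in enumerate(ranking_b)}
    return sum(abs(i - pos_b[d]) for i, d in enumerate(ranking_a))


def recall_precision_points(ranking, relevant):
    """(recall, precision) pairs recorded at each relevant document retrieved."""
    points, hits = [], 0
    for i, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / i))
    return points


# Toy corpora (hypothetical): the OCR version garbles some words in d2.
clean = {
    "d1": "vector space model for information retrieval",
    "d2": "information retrieval with the vector space model",
    "d3": "a space model for hypertext links",
    "d4": "ocr errors in scanned documents",
}
ocr = dict(clean, d2="inf0rmation retr1eval with the vect0r space model")

clean_rank = rank_related(clean, "d1")  # candidate link targets for d1
ocr_rank = rank_related(ocr, "d1")
print(clean_rank, ocr_rank)
print(overtakings(clean_rank, ocr_rank), accumulated_distance(clean_rank, ocr_rank))
print(recall_precision_points(clean_rank, {"d2"}), recall_precision_points(ocr_rank, {"d2"}))
```

In this toy corpus, garbling "information", "retrieval" and "vector" in d2 lowers its similarity to d1 from about 0.77 to about 0.31. As a result, d3 moves ahead of d2. That single swap gives one overtaking and an accumulated distance of 2. It also lowers the precision at which the relevant document is retrieved from 1.0 to 0.5.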