In this paper we present a picture search engine for life science literature and show how it can be used to improve literature preselection. Such preselection is needed to cope with the vast amount of literature available. For example, while searching for DNA binding sites, we wanted to add the results of specific experiments (DNase I footprint and EMSA) to our database. Preselection via abstract search was very unspecific (150 000 hits), but by looking for papers with images concerning these experiments, we could improve precision immensely. The retrieved images are displayed like hits in a search engine, allowing quick and easy quality assessment without having to read through the whole paper. The images are found through their annotation in the paper: the figure caption. To identify the caption, we analyse the layout of the paper: the position of the image and of the surrounding text.
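The abstract does not spell out how the layout analysis works, so the following is only a minimal sketch of one plausible heuristic, not the authors' method. It assumes that page elements are already available as bounding boxes with a y axis growing downward, and it picks the text block closest above or below the image that shares its column and starts with "Fig." or "Figure". The names `Box`, `find_caption`, and `max_gap` are hypothetical and introduced for illustration only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Axis-aligned bounding box in page coordinates (y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str = ""


# Assumption: captions begin with "Fig", "Fig." or "Figure" followed by a number.
CAPTION_START = re.compile(r"^\s*fig(?:ure)?\.?\s*\d+", re.IGNORECASE)


def horizontal_overlap(a: Box, b: Box) -> float:
    """Width of the horizontal span shared by two boxes (0 if none)."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def find_caption(image: Box, blocks: List[Box], max_gap: float = 50.0) -> Optional[Box]:
    """Return the nearest caption-like text block above or below the image."""
    best: Optional[Box] = None
    best_gap: Optional[float] = None
    for block in blocks:
        # Skip blocks in a different column.
        if horizontal_overlap(image, block) <= 0:
            continue
        # Vertical distance between image and block; skip overlapping blocks.
        if block.y0 >= image.y1:
            gap = block.y0 - image.y1   # block lies below the image
        elif block.y1 <= image.y0:
            gap = image.y0 - block.y1   # block lies above the image
        else:
            continue
        # Skip blocks that are too far away or do not look like a caption.
        if gap > max_gap or not CAPTION_START.match(block.text):
            continue
        if best_gap is None or gap < best_gap:
            best, best_gap = block, gap
    return best


image = Box(100, 100, 300, 250)
blocks = [
    Box(100, 20, 300, 90, "The binding site was mapped as described."),
    Box(100, 260, 300, 300, "Figure 2. DNase I footprint of the promoter region."),
    Box(320, 260, 500, 300, "Fig. 3 EMSA with nuclear extract."),
]
caption = find_caption(image, blocks)
print(caption.text if caption else "no caption found")
```

In this sketch the body text directly above the image is rejected because it does not look like a caption, and the block in the neighbouring column is rejected because it does not overlap the image horizontally. A real system would need to handle multi-column figures, captions placed beside images, and noisy text extraction.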