A sequence labeling method using syntactical and textual patterns for record linkage
ICAPR'05: Proceedings of the Third International Conference on Advances in Pattern Recognition - Volume Part I
We propose a stochastic context-free grammar for extracting information from scanned document images. The grammar is designed to resolve ambiguities in layout analysis and to use both layout and text features. We applied it to extracting bibliographic information from scanned academic papers and found that it extracts this information accurately.
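The abstract does not give the grammar itself, so the following is only a minimal sketch of how a stochastic context-free grammar can assign field labels to the lines of a paper's header. Everything in it is an assumption made for illustration: the nonterminals (`HEADER`, `REST`, `TITLE`, `AUTHORS`, `AFFIL`), the coarse line features (`large_font`, `normal_font`, `italic`), the rule probabilities, and the function names. It uses a standard Viterbi variant of the CYK algorithm, which may differ from the authors' parsing method.

```python
import math
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form; probabilities are illustrative only.
# parent -> list of (left child, right child, rule probability)
BINARY = {
    "HEADER": [("TITLE", "REST", 1.0)],
    "REST": [("AUTHORS", "AFFIL", 1.0)],
    "TITLE": [("TITLE", "TITLE", 0.3)],
    "AUTHORS": [("AUTHORS", "AUTHORS", 0.2)],
}
# nonterminal -> {observed line feature: emission probability}
LEXICAL = {
    "TITLE": {"large_font": 0.6, "normal_font": 0.1},
    "AUTHORS": {"normal_font": 0.5, "large_font": 0.1, "italic": 0.2},
    "AFFIL": {"italic": 0.7, "normal_font": 0.3},
}


def _labels(best, i, j, sym):
    """Follow backpointers and return one field label per input line."""
    _, back = best[(i, j)][sym]
    if j - i == 1:
        return [sym]
    k, left, right = back
    return _labels(best, i, k, left) + _labels(best, k, j, right)


def viterbi_labels(lines, start="HEADER"):
    """Return the most probable label sequence, or None if no parse exists."""
    n = len(lines)
    # (i, j) -> {symbol: (best log-probability, backpointer)}
    best = defaultdict(dict)

    # Width-1 spans: score each line feature under every nonterminal that emits it.
    for i, feature in enumerate(lines):
        for sym, emissions in LEXICAL.items():
            if feature in emissions:
                best[(i, i + 1)][sym] = (math.log(emissions[feature]), feature)

    # Wider spans: combine two adjacent sub-spans with a binary rule,
    # keeping only the highest-scoring derivation per symbol.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            cell = best[(i, j)]
            for k in range(i + 1, j):
                left_cell = best.get((i, k), {})
                right_cell = best.get((k, j), {})
                for parent, rules in BINARY.items():
                    for left, right, prob in rules:
                        if left in left_cell and right in right_cell:
                            score = (math.log(prob)
                                     + left_cell[left][0]
                                     + right_cell[right][0])
                            if parent not in cell or score > cell[parent][0]:
                                cell[parent] = (score, (k, left, right))

    if start not in best.get((0, n), {}):
        return None
    return _labels(best, 0, n, start)
```

Calling `viterbi_labels(["large_font", "large_font", "normal_font", "italic"])` labels the first two lines as `TITLE`, the third as `AUTHORS`, and the fourth as `AFFIL`. The second line is ambiguous on its own, since a large font is possible for either a title or an author line in this toy model. The grammar resolves that by scoring whole derivations rather than single lines, which is the kind of disambiguation the abstract describes.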