We present a document understanding system in which the arrangement of lines of text and block separators within a document is modeled by stochastic context-free grammars. Each grammar corresponds to a document genre, so the system can be adapted to a new genre simply by replacing the input grammar. The system incorporates an optical character recognition system that outputs characters together with their positions and font sizes. These features are combined to form a document representation consisting of lines of text and separators. Lines of text are labeled as tokens using regular expression matching. The maximum-likelihood parse of this stream of tokens and separators yields a functional labeling of the document lines. We describe applications to business cards and business letters.
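The pipeline described in the abstract (regular-expression tokens, then a maximum-likelihood parse under a stochastic context-free grammar) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the token patterns, the toy business-card grammar and its probabilities, the use of a dashed line to stand in for a block separator, and the names `tokenize` and `viterbi_parse`. The parse uses a standard Viterbi variant of the CYK algorithm over a grammar in Chomsky normal form, which may differ from the parser the authors used.

```python
import math
import re

# Hypothetical token patterns, tried in order (the paper's expressions are not given).
TOKEN_PATTERNS = [
    ("sep", re.compile(r"^[-_=]{3,}$")),  # stand-in for a block separator
    ("email", re.compile(r"^\S+@\S+\.\S+$")),
    ("phone", re.compile(r"^(tel|phone|fax)?[:.\s]*\+?[\d\s().-]{7,}$", re.I)),
    ("caps", re.compile(r"^[A-Z][a-z]+(\s[A-Z][a-z.]+)+$")),
]

def tokenize(line):
    """Map one line of text to a coarse token type via regex matching."""
    text = line.strip()
    for name, pattern in TOKEN_PATTERNS:
        if pattern.match(text):
            return name
    return "text"

# Toy grammar in Chomsky normal form with made-up probabilities.
# Lexical rules: (nonterminal, token) -> probability.
LEXICAL = {
    ("NAME", "caps"): 0.9, ("NAME", "text"): 0.1,
    ("TITLE", "text"): 0.7, ("TITLE", "caps"): 0.3,
    ("SEPARATOR", "sep"): 1.0,
    ("PHONE", "phone"): 1.0,
    ("EMAIL", "email"): 1.0,
}
# Binary rules: (parent, left child, right child) -> probability.
BINARY = {
    ("CARD", "HEAD", "BODY"): 1.0,
    ("HEAD", "NAME", "TITLE"): 1.0,
    ("BODY", "SEPARATOR", "CONTACT"): 1.0,
    ("CONTACT", "PHONE", "EMAIL"): 0.5,
    ("CONTACT", "EMAIL", "PHONE"): 0.5,
}

def viterbi_parse(tokens, start="CARD"):
    """Return (labels, log_prob) for the most likely parse, or None if none exists."""
    n = len(tokens)
    best = {}  # (i, j, nonterminal) -> (log probability, backpointer)
    for i, tok in enumerate(tokens):
        for (nt, t), p in LEXICAL.items():
            if t == tok:
                best[(i, i + 1, nt)] = (math.log(p), None)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    if (i, k, b) in best and (k, j, c) in best:
                        score = math.log(p) + best[(i, k, b)][0] + best[(k, j, c)][0]
                        if score > best.get((i, j, a), (-math.inf, None))[0]:
                            best[(i, j, a)] = (score, (k, b, c))
    if (0, n, start) not in best:
        return None
    labels = [None] * n

    def walk(i, j, nt):
        # Follow backpointers down to the preterminal that covers each line.
        back = best[(i, j, nt)][1]
        if back is None:
            labels[i] = nt
        else:
            k, b, c = back
            walk(i, k, b)
            walk(k, j, c)

    walk(0, n, start)
    return labels, best[(0, n, start)][0]

card_lines = [
    "Jane Doe",
    "Senior Research Engineer",
    "----------",
    "Tel: +1 555 010 0199",
    "jane.doe@example.com",
]
card_tokens = [tokenize(line) for line in card_lines]
card_labels, card_log_prob = viterbi_parse(card_tokens)
for line, label in zip(card_lines, card_labels):
    print(f"{label:10s} {line}")
```

In this example the first two lines receive the same token type (`caps`), so the regular expressions alone cannot tell a name from a job title; the grammar resolves the ambiguity from the position of each line in the overall structure. Supporting a different genre in this sketch would mean swapping the `LEXICAL` and `BINARY` tables, which mirrors the abstract's point about replacing the input grammar.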