The signature block is a common structured component of e-mail messages. Accurate identification and analysis of signature blocks is important in many multimedia messaging and information retrieval applications, such as e-mail text-to-speech rendering. It is also a very challenging task, because signature blocks often appear in complex two-dimensional layouts that are guided only by loose conventions. Traditional text analysis methods designed for sequential text cannot handle two-dimensional structures, while the highly unconstrained nature of signature blocks makes two-dimensional grammars very difficult to apply. In this paper we describe an algorithm for signature block analysis that combines two-dimensional structural segmentation with one-dimensional grammatical constraints. The information obtained from the geometrical and linguistic analyses is integrated in the form of weighted finite-state transducers (WFSTs), and the final solution is the optimal interpretation under both constraints.
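The idea of choosing one interpretation that is optimal under both geometric and linguistic constraints can be illustrated with a small sketch. The code below is not the paper's implementation and uses no WFST library. It assumes that the geometric analysis yields a cost for each candidate label of each segment, and that the linguistic constraints can be approximated by costs on label-to-label transitions. Summing the two costs and keeping the cheapest path corresponds to a shortest-path search in the tropical (min, +) semiring. The function name `best_interpretation`, the labels, and all cost values are hypothetical.

```python
from math import inf


def best_interpretation(segment_costs, transition_costs, start="<s>"):
    """Return (cost, labels) of the cheapest labeling of a segment sequence.

    segment_costs: one dict per segment, mapping label -> geometric cost
                   (hypothetical output of a layout analysis step).
    transition_costs: dict mapping (previous label, label) -> grammar cost;
                      missing pairs are treated as forbidden (infinite cost).
    """
    # best[label] = (total cost, label path) of the cheapest path ending in label
    best = {start: (0, [])}
    for candidates in segment_costs:
        nxt = {}
        for label, geo_cost in candidates.items():
            for prev, (cost, path) in best.items():
                gram_cost = transition_costs.get((prev, label), inf)
                total = cost + geo_cost + gram_cost  # tropical "times" is +
                if total < nxt.get(label, (inf, None))[0]:  # tropical "plus" is min
                    nxt[label] = (total, path + [label])
        best = nxt
    if not best:
        # No labeling satisfies the grammar.
        return inf, []
    return min(best.values(), key=lambda item: item[0])


# Hypothetical geometric costs for three segments of a signature block.
segments = [
    {"name": 2, "phone": 20},
    {"title": 5, "name": 10},
    {"phone": 3, "email": 15},
]

# Hypothetical grammar costs on adjacent labels.
transitions = {
    ("<s>", "name"): 0,
    ("<s>", "phone"): 30,
    ("name", "title"): 1,
    ("name", "name"): 20,
    ("name", "phone"): 5,
    ("title", "phone"): 2,
    ("title", "email"): 2,
    ("phone", "title"): 10,
}

cost, path = best_interpretation(segments, transitions)
print(cost, path)  # 13 ['name', 'title', 'phone']
```

In a WFST setting, the per-segment candidates would typically be encoded as a weighted lattice and the label constraints as a weighted grammar transducer. Composing the two and extracting the shortest path plays the role of the loop above.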