Statistical Analysis of Bibliographic Strings for Constructing an Integrated Document Space
ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries
Bibliographic attribute extraction from erroneous references based on a statistical model
Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries
Quality enhancement in information extraction from scanned documents
Proceedings of the 2006 ACM symposium on Document engineering
An approximate multi-word matching algorithm for robust document retrieval
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Automatic metadata extraction from museum specimen labels
DCMI '08 Proceedings of the 2008 International Conference on Dublin Core and Metadata Applications
A statistical model for flexible string similarity
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
An effective access mechanism to digital interview archives
ECDL'05 Proceedings of the 9th European conference on Research and Advanced Technology for Digital Libraries
This paper proposes a text recognition error model called the dual variable length output hidden Markov model (DVHMM) and gives a parameter estimation algorithm based on the EM algorithm. Although existing probabilistic error models are limited to substitution (1,1), insertion (1,0), and deletion (0,1) errors, the DVHMM can handle error patterns with any pair of lengths (i, j), which covers substitution, insertion, and deletion as special cases.
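To make the idea of length pairs (i, j) concrete, the following is a minimal sketch, not the DVHMM itself: it has no hidden states, no probabilities, and no EM estimation. It only enumerates the ways a pair of strings can be split into consecutive blocks whose length pairs come from an allowed set. The function name `segmentations`, the constant `CLASSIC`, and the "m" versus "rn" example are illustrative assumptions, as is the convention that i counts characters taken from the first string and j from the second.

```python
def segmentations(first, second, allowed):
    """Yield every split of (first, second) into consecutive blocks.

    Each block takes i characters from `first` and j from `second`,
    where (i, j) must be in `allowed`. The (0, 0) pair is skipped
    because it consumes nothing and would never terminate.
    """
    if not first and not second:
        yield []
        return
    for i, j in sorted(allowed):
        if (i, j) == (0, 0) or i > len(first) or j > len(second):
            continue
        head = (first[:i], second[:j])
        for rest in segmentations(first[i:], second[j:], allowed):
            yield [head] + rest


# Length pairs available to a substitution/insertion/deletion-only model.
CLASSIC = {(1, 1), (1, 0), (0, 1)}

# Hypothetical example: one character paired with a two-character string.
classic = list(segmentations("m", "rn", CLASSIC))
extended = list(segmentations("m", "rn", CLASSIC | {(1, 2)}))

# Only the extended set can treat the whole pair as a single block.
single_block = [("m", "rn")]
print(single_block in classic, single_block in extended)
```

With only the three classic pairs, the example pair has to be explained as a chain of single-character edits. Once the (1, 2) pair is allowed, it can also be explained as one block, which is the kind of pattern a model over arbitrary (i, j) pairs can assign its own probability to.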