Generating documents from low-level data and making use of them is one of the most challenging tasks in document engineering. Word occurrence detection is a fundamental problem when working with documents produced by a recognizer, such as OCR or speech recognition. Given a set of words, such as a dictionary, this paper proposes an efficient dynamic programming (DP) algorithm that finds the occurrences of each word in a text. String similarity is measured by a statistical similarity model that allows similarities to be defined at the character level as well as at the edit-operation level. The proposed algorithm represents both the word set and the text as tree structures, so that the similarity of a substring appearing in several places in the text or in several words is measured only once. The time complexity of the proposed algorithm is O(|W|⋅|S|⋅|Q|), where |W| and |S| denote the numbers of nodes in the trees representing the word set and the text, respectively, and |Q| denotes the number of states of the model used for string similarity. Experiments show that the proposed algorithm is about six times faster than a naive DP algorithm.
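
The idea of sharing DP work across words through a tree can be illustrated with a small sketch. This is not the paper's algorithm: it assumes plain unit-cost edit distance in place of the statistical similarity model (so there is no |Q| factor), and it puts only the word set in a tree (a trie), leaving the text as a flat string. The names `build_trie`, `find_occurrences`, and `max_dist` are hypothetical and chosen for illustration only.

```python
def build_trie(words):
    """Build a trie so that words sharing a prefix share nodes."""
    root = {"children": {}, "word": None}
    for w in words:
        node = root
        for ch in w:
            node = node["children"].setdefault(
                ch, {"children": {}, "word": None}
            )
        node["word"] = w
    return root


def find_occurrences(words, text, max_dist):
    """Return {word: [(end, dist), ...]}.

    Each pair means some substring of `text` ending just before index
    `end` is within unit-cost edit distance `dist` <= max_dist of `word`.
    Assumption: unit-cost edit distance stands in for the paper's
    statistical similarity model.
    """
    root = build_trie(words)
    hits = {w: [] for w in words}
    # Row for the empty prefix: an occurrence may start at any text
    # position, so the cost is 0 everywhere.
    first_row = [0] * (len(text) + 1)
    stack = [(root, first_row)]
    while stack:
        node, prev = stack.pop()
        for ch, child in node["children"].items():
            # One DP row per trie node, computed once and reused by
            # every word that passes through this node.
            row = [prev[0] + 1]
            for j in range(1, len(text) + 1):
                substitute = prev[j - 1] + (text[j - 1] != ch)
                skip_word_char = prev[j] + 1
                skip_text_char = row[j - 1] + 1
                row.append(min(substitute, skip_word_char, skip_text_char))
            if child["word"] is not None:
                hits[child["word"]] = [
                    (j, d) for j, d in enumerate(row) if d <= max_dist
                ]
            # The row minimum never decreases with depth, so a subtree
            # whose row already exceeds max_dist cannot produce hits.
            if min(row) <= max_dist:
                stack.append((child, row))
    return hits


if __name__ == "__main__":
    print(find_occurrences(["cat", "cart"], "the cat sat", 1))
```

In this sketch the rows for "c" and "ca" are computed once and shared by "cat" and "cart"; a naive DP would recompute them for each word. The cost is proportional to the number of trie nodes visited times the text length, which loosely mirrors the |W|⋅|S| part of the stated bound under the simplifications above.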