Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
A decision-theoretic generalization of on-line learning and an application to boosting
EuroCOLT '95 Proceedings of the Second European Conference on Computational Learning Theory
Performance analysis of pattern classifier combination by plurality voting
Pattern Recognition Letters
An adaptive k-nearest neighbor text categorization strategy
ACM Transactions on Asian Language Information Processing (TALIP)
Text classification with kernels on the multinomial manifold
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
An optical character recognition approach to qualifying thresholding algorithms
Proceedings of the eighth ACM symposium on Document engineering
Hi-index | 0.00 |
To address cost and regulatory concerns, many businesses are converting paper-based elements of their workflows into fully electronic flows that use the content of the documents. Scanning the document contents into workflows, however, is a manual, error-prone, and costly process especially when the data extraction process requires high accuracy. These manual costs are a primary barrier to widespread adoption of distributed capture solutions for business critical workflows such as insurance claims, medical records, or loan applications. Software solutions using artificial intelligence and natural language processing techniques are emerging to address these needs, but each have their individual strengths and weaknesses, and none have demonstrated a high level of accuracy across the many unstructured document types included in these business critical workflows. This paper describes how to overcome many of these limitations by intelligently combining multiple approaches for document classification using meta-algorithmic design patterns. These patterns explore the error space in multiple engines, and provide improved and "emergent" results in comparison to voting schemes and to the output of any of the individual engines. This paper considers the results of the individual engines along with traditional combinatorial techniques such as voting, before describing prototype results for a variety of novel metaalgorithmic patterns that reduce individual document error rates by up to 13% and reduce system error rates by up to 38%.