Automatic Feature Selection with Applications to Script Identification of Degraded Documents

Authors:
Vitaly Ablavsky;Mark R. Stevens
Affiliations:
-;-
Venue:
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
Year:
2003

Abstract

Current approaches to script identification rely on hand-selected features and often require processing a significant part of the document to achieve reliable identification. We present an approach that applies a large pool of image features to a small training sample and uses subset feature selection techniques to automatically select a subset with the most discriminating power. At run time we use a classifier coupled with an evidence accumulation engine to report a script label once a preset likelihood threshold has been reached. We apply the system to a diverse corpus of printed Russian and English documents that suffer from common degradation problems. Our validation study shows promising results both in terms of the script identification accuracy and the ability to identify script on the scale of individual words and text lines.
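
The abstract does not say how the evidence accumulation engine combines classifier outputs, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical two-class classifier that returns P(first script | item) for each word or connected component, treats those outputs as conditionally independent, and fuses them in log-odds form until either script passes a preset threshold. The function name, parameters, and the default threshold of 0.95 are all assumptions made for illustration.

```python
import math


def accumulate_evidence(posteriors, threshold=0.95, prior=0.5,
                        labels=("Russian", "English")):
    """Fuse per-item classifier outputs until one script is likely enough.

    posteriors: iterable of P(labels[0] | item) values from a hypothetical
        two-class classifier (assumed conditionally independent).
    threshold: preset likelihood threshold for reporting a label (assumed).
    prior: prior probability of labels[0] (assumed).
    Returns (label, items_used, probability); label is None when the
    threshold is never reached.
    """
    prior_log_odds = math.log(prior / (1.0 - prior))
    log_odds = prior_log_odds
    p_first = prior
    used = 0
    for used, p in enumerate(posteriors, start=1):
        # Clamp to avoid infinite log-odds from overconfident outputs.
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        # Each posterior contributes its likelihood ratio: the prior is
        # subtracted so that it is counted only once overall.
        log_odds += math.log(p / (1.0 - p)) - prior_log_odds
        p_first = 1.0 / (1.0 + math.exp(-log_odds))
        if p_first >= threshold:
            return labels[0], used, p_first
        if 1.0 - p_first >= threshold:
            return labels[1], used, 1.0 - p_first
    return None, used, max(p_first, 1.0 - p_first)


if __name__ == "__main__":
    # Made-up per-word scores, for demonstration only.
    print(accumulate_evidence([0.7, 0.8, 0.6, 0.9]))
```

Stopping as soon as the threshold is crossed is what would let such a scheme label short spans such as individual words or text lines instead of processing a large part of the page; a stricter threshold trades more items consumed for fewer early mistakes.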