Address block segmentation using ensemble-clustering techniques

Authors:
Mustafa Idrissi;Leon J. M. Rothkrantz
Affiliations:
-;Delft University of Technology
Venue:
CompSysTech '08 Proceedings of the 9th International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing
Year:
2008

Abstract

One of the challenges in making postal automation technology more accurate is improving the accuracy of the address-block locating task. This task can be decomposed into two parts. The first is address-block segmentation, which divides the mail-piece image into distinct regions. The second is address-block selection, which chooses, among all segmented regions, the one that best matches the destination address block. In this paper we focus on the segmentation part of the address-block locating task. We investigated whether clustering and ensemble-clustering techniques can improve the accuracy of address-block locating. To this end, we developed a research tool implementing several clustering methods as well as ensemble methods. Very little a priori knowledge of the images is required. The implemented system has been evaluated on images captured live from real mail pieces at postal sorting centers. We describe the results of this approach and illustrate them with tests on different types of mail (parcels, magazines, postcards, etc.) in which the address block, postmarks and stamps have no fixed position. A ground-truth strategy is used to evaluate segmentation accuracy. Agglomerative clustering and its ensemble version achieved the best clustering results. However, they suffer from long run times, so further development is needed to reduce computation time.
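The abstract does not say which ensemble scheme the tool implements. As one illustration of how an ensemble of clusterings can be combined with agglomerative clustering, the sketch below implements evidence accumulation (co-association) clustering. Many k-means runs with random initialisations and a randomly chosen k are combined into a co-association matrix, where each entry is the fraction of runs in which two items share a cluster. That matrix is then clustered with single-link agglomerative clustering, cut at a fixed threshold. The 2-D points stand in for features such as connected-component centroids of a mail-piece image. That feature choice, the function names (`kmeans`, `co_association`, `single_link_cut`, `evidence_accumulation`), the 0.5 threshold and the run counts are all hypothetical assumptions, not the authors' method.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical feature: a 2-D point (e.g. a component centroid on the image).
Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, rng: random.Random, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(list(points), k)  # random distinct data points as seeds
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels


def co_association(partitions: List[List[int]], n: int) -> List[List[float]]:
    """Fraction of partitions in which items i and j share a cluster label."""
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    r = len(partitions)
    return [[c / r for c in row] for row in counts]


def single_link_cut(coassoc: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Single-link clustering cut at `threshold`: items are merged whenever a
    chain of pairs with co-association above the threshold connects them."""
    n = len(coassoc)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if coassoc[i][j] > threshold:
                parent[find(i)] = find(j)
    # Relabel union-find roots as 0, 1, 2, ... in order of first appearance.
    roots: dict = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def evidence_accumulation(points: Sequence[Point], runs: int = 30,
                          k_range: Tuple[int, int] = (2, 4), seed: int = 0) -> List[int]:
    """Ensemble of k-means runs combined through a co-association matrix."""
    rng = random.Random(seed)
    partitions = [kmeans(points, rng.randint(*k_range), rng) for _ in range(runs)]
    return single_link_cut(co_association(partitions, len(points)))


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of points.
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (100, 100), (101, 100), (100, 101), (101, 101)]
    print(evidence_accumulation(pts))
```

The pairwise co-association matrix grows quadratically with the number of items, and so does the pairwise merge loop. This is one general reason why agglomerative and co-association-based methods tend to be expensive on large images.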