One of the challenges in making postal automation technology more accurate is improving the accuracy of the address-block locating task. This task can be decomposed into address-block segmentation, which divides the mail piece image into regions, and address-block selection, which picks, from among the segmented regions, the one that best corresponds to the destination address-block. In this paper we focus on the segmentation part of the address-block locating task. We investigate whether clustering techniques and ensemble clustering techniques help to improve the accuracy of address-block locating. For this purpose, a research tool with several clustering methods as well as ensemble methods was developed. Very little a priori knowledge of the images is required. The implemented system has been evaluated on mail piece images captured live from real postal pieces at postal sorting centers. The results are described and illustrated with tests carried out on different images (parcels, magazines, postcards, etc.) in which there is no fixed position for the address-block, postmarks, or stamps. A ground-truth strategy is employed to evaluate the accuracy of segmentation. Agglomerative clustering and its ensemble version achieved the best clustering results. However, both suffer from high run time, so further development is needed to reduce computation time.
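To make the idea of an ensemble of clusterings concrete, the following is a minimal sketch and not the system described above. The abstract does not say which features or which ensemble scheme were used, so everything here is an assumption chosen for illustration: each image block is reduced to a single hypothetical feature (mean gray level), several k-means runs serve as base clusterings, their agreement is collected in a co-association matrix (how often two blocks land in the same cluster), and average-link agglomerative clustering on that matrix gives the consensus segmentation. All function names are hypothetical.

```python
import random

def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's k-means on feature tuples; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters keep their center.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def co_association(labelings):
    """Fraction of base clusterings in which items i and j share a cluster."""
    n = len(labelings[0])
    co = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0
    return [[v / len(labelings) for v in row] for row in co]

def agglomerate(sim, n_clusters):
    """Average-link agglomerative clustering on a similarity matrix."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                s = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or s > best[0]:
                    best = (s, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the most similar pair
        del clusters[b]
    labels = [0] * len(sim)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

# Hypothetical toy data: mean gray level of six image blocks,
# three dark (background-like) and three bright (label-like).
blocks = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]

# Base clusterings: k-means with different k and different seeds.
base = [kmeans(blocks, k, seed) for k in (2, 3) for seed in range(5)]
consensus = agglomerate(co_association(base), n_clusters=2)
```

The pairwise loops in `co_association` and `agglomerate` grow quickly with the number of items, which is consistent with the run-time cost of agglomerative and ensemble methods noted in the abstract; a practical version would cluster coarse blocks or superpixels rather than individual pixels.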