A Parallel Hierarchical Agglomerative Clustering Technique for Bilingual Corpora Based on Reduced Terms with Automatic Weight Optimization

Authors:
Rayner Alfred
Affiliations:
Center for Artificial Intelligence, Universiti Malaysia Sabah, Sabah, Malaysia 88999
Venue:
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Year:
2009

Abstract

Multilingual corpora are becoming an essential resource for work in multilingual natural language processing. This paper investigates the effects of applying a clustering technique to parallel multilingual texts, comparing the cluster mappings and the cluster tree structures obtained for each language. It also studies the effect of reducing the set of terms considered when clustering parallel corpora. A genetic-based algorithm is then applied to optimize the weights of the terms used in clustering, so that unseen documents can be classified. Specifically, the paper introduces the tools needed for this task and presents a set of experimental results, together with the issues that became apparent.
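
To make the idea of comparing cluster mappings across parallel texts concrete, the following is a minimal sketch and not the paper's implementation. It assumes raw term-frequency vectors, cosine similarity, and average-linkage agglomerative clustering, none of which the abstract specifies. The English and French toy sentences and the names `tf_vector`, `agglomerative`, and `pair_agreement` are invented for illustration. Term reduction and the genetic weight optimization are omitted.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Raw term-frequency vector for a whitespace-tokenised document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def agglomerative(docs, k):
    """Average-linkage agglomerative clustering; returns one label per document."""
    vectors = [tf_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the most similar pair of clusters (a < b by construction).
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(docs)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def pair_agreement(labels_a, labels_b):
    """Rand index: fraction of document pairs on which two clusterings agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    same = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return same / len(pairs)


# Invented toy parallel corpus: sentence i in one list translates sentence i in the other.
english = [
    "the bank raised interest rates",
    "interest rates at the bank fell",
    "the team won the football match",
    "the football team lost the match",
]
french = [
    "la banque augmente les taux",
    "les taux de la banque baissent",
    "le club gagne le match de football",
    "le club de football perd le match",
]

english_labels = agglomerative(english, 2)
french_labels = agglomerative(french, 2)
print(english_labels, french_labels, pair_agreement(english_labels, french_labels))
```

Because the two document lists are aligned, the label sequences can be compared position by position. A pair agreement below 1.0 would indicate documents that are grouped differently in the two languages, which is the kind of difference in cluster mappings the abstract refers to.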