Random projection in dimensionality reduction: applications to image and text data
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering Algorithms
Experiments with random projections for machine learning
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Very sparse random projections
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Pattern Recognition and Machine Learning (Information Science and Statistics)
Pattern Recognition and Machine Learning (Information Science and Statistics)
Data Clustering: Theory, Algorithms, and Applications (ASA-SIAM Series on Statistics and Applied Probability)
Hierarchical Clustering of Massive, High Dimensional Data Sets by Exploiting Ultrametric Embedding
SIAM Journal on Scientific Computing
Clustering
Survey of clustering algorithms
IEEE Transactions on Neural Networks
A method for determining ontology-based semantic relevance
DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications
Hi-index | 0.00 |
In this paper, we analyze the preservation of original semantic similarity among objects when dimensional reduction is applied on the original data source and a further clustering process is performed on dimensionally reduced data. An experiment is designed to test Baire, or longest common prefix ultrametric, and K-Means when prior random projection is applied. A data matrix extracted from a cultural heritage database has been prepared for the experiment. Given that the random projection produces a vector with components ranging on the interval [0, 1], clusters are obtained at different precision levels. Next, the mean semantic similarity of clusters is calculated using a modified version of the Jaccard index. Our findings show that semantics is difficult to preserve by these methods. However, a Student's hypothesis test on mean similarity indicates that Baire clusters objects are semantically better than K-Means when we increase the digit precision, but paying an increasing cost for orphan clustered objects. Despite this cost, it is argued that the ultrametric technique provides an efficient process to detect semantic homogeneity on the original data space.