The cosine and Tanimoto similarity measures are widely and successfully applied to classification, clustering, and ranking in chemistry, biology, information retrieval, and text mining. A basic operation in such tasks is the identification of neighbors, which becomes critical for large, high-dimensional data. The triangle inequality property was recently proposed as a way to alleviate this problem when a distance metric is used. The triangle inequality holds for the Tanimoto dissimilarity, which functionally determines the Tanimoto similarity, provided the underlying data are vectors with binary non-negative attribute values. Unfortunately, it holds neither for the cosine similarity measure nor for its corresponding dissimilarity measure. Nevertheless, in this paper we show how to use the triangle inequality property and/or bounds on the lengths of neighbor vectors to efficiently determine non-negative binary vectors that are similar with regard to the cosine similarity measure.
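The two claims about the triangle inequality, and the idea of pruning by vector length, can be illustrated with a small sketch. It is not the method proposed in the paper. It assumes the dissimilarities are defined as 1 minus the similarity, and it uses one elementary length bound that follows from the definition of cosine similarity on binary vectors: if cos(u, v) >= eps, then the number of ones in v lies between eps^2 times and 1/eps^2 times the number of ones in u. All function names are hypothetical, and the bounds derived in the paper may well be tighter.

```python
import math
from itertools import product


def dot(u, v):
    """Dot product; for binary vectors dot(u, u) is the number of ones."""
    return sum(a * b for a, b in zip(u, v))


def cosine_sim(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def tanimoto_sim(u, v):
    return dot(u, v) / (dot(u, u) + dot(v, v) - dot(u, v))


def triangle_violations(dissim, vectors, tol=1e-12):
    """Return triples (u, v, w) with dissim(u, w) > dissim(u, v) + dissim(v, w)."""
    return [
        (u, v, w)
        for u, v, w in product(vectors, repeat=3)
        if dissim(u, w) > dissim(u, v) + dissim(v, w) + tol
    ]


def cosine_neighbors(query, data, eps):
    """Vectors in data with cosine similarity >= eps to query (0 < eps <= 1).

    Assumed length filter (illustrative, not taken from the paper):
    for binary vectors, dot(u, v) <= min(ones(u), ones(v)), so
    cos(u, v) <= sqrt(min / max), and cos >= eps forces
    eps^2 * ones(u) <= ones(v) <= ones(u) / eps^2.
    """
    q_ones = dot(query, query)
    lo, hi = eps * eps * q_ones, q_ones / (eps * eps)
    result = []
    for v in data:
        v_ones = dot(v, v)
        if v_ones < lo or v_ones > hi:
            continue  # length alone rules this candidate out
        if cosine_sim(query, v) >= eps:
            result.append(v)
    return result


# All non-zero binary vectors of dimension 3.
vectors = [v for v in product((0, 1), repeat=3) if any(v)]

# Assumed definitions: dissimilarity = 1 - similarity.
tanimoto_bad = triangle_violations(lambda u, v: 1 - tanimoto_sim(u, v), vectors)
cosine_bad = triangle_violations(lambda u, v: 1 - cosine_sim(u, v), vectors)
```

On this small set, `tanimoto_bad` is empty while `cosine_bad` is not. For example, u = (1, 0, 0), v = (1, 1, 0), w = (0, 1, 0) gives a cosine dissimilarity of 1 between u and w, but only about 0.29 for each of the pairs (u, v) and (v, w). The length filter in `cosine_neighbors` discards candidates without computing a dot product against the query, which is the kind of saving that bounds on neighbor lengths are meant to provide.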