Efficient determination of binary non-negative vector neighbors with regard to cosine similarity

Authors:
Marzena Kryszkiewicz
Affiliations:
Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland
Venue:
IEA/AIE'12 Proceedings of the 25th international conference on Industrial Engineering and Other Applications of Applied Intelligent Systems: advanced research in applied artificial intelligence
Year:
2012

Cites (5):

The Anchors Hierarchy: Using the Triangle Inequality to Survive High Dimensional Data
UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence

New relations between similarity measures for vectors based on vector norms
Journal of the American Society for Information Science and Technology

Distance based fast hierarchical clustering method for large datasets
RSCTC'10 Proceedings of the 7th international conference on Rough sets and current trends in computing

TI-DBSCAN: clustering with DBSCAN by means of the triangle inequality
RSCTC'10 Proceedings of the 7th international conference on Rough sets and current trends in computing

A neighborhood-based clustering by means of the triangle inequality
IDEAL'10 Proceedings of the 11th international conference on Intelligent data engineering and automated learning

Cited by (2):

Bounds on lengths of real valued vectors similar with regard to the Tanimoto similarity
ACIIDS'13 Proceedings of the 5th Asian conference on Intelligent Information and Database Systems - Volume Part I

Using Non-Zero Dimensions for the Cosine and Tanimoto Similarity Search Among Real Valued Vectors
Fundamenta Informaticae - To Andrzej Skowron on His 70th Birthday

Abstract

The cosine and Tanimoto similarity measures are widely and successfully applied in classification, clustering, and ranking in chemistry, biology, information retrieval, and text mining. A basic operation in such tasks is the identification of neighbors, which becomes critical for large, high-dimensional data. Using the triangle inequality was recently proposed to alleviate this problem when a distance metric is applied. The triangle inequality holds for the Tanimoto dissimilarity, which functionally determines the Tanimoto similarity, provided the underlying data are vectors with binary non-negative attribute values. Unfortunately, the triangle inequality holds neither for the cosine similarity measure nor for its corresponding dissimilarity measure. Nevertheless, in this paper we propose how to use the triangle inequality property and/or bounds on the lengths of neighbor vectors to efficiently determine non-negative binary vectors that are similar with regard to the cosine similarity measure.
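
To illustrate the idea of bounding the lengths of neighbor vectors, here is a minimal Python sketch. It is not taken from the paper and may differ from the bounds derived there. It assumes each binary vector is stored as a set of its non-zero dimensions, and it relies only on an elementary fact: for binary vectors u and v with n_u and n_v ones, the dot product is at most min(n_u, n_v), so cos(u, v) <= sqrt(min(n_u, n_v) / max(n_u, n_v)). A vector v can therefore reach a similarity threshold eps only if eps^2 * n_u <= n_v <= n_u / eps^2. The function names, the set representation, and the parameter eps are assumptions made for this example.

```python
import math


def cosine_binary(u, v):
    """Cosine similarity of two binary vectors given as sets of non-zero dimensions."""
    if not u or not v:
        return 0.0
    return len(u & v) / math.sqrt(len(u) * len(v))


def neighbors_with_length_bound(query, vectors, eps):
    """Return indices of vectors whose cosine similarity to query is >= eps.

    Assumes 0 < eps <= 1. Candidates whose number of ones lies outside
    [eps^2 * n_q, n_q / eps^2] are skipped without computing the similarity,
    because cos(u, v) <= sqrt(min(n_u, n_v) / max(n_u, n_v)).
    """
    n_q = len(query)
    lower = eps * eps * n_q
    upper = n_q / (eps * eps)
    result = []
    for i, v in enumerate(vectors):
        if not (lower <= len(v) <= upper):
            continue  # pruned by the bound on the number of ones
        if cosine_binary(query, v) >= eps:
            result.append(i)
    return result
```

If the vectors are sorted by their number of ones beforehand, the admissible range can be located by binary search, so candidates outside it are never visited at all.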