Users of Web tag spaces, e.g., Flickr, find it difficult to get adequate search results due to syntactic and semantic tag variations. In most approaches that address this problem, the cosine similarity between tags plays a major role. However, using this similarity introduces a scalability problem, because the number of similarities that need to be computed grows quadratically with the number of tags. In this paper, we propose a novel algorithm that filters out insignificant cosine similarities in time linear in the number of tags. Our approach significantly reduces the number of calculations, which makes it possible to process larger tag data sets than was previously feasible. To evaluate our approach, we used a data set containing 51 million pictures and 112 million tag annotations from Flickr.
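To make the scalability problem concrete, the following minimal sketch computes cosine similarities between tags by brute force and counts how many pairs it examines. It illustrates only the quadratic baseline that the abstract describes, not the filtering algorithm proposed in the paper. The tag co-occurrence vectors, the function names, and the threshold value are hypothetical and chosen purely for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def all_pairs(vectors, threshold):
    """Brute-force baseline: examine every unordered pair of tags.

    Returns the pairs whose similarity reaches the threshold and the
    number of pairs examined, which is n * (n - 1) / 2 for n tags.
    """
    kept = {}
    pairs_examined = 0
    for a, b in combinations(sorted(vectors), 2):
        pairs_examined += 1
        similarity = cosine(vectors[a], vectors[b])
        if similarity >= threshold:
            kept[(a, b)] = similarity
    return kept, pairs_examined


# Hypothetical toy data: each tag maps to counts of co-occurring tags.
TAG_VECTORS = {
    "nyc": {"newyork": 5, "city": 3},
    "newyork": {"nyc": 5, "city": 4},
    "city": {"nyc": 3, "newyork": 4},
    "cat": {"kitten": 2},
}

if __name__ == "__main__":
    kept, examined = all_pairs(TAG_VECTORS, threshold=0.4)
    print(f"pairs examined: {examined}")
    for pair, similarity in sorted(kept.items()):
        print(pair, round(similarity, 3))
```

With n tags the loop runs n(n-1)/2 times, so doubling the number of tags roughly quadruples the work. This is the cost that a filter running in time linear in the number of tags is meant to avoid.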