Randomized algorithms
Size-estimation framework with applications to transitive closure and reachability
Journal of Computer and System Sciences
Enhanced hypertext categorization using hyperlinks
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Syntactic clustering of the Web
Selected papers from the sixth international conference on World Wide Web
Finding related pages in the World Wide Web
WWW '99 Proceedings of the eighth international conference on World Wide Web
Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)
WebBase: a repository of Web pages
Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications networking
Min-wise independent permutations
Journal of Computer and System Sciences - 30th annual ACM symposium on theory of computing
Evaluating strategies for similarity search on the web
Proceedings of the 11th international conference on World Wide Web
Modern Information Retrieval
I/O-efficient techniques for computing pagerank
Proceedings of the eleventh international conference on Information and knowledge management
Approximating Aggregate Queries about Web Pages via Random Walks
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
ANF: a fast and scalable tool for data mining in massive graphs
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
SimRank: a measure of structural-context similarity
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Node similarity in networked information spaces
CASCON '01 Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
The link prediction problem for social networks
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Sic transit gloria telae: towards an understanding of the web's decay
Proceedings of the 13th international conference on World Wide Web
Algorithms for memory hierarchies: advanced lectures
A scalable randomized method to compute link-based similarity rank on the web graph
EDBT'04 Proceedings of the 2004 international conference on Current Trends in Database Technology
Hyperlink analysis on the world wide web
Proceedings of the sixteenth ACM conference on Hypertext and hypermedia
To randomize or not to randomize: space optimal summaries for hyperlink analysis
Proceedings of the 15th international conference on World Wide Web
LinkClus: efficient clustering via heterogeneous semantic links
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Personalized query expansion for the web
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Detecting splogs via temporal dynamics using self-similarity analysis
ACM Transactions on the Web (TWEB)
People search: Searching people sharing similar interests from the Web
Journal of the American Society for Information Science and Technology
Efficient semi-streaming algorithms for local triangle counting in massive graphs
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Imagination: Exploiting Link Analysis for Accurate Image Annotation
Adaptive Multimedial Retrieval: Retrieval, User, and Semantics
Accuracy estimate and optimization techniques for SimRank computation
Proceedings of the VLDB Endowment
Sponsored ad-based similarity: an approach to mining collective advertiser intelligence
Proceedings of the 2nd International Workshop on Data Mining and Audience Intelligence for Advertising
An Adaptive Method for the Efficient Similarity Calculation
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
Using Link-Based Content Analysis to Measure Document Similarity Effectively
APWeb/WAIM '09 Proceedings of the Joint International Conferences on Advances in Data and Web Management
Calculating Similarity Efficiently in a Small World
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
P-Rank: a comprehensive structural similarity measure over information networks
Proceedings of the 18th ACM conference on Information and knowledge management
Accuracy estimate and optimization techniques for SimRank computation
The VLDB Journal — The International Journal on Very Large Data Bases
Fast computation of SimRank for static and dynamic information networks
Proceedings of the 13th International Conference on Extending Database Technology
Web mediators for accessible browsing
ERCIM'06 Proceedings of the 9th conference on User interfaces for all
Exploring the power of heuristics and links in multi-relational data mining
ISMIS'08 Proceedings of the 17th international conference on Foundations of intelligent systems
Parallel SimRank computation on large graphs with iterative aggregation
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient algorithms for large-scale local triangle counting
ACM Transactions on Knowledge Discovery from Data (TKDD)
Adaptive combination of tag and link-based user similarity in flickr
Proceedings of the international conference on Multimedia
Taming computational complexity: efficient and parallel simrank optimizations on undirected graphs
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Link proximity analysis: clustering websites by examining link proximity
ECDL'10 Proceedings of the 14th European conference on Research and advanced technology for digital libraries
A fast two-stage algorithm for computing SimRank and its extensions
WAIM'10 Proceedings of the 2010 international conference on Web-age information management
Axiomatic ranking of network role similarity
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Pairwise similarity calculation of information networks
DaWaK'11 Proceedings of the 13th international conference on Data warehousing and knowledge discovery
ASAP: towards accurate, stable and accelerative penetrating-rank estimation on large graphs
WAIM'11 Proceedings of the 12th international conference on Web-age information management
Finding information nebula over large networks
Proceedings of the 20th ACM international conference on Information and knowledge management
MFCRank: a web ranking algorithm based on correlation of multiple features
CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing
A space and time efficient algorithm for SimRank computation
World Wide Web
Communications of the ACM
Delta-SimRank computing on MapReduce
Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
On the efficiency of estimating penetrating rank on large graphs
SSDBM'12 Proceedings of the 24th international conference on Scientific and Statistical Database Management
E-rank: A Structural-Based Similarity Measure in Social Networks
WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Scalable and axiomatic ranking of network role similarity
ACM Transactions on Knowledge Discovery from Data (TKDD) - Casin special issue
Efficient simrank-based similarity join over large graphs
Proceedings of the VLDB Endowment
To exploit the similarity information hidden in the hyperlink structure of the web, this paper introduces algorithms that scale to graphs with billions of vertices on a distributed architecture. The similarity of the multi-step neighborhoods of vertices is numerically evaluated by similarity functions including SimRank [20], a recursive refinement of cocitation; PSimRank, a novel variant with better theoretical characteristics; and the Jaccard coefficient, extended to multi-step neighborhoods. Our methods are presented in a general framework of Monte Carlo similarity search algorithms that precompute an index database of random fingerprints and, at query time, estimate similarities from those fingerprints. The performance and quality of the methods were tested on the Stanford WebBase [19] graph of 80M pages by comparing our scores to similarities extracted from the ODP directory [26]. Our experimental results suggest that the hyperlink structure of vertices within four to five steps provides more adequate information for similarity search than single-step neighborhoods.
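As an illustration of the fingerprint idea, the following is a minimal single-machine sketch in Python, not the distributed implementation the paper describes. It relies on the standard random-walk reading of SimRank: the score of a pair is the expected value of c^τ, where τ is the step at which two reverse random walks first meet. The function names `build_fingerprints` and `estimate_simrank`, the in-memory dictionary layout, and the default decay of 0.6 are assumptions made for this example only.

```python
import random


def build_fingerprints(in_links, num_samples, walk_length, seed=0):
    """Precompute reverse random walks (fingerprints) for every vertex.

    in_links maps each vertex to a list of its in-neighbours (assumed to
    contain every vertex as a key, possibly with an empty list).
    Returns a list with one dict per sample: vertex -> visited path.
    """
    rng = random.Random(seed)
    vertices = list(in_links)
    index = []
    for _ in range(num_samples):
        paths = {v: [v] for v in vertices}
        for _ in range(walk_length):
            # One shared random in-neighbour per vertex and step, so walks
            # that meet stay together, while walks at different vertices
            # choose independently.
            choice = {
                v: rng.choice(in_links[v]) if in_links[v] else None
                for v in vertices
            }
            for v in vertices:
                current = paths[v][-1]
                # A walk that reached a vertex without in-links has stopped.
                paths[v].append(choice[current] if current is not None else None)
        index.append(paths)
    return index


def estimate_simrank(index, u, v, decay=0.6):
    """Estimate SimRank(u, v) as the average of decay**(first meeting step)."""
    if u == v:
        return 1.0
    total = 0.0
    for paths in index:
        for step, (a, b) in enumerate(zip(paths[u], paths[v])):
            if a is None or b is None:
                break  # one walk stopped before the two could meet
            if a == b:
                total += decay ** step
                break
    return total / len(index)


if __name__ == "__main__":
    # Toy graph: u and v are both linked from w only.
    toy = {"w": [], "u": ["w"], "v": ["w"], "x": []}
    idx = build_fingerprints(toy, num_samples=100, walk_length=5)
    print(estimate_simrank(idx, "u", "v"))
```

The index is built once and each query only reads the stored paths of the two vertices, which mirrors the split between precomputation and query time described above. Accuracy in this sketch depends on the number of samples and the walk length, both of which are free parameters here.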