Structure/attribute computation of similarities between nodes of a RDF graph with application to linked data clustering

Authors:
Hadi Khosravi-Farsani;Mohammadali Nematbaksh;George Lausen
Affiliations:
Computer Engineering Department, University of Isfahan, Isfahan, Iran;Computer Engineering Department, University of Isfahan, Isfahan, Iran;Informatik Department, Albert-Ludwigs, Freiburg, Germany
Venue:
Intelligent Data Analysis
Year:
2013

Abstract

Similarity estimation between interconnected objects arises in many real-world applications, and many domain-specific measures have been proposed. This work proposes a new perspective on specifying the similarity between resources in linked data and, more generally, between vertices of a directed, attributed graph. More precisely, the approach combines the structural properties of a graph with the attribute/value pairs of its vertices. We first compute the similarity between every pair of nodes using an extension of the Jaccard measure, which has the useful property of increasing as the number of matching attribute/value pairs of the two resources increases. In the next step, highly similar vertices are treated as a single node of a new graph, called a CGraph. Each node of a CGraph represents a group of resources found to be highly similar in the first step, and links between resources are generalized to links between clusters. We propose CRank, an extension of the structural algorithm, to merge highly similar nodes in the next step. The model is evaluated in a clustering procedure on our standard dataset, in which the class label of each resource is estimated and compared with its ground-truth class label. Experimental results show that our model outperforms other clustering algorithms in terms of precision and recall.
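
The first two steps described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not define the paper's Jaccard extension, so the plain Jaccard coefficient over sets of attribute/value pairs stands in for it; the merge rule (union-find over all pairs at or above a threshold) and the names `jaccard`, `build_cgraph`, `attrs`, `edges` and `threshold` are hypothetical; and CRank itself is not shown.

```python
from itertools import combinations


def jaccard(a, b):
    """Plain Jaccard coefficient of two sets of (attribute, value) pairs.

    Stand-in for the paper's extended measure, which the abstract does not define.
    """
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_cgraph(attrs, edges, threshold):
    """Merge nodes with similarity >= threshold and lift edges to clusters.

    attrs: dict mapping node -> set of (attribute, value) pairs
    edges: iterable of directed (source, target) node pairs
    Returns (clusters, cluster_edges), both keyed by a representative node.
    """
    parent = {n: n for n in attrs}

    def find(n):
        # Follow parent pointers to the representative, halving the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Assumed merge rule: any sufficiently similar pair joins one cluster,
    # so membership is transitive across such pairs.
    for u, v in combinations(sorted(attrs), 2):
        if jaccard(attrs[u], attrs[v]) >= threshold:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[max(ru, rv)] = min(ru, rv)

    clusters = {}
    for n in attrs:
        clusters.setdefault(find(n), set()).add(n)

    # A link between resources becomes a link between their clusters;
    # links inside a single cluster are dropped.
    cluster_edges = {(find(u), find(v)) for u, v in edges if find(u) != find(v)}
    return clusters, cluster_edges


# Toy data, invented for illustration only.
attrs = {
    "a": {("type", "Film"), ("genre", "Drama"), ("year", "1999")},
    "b": {("type", "Film"), ("genre", "Drama"), ("year", "2001")},
    "c": {("type", "Person"), ("role", "Director")},
}
edges = [("c", "a"), ("c", "b"), ("a", "b")]

clusters, cluster_edges = build_cgraph(attrs, edges, threshold=0.5)
```

In this toy input, "a" and "b" share two of four distinct attribute/value pairs, so they merge at a threshold of 0.5, and the two links from "c" collapse into one cluster-level link. The threshold value is arbitrary here; the abstract does not state how "highly similar" is decided.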