Recently, there has been considerable interest in computing strongly correlated pairs in large databases. Most previous studies require the user to specify a minimum correlation threshold before the computation can run. In practice, however, choosing an appropriate threshold can be difficult, since different data sets typically have different characteristics. To this end, we propose an alternative task: mining the top-k strongly correlated pairs. In this paper, we identify a 2-D monotone property of an upper bound of Pearson's correlation coefficient and develop an efficient algorithm, called TOP-COP, that exploits this property to prune many pairs without ever computing their correlation coefficients. Our experimental results show that the TOP-COP algorithm can be orders of magnitude faster than brute-force alternatives for mining the top-k strongly correlated pairs.
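The pruning idea can be illustrated with a minimal sketch. It is not the TOP-COP algorithm itself, which the abstract does not spell out. The sketch assumes binary (market-basket) data, where Pearson's correlation reduces to the phi coefficient. It also assumes a support-based upper bound obtained by substituting the largest possible joint support, min(supp(A), supp(B)), into the phi formula. The function names (`phi`, `phi_upper`, `top_k_pairs`) and the simple filter-and-refine loop are hypothetical choices for illustration; the sketch does not use the 2-D monotone traversal described above.

```python
import heapq
from itertools import combinations
from math import sqrt


def phi(s_a, s_b, s_ab):
    """Pearson's correlation (phi coefficient) of two binary items,
    computed from their supports and their joint support."""
    return (s_ab - s_a * s_b) / sqrt(s_a * s_b * (1 - s_a) * (1 - s_b))


def phi_upper(s_a, s_b):
    """Upper bound on phi that depends only on the individual supports.
    It follows from phi() by setting s_ab to min(s_a, s_b)."""
    hi, lo = max(s_a, s_b), min(s_a, s_b)
    return sqrt(lo / hi) * sqrt((1 - hi) / (1 - lo))


def top_k_pairs(transactions, k):
    """Return the k pairs with the highest phi, plus the number of pairs
    whose exact phi was actually computed (the rest were pruned)."""
    n = len(transactions)
    tids = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tids.setdefault(item, set()).add(tid)
    # Items present in every transaction have undefined phi; skip them.
    supp = {i: len(t) / n for i, t in tids.items() if len(t) < n}

    heap = []      # min-heap of (phi, pair); heap[0] is the current k-th best
    computed = 0
    for a, b in combinations(sorted(supp), 2):
        # Prune: if even the upper bound cannot beat the k-th best score,
        # the exact coefficient is never computed.
        if len(heap) == k and phi_upper(supp[a], supp[b]) <= heap[0][0]:
            continue
        s_ab = len(tids[a] & tids[b]) / n
        score = phi(supp[a], supp[b], s_ab)
        computed += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, (a, b)))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, (a, b)))
    return sorted(heap, reverse=True), computed


if __name__ == "__main__":
    baskets = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c", "d"},
               {"d"}, {"a", "c"}, {"e"}, {"a", "b"}]
    best, evaluated = top_k_pairs(baskets, k=2)
    print(best, evaluated)
```

Because the bound needs only the individual supports, each pruned pair avoids the transaction-list intersection entirely. No threshold is supplied by the user: the k-th best score found so far plays that role and tightens as the search proceeds.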