Recent years have witnessed increased interest in computing cosine similarity across many application domains. Most previous studies require the user to specify a minimum similarity threshold before the cosine similarity computation can be performed, yet in practice it is usually difficult to choose an appropriate threshold. In this paper, we instead propose to search for the top-K strongly correlated pairs of objects as measured by cosine similarity. Specifically, we first identify a monotone property of an upper bound of the cosine measure and exploit it with a diagonal traversal strategy to develop the TOP-DATA algorithm. We then observe that a diagonal traversal strategy usually incurs higher I/O costs, so we develop a max-first traversal strategy and propose the TOP-MATA algorithm. A theoretical analysis shows that TOP-MATA avoids the computation spent on false-positive item pairs and can significantly reduce I/O costs. Finally, experimental results demonstrate the computational efficiency of both TOP-DATA and TOP-MATA, and show that TOP-MATA scales particularly well to large data sets with a large number of items.
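To make the top-K formulation concrete, the following is a minimal Python sketch of upper-bound pruning for binary transaction data. It is neither TOP-DATA nor TOP-MATA: it scans all pairs in a fixed order and does not model the diagonal or max-first traversal. It assumes items are represented by the sets of transactions that contain them, and it uses the elementary bound cosine(a, b) <= sqrt(min(supp(a), supp(b)) / max(supp(a), supp(b))), which follows from supp(a, b) <= min(supp(a), supp(b)). The function name top_k_cosine_pairs and the input layout are hypothetical and chosen only for illustration.

```python
import heapq
from itertools import combinations
from math import sqrt


def top_k_cosine_pairs(transactions, k):
    """Return the k item pairs with the highest cosine similarity.

    Illustrative brute-force scan with upper-bound pruning on binary data;
    not the TOP-DATA or TOP-MATA traversal from the paper.
    """
    # Map each item to the set of transaction ids that contain it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    heap = []  # min-heap of (cosine, pair); heap[0] is the current k-th best
    for a, b in combinations(sorted(tidsets), 2):
        sa, sb = len(tidsets[a]), len(tidsets[b])
        # supp(a, b) <= min(sa, sb), so cosine <= sqrt(min / max).
        upper = sqrt(min(sa, sb) / max(sa, sb))
        if len(heap) == k and upper <= heap[0][0]:
            continue  # cannot beat the k-th best, so skip the exact count
        cosine = len(tidsets[a] & tidsets[b]) / sqrt(sa * sb)
        if len(heap) < k:
            heapq.heappush(heap, (cosine, (a, b)))
        elif cosine > heap[0][0]:
            heapq.heapreplace(heap, (cosine, (a, b)))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    baskets = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c", "d"}, {"a"}]
    for score, pair in top_k_cosine_pairs(baskets, 2):
        print(pair, round(score, 4))
```

Because the bound depends only on the two single-item supports, it can be evaluated before the more expensive intersection count. The heap's smallest entry serves as the pruning threshold in place of a user-supplied minimum similarity.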