Finding Interesting Associations without Support Pruning

Authors:
Edith Cohen; Mayur Datar; Shinji Fujiwara; Aristides Gionis; Piotr Indyk; Rajeev Motwani; Jeffrey D. Ullman; Cheng Yang
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2001

Abstract

Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known Apriori algorithm is effective only when the rules of interest are relationships that occur very frequently. However, in a number of applications, such as data mining, identification of similar Web documents, clustering, and collaborative filtering, the rules of interest have comparatively few instances in the data. In these cases, we must look for highly correlated items, or possibly even causal relationships between infrequent items. We develop a family of algorithms for this problem that combine random sampling and hashing techniques. We analyze the algorithms and conduct experiments on real and synthetic data to compare their performance.
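The following is a minimal sketch of one standard way to combine random sampling and hashing for this problem: min-wise hashing of item columns followed by locality-sensitive banding, with an exact check of the surviving candidates. It assumes that "highly correlated" is measured as the Jaccard similarity of the sets of rows in which two items occur; the abstract does not state this, and the sketch is not presented as the authors' exact algorithms. All function names, parameters (`bands`, `rows_per_band`, `threshold`) and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two sets of row ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def minhash_signatures(columns, n_rows, k, seed=0):
    """Return k min-hash values per column (hypothetical helper).

    Each value is the smallest position that any of the column's rows gets
    under one random permutation of the rows, so two non-empty columns agree
    on a given value with probability equal to their Jaccard similarity.
    """
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        perm = list(range(n_rows))
        rng.shuffle(perm)
        perms.append(perm)
    return {
        name: [min((perm[r] for r in rows), default=n_rows) for perm in perms]
        for name, rows in columns.items()
    }


def lsh_candidates(signatures, bands, rows_per_band):
    """Split signatures into bands; columns sharing any whole band become candidates."""
    buckets = defaultdict(list)
    for name, sig in signatures.items():
        for b in range(bands):
            chunk = tuple(sig[b * rows_per_band:(b + 1) * rows_per_band])
            buckets[(b, chunk)].append(name)
    candidates = set()
    for names in buckets.values():
        candidates.update(combinations(sorted(names), 2))
    return candidates


def find_similar_pairs(columns, n_rows, threshold, bands=20, rows_per_band=5):
    """Hash-based candidate generation, then exact verification; no support cut-off."""
    sigs = minhash_signatures(columns, n_rows, bands * rows_per_band)
    result = []
    for x, y in sorted(lsh_candidates(sigs, bands, rows_per_band)):
        s = jaccard(columns[x], columns[y])
        if s >= threshold:
            result.append((x, y, s))
    return result


# Hypothetical toy data: items "a" and "b" always occur together,
# but only in 5 of 1,000 rows (0.5% support).
N_ROWS = 1000
cols = {
    "a": {3, 17, 42, 500, 901},
    "b": {3, 17, 42, 500, 901},
    "c": set(range(0, N_ROWS, 2)),
    "d": set(range(1, N_ROWS, 2)),
}

min_support = 10  # hypothetical 1% support threshold, as in support pruning
print(sorted(c for c, rows in cols.items() if len(rows) >= min_support))
print(find_similar_pairs(cols, N_ROWS, threshold=0.8))
```

Support pruning at 1% keeps only the frequent but unrelated items `c` and `d`, while the hashing pass reports the rare, perfectly correlated pair `('a', 'b', 1.0)`. With 20 bands of 5 min-hashes each, a pair with similarity s becomes a candidate with probability 1 - (1 - s^5)^20, which exceeds 0.999 at s = 0.8; the exact check then discards false positives, at the cost of possibly missing some true pairs close to the threshold.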