We formalize data mining as a process of information exchange, defined by the following key components. The data miner's state of mind is modeled as a probability distribution, called the background distribution, which represents the uncertainty and misconceptions the data miner has about the data. This distribution initially incorporates any prior (possibly incorrect) beliefs the data miner has about the data. During the data mining process, properties of the data (which we refer to as patterns) are revealed to the data miner, either in batch, one by one, or interactively. This acquisition of information is formalized by updating the background distribution to account for the presence of the patterns found. The proposed framework can be motivated using concepts from information theory and game theory. From this perspective, it is easy to see how the framework extends to more sophisticated settings, e.g., where patterns are probabilistic functions of the data (which makes it possible to account for noise and errors in the data mining process and to study data mining techniques based on subsampling the data). The framework then models the data mining process using concepts from information geometry, and I-projections in particular. The framework can be used to help design new data mining algorithms that maximize the efficiency of the information exchange from the algorithm to the data miner.
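The update step described above can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes a toy setting chosen only for illustration: the data is a 2x2 binary matrix, the prior background distribution is uniform over all 16 possible matrices, and each pattern is a deterministic property that restricts the data to a subset of those matrices. For such patterns, the I-projection of the background distribution onto the distributions supported on that subset amounts to conditioning (renormalizing), and the information gained is the negative log-probability of the pattern under the current background distribution. All function and variable names are hypothetical.

```python
import itertools
import math


def uniform_background(n_cells):
    """Assumed prior: uniform distribution over all binary datasets with n_cells cells."""
    datasets = list(itertools.product((0, 1), repeat=n_cells))
    p = 1.0 / len(datasets)
    return {d: p for d in datasets}


def pattern_probability(background, pattern):
    """Probability, under the background distribution, that the pattern holds."""
    return sum(p for d, p in background.items() if pattern(d))


def information_content(background, pattern):
    """Bits of information the data miner gains when told the pattern holds."""
    return -math.log2(pattern_probability(background, pattern))


def update(background, pattern):
    """Condition the background distribution on the pattern.

    For a deterministic pattern this is the I-projection onto the set of
    distributions that put all their mass on datasets where the pattern holds.
    """
    mass = pattern_probability(background, pattern)
    if mass == 0.0:
        raise ValueError("pattern has zero probability under the background distribution")
    return {d: (p / mass if pattern(d) else 0.0) for d, p in background.items()}


# A dataset is a flattened 2x2 matrix: (a00, a01, a10, a11).
def row0_all_ones(d):
    return d[0] == 1 and d[1] == 1


def col0_all_ones(d):
    return d[0] == 1 and d[2] == 1


background = uniform_background(4)
bits_row_first = information_content(background, row0_all_ones)
bits_col_before = information_content(background, col0_all_ones)

# Reveal the first pattern, then re-evaluate the second one.
background = update(background, row0_all_ones)
bits_col_after = information_content(background, col0_all_ones)

print(bits_row_first, bits_col_before, bits_col_after)
```

In this toy run, the second pattern carries less information after the first has been revealed, because the two patterns share a cell. This is the sense in which updating the background distribution lets an iterative procedure avoid reporting what the data miner already knows.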