The idea of unsupervised learning from basic facts (axioms) or from data has fascinated researchers for decades. Knowledge discovery engines try to extract general inferences from facts or training data. Statistical methods take a more structured approach, attempting to quantify data by known and intuitively understood models. The problem of gleaning knowledge from existing data sources poses a significant paradigm shift from these traditional approaches. The size, noise, diversity, dimensionality, and distributed nature of typical data sets make even formal problem specification difficult. Moreover, you typically do not have control over data generation. This lack of control opens up a Pandora's box filled with issues such as overfitting, limited coverage, and missing or incorrect data with high dimensionality. Once the problem is specified, solution techniques must deal with complexity, scalability (to meaningful data sizes), and presentation. This entire process is where data mining makes its transition from serendipity to science.