Multirelational classification aims to discover patterns across multiple interlinked tables (relations) in a relational database. In many large organizations, such a database spans numerous departments and subdivisions, which are involved in different aspects of the enterprise such as customer profiling, fraud detection, inventory management, and financial management. When classification is the goal, economic utility affects several phases of the knowledge discovery process. For instance, during data preprocessing, one must consider the cost of acquiring, cleaning, and transforming large volumes of data. When training and testing data mining models, one must consider the impact of data size on the running time of the learning algorithm. To address these utility-based issues, this paper presents an approach for creating a pruned database for multirelational classification while minimizing the loss of predictive performance in the final model. Our method identifies a set of strongly uncorrelated subgraphs of the original database schema to use for training and discards all others. Our experiments show that this strategy significantly reduces the size of the databases, in terms of the number of relations, tuples, and attributes, without sacrificing predictive accuracy. The approach prunes the databases by as much as 94%. This reduction also lowers the computational cost of learning: the method improves the execution time of multirelational learning algorithms by as much as 80%. In particular, our results demonstrate that one may build an accurate model with only a small subset of the provided database.
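The abstract does not say how the uncorrelated subgraphs are chosen, so the following is only an illustrative sketch of one plausible reading, not the paper's algorithm. It assumes each candidate subgraph has already been summarized as a list of per-example scores (for instance, the predictions of a model trained on that subgraph alone), and it greedily keeps subgraphs that correlate with the class label but not strongly with any subgraph already kept. The function names, the `max_redundancy` threshold, and the use of Pearson correlation are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_subgraphs(scores, labels, max_redundancy=0.8):
    """Greedily pick subgraphs that are relevant but mutually uncorrelated.

    scores: dict mapping a (hypothetical) subgraph name to its per-example scores.
    labels: numeric class labels, one per example.
    max_redundancy: assumed cut-off on |correlation| between two kept subgraphs.
    """
    # Relevance of a subgraph = |correlation| between its scores and the labels.
    relevance = {name: abs(pearson(s, labels)) for name, s in scores.items()}
    ranked = sorted(scores, key=lambda name: relevance[name], reverse=True)

    kept = []
    for name in ranked:
        if relevance[name] == 0.0:
            continue  # carries no signal about the class
        redundant = any(
            abs(pearson(scores[name], scores[other])) >= max_redundancy
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept
```

Under these assumptions, the tables reachable only through discarded subgraphs would be the ones removed from the training database; how the paper actually scores and compares subgraphs may differ.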