A serious problem encountered by machine learning and data mining techniques in software engineering is the lack of sufficient data. For example, the largest data set currently available on software reuse contains only 24 examples. In this paper, a recently proposed machine learning algorithm is modified for mining extremely small data sets. The algorithm works in a twice-learning style: a random forest is first trained on the original data set, and virtual examples are then generated from the random forest and used to train a single decision tree. In contrast to the numerous discrepancies between the empirical data and expert opinions reported by previous research, our mining practice shows that the empirical data are actually consistent with expert opinions. Copyright © 2008 John Wiley & Sons, Ltd.
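The twice-learning idea can be illustrated with a minimal pure-Python sketch. Several details here are assumptions and are not taken from the abstract: virtual examples are drawn uniformly within the observed range of each feature, the trees are unpruned and split on Gini impurity (the paper's actual tree learner may differ), and the eight-example data set and every function name are made up for illustration.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, rng=None, n_feats=None):
    """Grow an unpruned tree: a leaf is a label, a node is (feature, threshold, left, right)."""
    if len(set(y)) == 1:
        return y[0]
    feats = list(range(len(X[0])))
    if rng is not None and n_feats is not None:
        feats = rng.sample(feats, n_feats)  # random feature subset at each node
    best = None
    for f in feats:
        # Thresholds are observed values except the largest, so both sides stay non-empty.
        for t in sorted(set(x[f] for x in X))[:-1]:
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no usable split among the chosen features
        return Counter(y).most_common(1)[0][0]
    _, f, t = best
    lo = [(x, lab) for x, lab in zip(X, y) if x[f] <= t]
    hi = [(x, lab) for x, lab in zip(X, y) if x[f] > t]
    return (f, t,
            build_tree([x for x, _ in lo], [lab for _, lab in lo], rng, n_feats),
            build_tree([x for x, _ in hi], [lab for _, lab in hi], rng, n_feats))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def train_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    n_feats = max(1, int(d ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng, n_feats))
    return forest

def predict_forest(forest, x):
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

def twice_learning(X, y, n_virtual=200, seed=0):
    """First learning: a forest on the real data. Second: one tree on forest-labelled virtual data."""
    rng = random.Random(seed)
    forest = train_forest(X, y, seed=seed)
    lows = [min(col) for col in zip(*X)]
    highs = [max(col) for col in zip(*X)]
    # Assumption: virtual feature vectors are sampled uniformly within the observed ranges.
    virtual_X = [[rng.uniform(a, b) for a, b in zip(lows, highs)] for _ in range(n_virtual)]
    virtual_y = [predict_forest(forest, x) for x in virtual_X]
    tree = build_tree(virtual_X, virtual_y)  # single tree, all features considered
    return forest, tree, virtual_X, virtual_y

# Made-up toy data (NOT the software reuse data set): two numeric features, two classes.
X = [[1.0, 5.0], [2.0, 4.0], [1.5, 6.0], [2.5, 5.5],
     [6.0, 1.0], [7.0, 2.0], [6.5, 0.5], [7.5, 1.5]]
y = ["failure"] * 4 + ["success"] * 4

forest, tree, virtual_X, virtual_y = twice_learning(X, y)
agree = sum(predict_tree(tree, x) == lab for x, lab in zip(virtual_X, virtual_y))
print(f"tree agrees with forest on {agree} of {len(virtual_X)} virtual examples")
```

The point of the second stage is that the single tree is learned from many forest-labelled examples rather than from the handful of real ones, so it can approximate the forest's decision boundary while remaining one readable model that can be compared against expert opinions.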