Predictive models in regression and classification problems typically consist of a single model that covers most, if not all, cases in the data. At the opposite end of the spectrum is a collection of models, each of which covers a very small subset of the decision space. These are referred to as "small disjuncts." The trade-offs between the two types of models have been well documented. Single models, especially linear ones, are easy to interpret and explain. In contrast, small disjuncts do not provide as clean or as simple an interpretation of the data, and several researchers have shown them to be responsible for a disproportionately large number of errors when applied to out-of-sample data. This research provides a counterpoint, demonstrating that a portfolio of "simple" small disjuncts provides a credible model for financial market prediction, a problem with a high degree of noise. A related novel contribution of this article is a simple method for measuring the "yield" of a learning system, defined as the percentage of in-sample performance that the learned model can be expected to realize on out-of-sample data. Curiously, such a measure is missing from the literature on regression learning algorithms. Pragmatically, the results suggest that for problems characterized by a high degree of noise and the lack of a stable knowledge base, it makes sense to reconstruct the portfolio of small rules periodically.
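The abstract does not spell out how "yield" is computed, but a minimal sketch is possible under the stated reading: yield is the fraction of in-sample performance that survives out of sample. The function below is an illustrative interpretation, not the authors' published formula; the performance inputs could be any scalar score (e.g., cumulative return or R-squared) measured on the two data splits.

```python
def learning_yield(in_sample_perf: float, out_of_sample_perf: float) -> float:
    """Illustrative 'yield' of a learning system: the fraction of in-sample
    performance realized out of sample. Assumes a positive in-sample score."""
    if in_sample_perf <= 0:
        raise ValueError("in-sample performance must be positive")
    return out_of_sample_perf / in_sample_perf

# Example: a model scoring 12.0 in sample but only 3.0 out of sample
# has a yield of 0.25, i.e., 25% of in-sample performance is realized.
print(learning_yield(12.0, 3.0))
```

A yield near 1.0 indicates the in-sample performance generalizes well; a low or negative yield signals overfitting to noise, which is the regime this article argues is common in financial market prediction.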