Parallell interacting MCMC for learning of topologies of graphical models
Data Mining and Knowledge Discovery
Derivations of normalized mutual information in binary classifications
FSKD '09: Proceedings of the 6th International Conference on Fuzzy Systems and Knowledge Discovery - Volume 1
This paper considers model selection in classification. In many applications, such as pattern recognition, probabilistic inference using a Bayesian network, and prediction of the next symbol in a sequence based on a Markov chain, the conditional probability $P(Y=y\mid X=x)$ of class $y\in Y$ given attribute value $x\in X$ is utilized. By a model we mean the equivalence relation in $X$: for $x,x'\in X$, $x\sim x' \Leftrightarrow P(Y=y\mid X=x)=P(Y=y\mid X=x')$ for all $y\in Y$. By classification we mean that the number of such equivalence classes is finite. We estimate the model from $n$ samples $z^n=(x_i,y_i)_{i=1}^{n}\in(X\times Y)^n$ using information criteria of the form empirical entropy $H$ plus penalty term $(k/2)d_n$ (the model minimizing $H+(k/2)d_n$ is the estimated model), where $k$ is the number of independent parameters in the model, and $\{d_n\}_{n=1}^{\infty}$ is a nonnegative real sequence such that $\limsup_{n\to\infty} d_n/n=0$. For autoregressive processes, although the definitions of $H$ and $k$ are different, it is known that the estimated model almost surely coincides with the true model as $n\to\infty$ if $\{d_n\}_{n=1}^{\infty}>\{2\log\log n\}_{n=1}^{\infty}$, and that it does not if $\{d_n\}_{n=1}^{\infty}<\{2\log\log n\}_{n=1}^{\infty}$ (Hannan and Quinn). Whether the same property holds for classification was an open problem. This paper solves the problem in the affirmative.
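To make the selection rule concrete, the following is a minimal sketch, not the paper's implementation: it chooses, among candidate equivalence relations on $X$ (represented as partitions into blocks), the one minimizing $H+(k/2)d_n$, with a Hannan-Quinn-style penalty $d_n = c\cdot 2\log\log n$ for an assumed constant $c>1$. The names `empirical_entropy`, `criterion`, and `select_model`, the partition representation, and the choice $c=1.5$ are all illustrative assumptions.

```python
import math
from collections import Counter

def empirical_entropy(samples, partition):
    """Negative maximized log-likelihood (nats) of Y given the block of X,
    i.e. the empirical-entropy term H of the criterion."""
    block_of = {x: b for b, block in enumerate(partition) for x in block}
    joint = Counter((block_of[x], y) for x, y in samples)   # n_{b,y}
    marginal = Counter(block_of[x] for x, _ in samples)     # n_b
    return -sum(n_by * math.log(n_by / marginal[b])
                for (b, _y), n_by in joint.items())

def criterion(samples, partition, num_labels, c=1.5):
    """H + (k/2) d_n with d_n = c * 2 log log n; c > 1 sits on the
    consistent side of the Hannan-Quinn threshold."""
    n = len(samples)
    k = len(partition) * (num_labels - 1)   # independent parameters
    d_n = c * 2.0 * math.log(math.log(n))   # requires n >= 3
    return empirical_entropy(samples, partition) + 0.5 * k * d_n

def select_model(samples, candidate_partitions, num_labels):
    """Return the candidate partition minimizing the penalized criterion."""
    return min(candidate_partitions,
               key=lambda p: criterion(samples, p, num_labels))

# Toy usage: X = {0,1,2}, Y = {0,1}; P(Y|X=1) = P(Y|X=2), so the true
# model merges attribute values 1 and 2 into one equivalence class.
samples = [(0, 1), (0, 0), (1, 0), (2, 0), (1, 0), (2, 0)] * 30
candidates = [[{0}, {1}, {2}], [{0}, {1, 2}], [{0, 1, 2}]]
print(select_model(samples, candidates, num_labels=2))  # [{0}, {1, 2}]
```

In this toy run the finest partition fits the data equally well but pays a larger penalty $(k/2)d_n$, while the coarsest one incurs a larger entropy term, so the criterion recovers the true equivalence relation.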