Type: Journal Publication
Abstract: Discrete classification problems are important in pattern recognition applications. The most commonly used discrete classification rule is the discrete histogram rule. In this letter we provide exact expressions for the correlation coefficient between the actual error and the resubstitution and leave-one-out cross-validation error estimators for the discrete histogram rule. We show with an example that correlations between actual and estimated errors are generally poor, and that leave-one-out cross-validation can in fact display negative correlation when sample sizes are small and classifier complexity is large. We observe that correlation decreases with increasing classifier complexity, and that increasing the sample size does not necessarily increase correlation. The exact expressions given here can be computed reasonably fast for a given sample size, dimensionality, and set of model parameters, which is useful because, as also illustrated in this letter, Monte Carlo approximations of the correlation coefficient are generally poor, even with a large number of simulated data sets.
Cited as: Ulisses M. Braga-Neto and Edward R. Dougherty, "Exact Correlation between Actual and Estimated Errors in Discrete Classification", Published in: Pattern Recognition Letters, Volume 31, Issue 5, April 2010, Pages 407-412
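The Monte Carlo approximation mentioned in the abstract can be illustrated with a minimal sketch. It is not the exact expressions derived in the letter, only a simulation of the quantity they describe. The function name `simulate_correlations`, the bin probabilities, the mixture-sampling scheme, and the tie-breaking convention (ties go to class 0) are all assumptions made for illustration.

```python
import numpy as np

def simulate_correlations(p, q, c0, n, reps, seed=0):
    """Monte Carlo estimate of corr(actual error, estimated error)
    for the discrete histogram rule with two classes.

    Assumptions (not taken from the letter): mixture sampling,
    and ties in a bin are assigned to class 0.
    p, q : bin probabilities for class 0 and class 1 (each sums to 1)
    c0   : prior probability of class 0
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    b = p.size

    # Joint probabilities of (class, bin) cells; draw `reps` samples of size n.
    cell_probs = np.concatenate([c0 * p, (1.0 - c0) * q])
    counts = rng.multinomial(n, cell_probs, size=reps)
    U, V = counts[:, :b], counts[:, b:]  # class-0 and class-1 counts per bin

    # Histogram rule: predict class 1 in a bin only if V > U.
    predict_1 = V > U

    # Actual error: probability mass of the class not predicted in each bin.
    actual = np.where(predict_1, c0 * p, (1.0 - c0) * q).sum(axis=1)

    # Resubstitution: minority count in each bin is misclassified.
    resub = np.minimum(U, V).sum(axis=1) / n

    # Leave-one-out: a class-0 point errs if V >= U once it is removed;
    # a class-1 point errs if V - 1 <= U once it is removed.
    loo = (U * (V >= U) + V * (V <= U + 1)).sum(axis=1) / n

    return {
        "resub": np.corrcoef(actual, resub)[0, 1],
        "loo": np.corrcoef(actual, loo)[0, 1],
    }

# Hypothetical model: 4 bins, equal priors, 20 training points.
result = simulate_correlations(
    p=[0.4, 0.3, 0.2, 0.1], q=[0.1, 0.2, 0.3, 0.4], c0=0.5, n=20, reps=2000
)
print(result)
```

Rerunning with different `seed` values gives a rough sense of how much such simulated correlations vary from run to run, which is the kind of variability that exact expressions avoid.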