Type: Journal Publication
Abstract: Classifying a patient based on disease type, treatment prognosis, survivability, or other such criteria has become a major focus of genomics and proteomics. From the perspective of the general population of a particular kind of cell, one would like a classifier that applies to the whole population; however, the population is often so structurally diverse that a satisfactory classifier cannot be designed from the available sample data. In such a circumstance, it can be useful to identify cellular contexts within which a disease can be reliably diagnosed, which in effect means finding classifiers that apply to different sub-populations within the overall population. Using a model-based approach, this paper quantifies the effect of contexts on classification performance as a function of the classifier used and the sample size. The advantage of a model-based approach is that the contextual confusion can be varied through the model parameters, which allows classification performance to be compared in terms of the degree of discriminatory confusion caused by the contexts. We consider five popular classifiers: linear discriminant analysis, 3-nearest-neighbor, linear support vector machine, polynomial support vector machine, and boosting. We contrast the case where classification is done with a single classifier that does not discriminate between the contexts with the case where context markers facilitate context separation before classifier design. We observe that little can be done if there is high contextual confusion, but when the contextual confusion is low, context separation can be beneficial, with the benefit depending on the classifier.
Cited as: A. Choudhary, J. Hua, M. Bittner, and E. R. Dougherty, "The Effects of Population Contexts on Classifier Performance," Journal of Biological Systems, vol. 16, pp. 495-517, 2008.
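
Illustrative sketch: the contrast drawn in the abstract, one classifier designed on the pooled population versus one classifier per context after separation by a context marker, can be illustrated with a toy simulation. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional Gaussian model, the two contexts with reversed class means, the perfectly informative context marker, the parameter values, and all function names. A simple nearest-mean rule stands in for the five classifiers studied.

```python
import random


def sample(n, context, rng, sigma=0.5):
    """Draw n balanced (x, label) pairs from one hypothetical context."""
    # Assumption: context 1 reverses the class means of context 0,
    # so the two contexts disagree about which side each class lies on.
    sign = 1.0 if context == 0 else -1.0
    data = []
    for i in range(n):
        label = i % 2
        mean = sign * (1.0 if label == 1 else -1.0)
        data.append((rng.gauss(mean, sigma), label))
    return data


def fit_nearest_mean(data):
    """Estimate one sample mean per class (a 1-D nearest-mean rule)."""
    means = {}
    for c in (0, 1):
        xs = [x for x, y in data if y == c]
        means[c] = sum(xs) / len(xs)
    return means


def predict(means, x):
    """Assign x to the class whose estimated mean is closest."""
    return min(means, key=lambda c: abs(x - means[c]))


def accuracy(means, data):
    """Fraction of (x, label) pairs classified correctly."""
    return sum(predict(means, x) == y for x, y in data) / len(data)


def run_experiment(n_train=200, n_test=2000, seed=0):
    """Return (pooled accuracy, context-separated accuracy)."""
    rng = random.Random(seed)
    train = {c: sample(n_train, c, rng) for c in (0, 1)}
    test = {c: sample(n_test, c, rng) for c in (0, 1)}

    # Single classifier designed without distinguishing the contexts.
    pooled = fit_nearest_mean(train[0] + train[1])
    pooled_acc = accuracy(pooled, test[0] + test[1])

    # One classifier per context, assuming the context marker is known
    # for every sample (test sets are equal-sized, so a plain average works).
    separated_acc = sum(
        accuracy(fit_nearest_mean(train[c]), test[c]) for c in (0, 1)
    ) / 2
    return pooled_acc, separated_acc


if __name__ == "__main__":
    pooled_acc, separated_acc = run_experiment()
    print(f"pooled: {pooled_acc:.3f}  separated: {separated_acc:.3f}")
```

In this deliberately extreme setup the pooled class means nearly coincide, so the pooled rule performs close to chance while the per-context rules do not. The sketch assumes the context marker identifies each sample's context without error; it does not model the case where the contexts themselves are hard to tell apart.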