Type: Journal Publication
Abstract: Data clustering is typically considered a subjective process, which makes it problematic. For instance, how does one make statistical inferences based on clustering? The matter is different with pattern classification, for which two fundamental characteristics can be stated: (1) the error of a classifier can be estimated using “test data,” and (2) a classifier can be learned using “training data.” This paper presents a probabilistic theory of clustering, including both learning (training) and error estimation (testing). The theory is based on operators on random labeled point processes. It includes an error criterion in the context of random point sets and representation of the Bayes (optimal) cluster operator for a given random labeled point process. Training is illustrated using a nearest-neighbor approach, and trained cluster operators are compared to several classical clustering algorithms.
Cited as: Brun Marcel, Dougherty Edward Russell, "A Probabilistic Theory of Clustering", Pattern Recognition, Vol. 37, No. 5, pp. 917-925, 2004