Clustering by Deterministic Annealing and the Quest for the Global Best Prof. Kenneth Rose University of California, Santa Barbara The deterministic annealing approach to clustering and its various extensions have demonstrated substantial performance improvement over standard supervised and unsupervised learning methods in a variety of important applications including compression, estimation, pattern recognition and classification, and statistical regression. It evolved from cross-fertilization of information theory, statistical physics and pattern recognition. The talk reviews its derivation within a probabilistic framework from basic information theoretic principles. The application-specific cost is minimized subject to a constraint on the randomness (Shannon entropy) of the solution, which is gradually relaxed (in direct analogy to annealing in statistical physics) so as to avoid many shallow local minima. Effectively, it replaces the stochastic procedure of simulated annealing with a deterministic optimization of the appropriate expectation functionals (free energy in the physical analogy). Alternatively, the method is derived within rate-distortion theory, where the annealing process is equivalent to computation of Shannon's rate-distortion function, and the annealing temperature is inversely proportional to the slope of the rate-distortion curve. The basic algorithm is extended by incorporating structural constraints to allow optimization of numerous popular structures including tree-structured and multi-stage clustering, as well as various decision trees, multilayer perceptrons, radial basis functions, mixtures of experts, and hidden Markov models. The talk concludes with a brief (as time permits...) discussion of extensions of the method that are currently under investigation, with emphasis on information-theoretic contributions to search and retrieval in high dimensional databases: approximate nearest neighbor search as special case of rate-distortion theory (where the tradeoff is between search complexity/time and quality of the retrieved set) and the corresponding performance bounds, the importance of coding with decoder side-information (the Wyner-Ziv problem), search in correlated data streams and the problems of predictive vector quantization and multiple description coding.