Posterior Simulation
in Countable Mixture Models for Large Datasets
Subharup Guha
Department of Statistics
University of Missouri-Columbia
Friday, October 9, 2009, 12:30–1:30 pm
GEMS classroom, 3rd Floor in
Shriner's Building
Coffee, tea, and cookies will be provided
Abstract
Mixture models, or convex combinations of a countable number of probability distributions, offer an elegant framework for inference when the population of interest can be subdivided into latent clusters having random characteristics that are heterogeneous between, but homogeneous within, the clusters. Traditionally, the different kinds of mixture models have been motivated and analyzed from very different perspectives, and their common characteristics have not been fully appreciated. The inferential techniques developed for these models usually impose heavy computational burdens that make them difficult, if not impossible, to apply to the massive data sets increasingly encountered in real-world studies.
In this talk, I will introduce a flexible class of models called generalized Polya urn schemes. Many common mixture models, such as finite mixtures, hidden Markov models and Dirichlet processes, are obtained as special cases of this class. An investigation of the theoretical properties of generalized Polya urn schemes offers new insight into asymptotics that form the basis of cost-effective MCMC strategies for very large datasets. These MCMC techniques have the advantage that they provide inferences from the exact posterior of interest and are applicable to different mixture models. The versatility and impressive gains of the methodology are demonstrated by simulation studies and by a nonparametric Bayesian analysis of pancreatic cancer survival data on 10,459 individuals.
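For readers unfamiliar with urn schemes, the following is a minimal sketch of the classical (Blackwell-MacQueen) Polya urn that underlies the Dirichlet process, one of the special cases named above. It is not the generalized scheme presented in the talk, and the function name `polya_urn_labels` and the parameter `alpha` (the concentration parameter) are illustrative assumptions rather than anything taken from the speaker's work.

```python
import random


def polya_urn_labels(n, alpha, seed=0):
    """Draw cluster labels for n items from the classical Polya urn.

    Assumption: this is the standard Dirichlet-process urn with
    concentration alpha > 0, shown only as an illustration.
    """
    rng = random.Random(seed)
    labels = []   # labels[i] = cluster index assigned to item i
    counts = []   # counts[k] = number of items currently in cluster k
    for i in range(n):
        # Item i joins existing cluster k with probability counts[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        acc = 0.0
        chosen = len(counts)  # default: open a new cluster
        for k, c in enumerate(counts):
            acc += c
            if u < acc:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(1)
        else:
            counts[chosen] += 1
        labels.append(chosen)
    return labels, counts


if __name__ == "__main__":
    labels, counts = polya_urn_labels(n=1000, alpha=2.0)
    print("number of clusters:", len(counts))
    print("largest cluster sizes:", sorted(counts, reverse=True)[:5])
```

Each draw scans every existing cluster, so the per-item cost grows with the number of clusters. Costs of this kind are one reason naive posterior simulation becomes burdensome on very large datasets.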