Abstract: The literature on Gaussian graphical models (GGMs) contains two equally rich and significant research domains. The first relates to graph determination: the underlying graph is unknown and must be inferred from the data. The second dominates applications in spatial epidemiology, where GGMs are typically referred to as Gaussian Markov random fields (GMRFs). Here the underlying graph is assumed to be known: the vertices correspond to geographical areas, and the edges join areas that are considered neighbors (e.g., because they share a border). In this talk we introduce multi-way Gaussian graphical models that unify the statistical approaches to inference in spatiotemporal epidemiology with the literature on general GGMs. The novelty of the proposed work lies in adding the G-Wishart distribution to the substantial collection of statistical tools used to model multivariate areal data. Unlike the fixed graphs that describe geography, the graphs in the other dimensions of the data are inherently uncertain and must be determined. Our new class of methods for spatial epidemiology allows the simultaneous use of GGMs to represent known spatial dependencies and to determine unknown dependencies in the other dimensions of the data. Joint work with Alex Lenkoski and Abel Rodriguez.
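In a GGM the graph is encoded by the zero pattern of the precision (inverse covariance) matrix: a missing edge corresponds to a zero entry. The sketch below shows how a known neighbor graph fixes that pattern for a GMRF. It assumes an intrinsic CAR (Besag-type) prior, one common GMRF choice in disease mapping that the abstract itself does not specify, and the function names `car_precision` and `edges_from_precision` are hypothetical.

```python
def car_precision(n_areas, edges, tau=1.0):
    """Precision matrix Q = tau * (D - W) of an intrinsic CAR (ICAR) model.

    Hypothetical helper for illustration. `edges` lists pairs (i, j) of
    neighboring areas (0-based). D is diagonal with each area's number of
    neighbors and W is the 0/1 adjacency matrix, so an off-diagonal entry
    Q[i][j] is nonzero exactly when areas i and j are neighbors.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    Q = [[0.0] * n_areas for _ in range(n_areas)]
    for i, j in edges:
        if i == j:
            raise ValueError("an area cannot be its own neighbor")
        if Q[i][j] != 0.0:
            continue  # edge already recorded, e.g. given as both (i, j) and (j, i)
        Q[i][j] = Q[j][i] = -tau  # contributes -tau * W
        Q[i][i] += tau            # contributes +tau * D: one more neighbor for i
        Q[j][j] += tau            # ... and one more neighbor for j
    return Q


def edges_from_precision(Q, tol=1e-12):
    """Read the graph back off the zero pattern of a precision matrix."""
    n = len(Q)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if abs(Q[i][j]) > tol}


# Four areas in a row: 0 - 1 - 2 - 3
Q = car_precision(4, [(0, 1), (1, 2), (2, 3)])
for row in Q:
    print(row)
print(edges_from_precision(Q))
```

Every row of this Q sums to zero, which is why the intrinsic CAR prior is improper. Here the spatial graph is fixed in advance; by contrast, the abstract describes using a G-Wishart prior for the dimensions whose graph is unknown, so that their zero pattern is inferred from the data rather than fixed.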
In 2006 Adrian Dobra joined the University of Washington as an Assistant Professor. After receiving his Ph.D. in Statistics from Carnegie Mellon University, he worked at Duke University as a postdoctoral fellow at the National Institute of Statistical Sciences and the Statistical and Applied Mathematical Sciences Institute, and subsequently as a Research Assistant Professor in the Department of Statistics and the Department of Molecular Genetics and Microbiology. His methodological research is focused on graphical models, stochastic computing and multidimensional contingency tables. His applied research has been motivated by projects in the social sciences, genomics, disclosure limitation and spatial epidemiology. He has published 24 papers and 6 book chapters.
Abstract: Problems that require estimating high-dimensional matrices from noisy observations arise frequently in statistics and machine learning. Examples include dimensionality reduction methods (e.g., principal components and canonical correlation), collaborative filtering and matrix completion (e.g., Netflix and other recommender systems), multivariate regression, estimation of time-series models, and graphical model learning. When the sample size is smaller than the matrix dimensions, all of these problems are ill-posed, so some type of structure must be imposed to obtain meaningful results.
In recent years, relaxations based on the nuclear norm and other types of convex matrix regularizers have become popular. By framing a broad class of problems as special cases of matrix regression, we present a single theoretical result that provides guarantees on the accuracy of such convex relaxations. Our general result can be specialized to obtain various non-asymptotic bounds, among them sharp rates for noisy forms of matrix completion, matrix compression, and matrix decomposition. In all of these cases, information-theoretic methods can be used to show that our rates are minimax-optimal, and thus cannot be substantially improved upon by any algorithm, regardless of computational complexity.
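For a concrete picture of the "matrix regression" framing, one generic way to write a nuclear-norm-regularized estimator is shown below. The notation (observation matrices $X_i$, noise $w_i$, regularization weight $\lambda_n$, and the $1/(2n)$ scaling) is an assumption made for illustration, not taken from the talk.

```latex
y_i = \langle\!\langle X_i, \Theta^* \rangle\!\rangle + w_i,
\qquad \langle\!\langle X_i, \Theta \rangle\!\rangle = \operatorname{trace}(X_i^\top \Theta),
\qquad i = 1, \dots, n,

\hat{\Theta} \in \arg\min_{\Theta \in \mathbb{R}^{d_1 \times d_2}}
  \left\{ \frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - \langle\!\langle X_i, \Theta \rangle\!\rangle \bigr)^2
  + \lambda_n \|\Theta\|_{\mathrm{nuc}} \right\},
\qquad \|\Theta\|_{\mathrm{nuc}} = \sum_{j=1}^{\min(d_1, d_2)} \sigma_j(\Theta).
```

Matrix completion is the special case in which each $X_i = e_{a(i)} e_{b(i)}^\top$ picks out a single observed entry, so that $\langle\!\langle X_i, \Theta \rangle\!\rangle = \Theta_{a(i) b(i)}$. The nuclear norm, the sum of the singular values $\sigma_j$, is the convex surrogate that encourages low rank.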
Based on joint work with Alekh Agarwal and Sahand Negahban.
Martin Wainwright is currently an associate professor at the University of California, Berkeley, with a joint appointment in the Department of Statistics and the Department of Electrical Engineering and Computer Sciences. He received a Bachelor's degree in Mathematics from the University of Waterloo, Canada, and a Ph.D. in Electrical Engineering and Computer Science (EECS) from the Massachusetts Institute of Technology (MIT). His research interests include machine learning, mathematical statistics, and information theory. He has been awarded an Alfred P. Sloan Foundation Fellowship, an NSF CAREER Award, the George M. Sprowls Prize for his dissertation research (EECS department, MIT), a Natural Sciences and Engineering Research Council of Canada 1967 Fellowship, an IEEE Signal Processing Society Best Paper Award in 2008, and several outstanding conference paper awards.
Abstract: Information technology has enabled the collection of massive amounts of data in science, engineering, social science, finance and beyond. Extracting useful information from massive, high-dimensional data is the focus of today's statistical research and practice. After the broad success of statistical machine learning in prediction through regularization, interpretability is gaining attention, and sparsity is being used as its proxy. Combining the virtues of regularization and sparsity, sparse modeling methods (e.g., the Lasso) have attracted much attention in both theoretical research and data modeling.
This talk discusses both the theory and the practice of sparse modeling. First, we present some recent theoretical results bounding the L2-estimation error (when p >> n) for a class of M-estimation methods with decomposable penalties. As special cases, our results cover the Lasso, L1-penalized GLMs, the group Lasso, and low-rank sparse matrix estimation. Second, we present ongoing research on “topic-imaging” supported by an NSF-CDI grant. This project employs sparse logistic regression to derive a list of words (a “topic-image”) associated with a particular word (e.g., “Microsoft”) in, for example, New York Times articles. The validity of such a list is supported by human-subject experiments comparing it with some other methods.
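The topic-imaging step can be pictured as an L1-penalized logistic regression. The pure-Python sketch below assumes each document is a row of binary word indicators and the response marks whether the document mentions the target word. The talk does not say which solver the project uses, so proximal gradient descent (ISTA) with soft-thresholding is swapped in here, and the names `soft_threshold` and `l1_logistic` are hypothetical. Words whose coefficients survive the L1 shrinkage would form the candidate topic image.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t, clipping at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def l1_logistic(X, y, lam, iters=2000):
    """Minimize (1/n) * sum_i logloss(y_i, x_i . w) + lam * ||w||_1 by ISTA.

    Hypothetical sketch: no intercept, dense lists of lists, fixed iteration count.
    """
    n, p = len(X), len(X[0])
    # The gradient of the average logistic loss is Lipschitz with constant
    # at most ||X||_F^2 / (4n), so its reciprocal is a safe step size.
    frob2 = sum(v * v for row in X for v in row)
    step = 4.0 * n / frob2 if frob2 > 0 else 1.0
    w = [0.0] * p
    for _ in range(iters):
        # Residuals sigma(x_i . w) - y_i, then gradient (1/n) X^T r.
        r = [sigmoid(sum(wj * xj for wj, xj in zip(w, row))) - yi
             for row, yi in zip(X, y)]
        grad = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        # Gradient step on the smooth loss, then soft-threshold for the L1 term.
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(p)]
    return w


# Hypothetical toy data: 4 documents, 2 candidate words.
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
print(l1_logistic(X, y, lam=0.15))
```

On this toy data the first word receives a positive weight (about log(7/3), roughly 0.85) while the second word's coefficient is exactly zero, so only the first word would enter the "topic image". Raising `lam` far enough sets every coefficient to zero, which is how the penalty controls the length of the list.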
Bio: Bin Yu is Chancellor's Professor in the Departments of Statistics and of Electrical Engineering & Computer Science at UC Berkeley. She is currently chair of the Department of Statistics and a founding co-director of the Microsoft Lab on Statistics and Information Technology at Peking University, China. Her Ph.D. is in Statistics from UC Berkeley, and she has published over 80 papers in a wide range of research areas including empirical process theory, information theory (MDL), MCMC methods, signal processing, machine learning, high-dimensional data inference (boosting, the Lasso, and sparse modeling in general), bioinformatics, and remote sensing. Her current research interests include statistical machine learning for high-dimensional data and solving data problems from remote sensing, neuroscience, and text documents.
She was a 2006 Guggenheim Fellow and is a Fellow of the AAAS, IEEE, IMS (Institute of Mathematical Statistics) and ASA (American Statistical Association). She is a co-chair of the National Scientific Committee of SAMSI and serves on the Board of Mathematical Sciences and Applications of the National Academy of Sciences in the US.