## Fall 2016

Coffee will be served at 3:30 pm in room 326 for each Wednesday colloquium. TAs are responsible for set-up according to the schedule.

Information about past colloquia is available here.

For a searchable listing of past colloquia, use our search tool.

The colloquium is organized by Professor Xiaojing Wang.

## Marcos Prates; Universidade Federal de Minas Gerais

### Where does geography live? A projection approach for spatial confounding

#### August 17, 2016

Spatial confounding between the spatial random effects and the fixed effects covariates was recently identified, and it has been shown that it may lead to misleading interpretations of model results. Solutions to alleviate this problem are based on decomposing the spatial random effect and fitting a restricted spatial regression. In this paper, we propose a different approach: a transformation of the geographic space to ensure that the unobserved spatial random effect added to the regression is orthogonal to the fixed effects covariates. Our approach, named SPOCK, has the additional benefit of providing a fast and simple computational method to estimate the parameters. Furthermore, it does not constrain the distribution class assumed for the spatial error term. A simulation study and a real data analysis are presented to better understand the advantages of the new method in comparison with the existing ones.

Joint work with Renato Martins Assunção and Erica Castilho Rodrigues.

## Taeryon Choi; Korea University

### Bayesian shape restricted regression models using Gaussian process priors

#### September 6, 2016

In this talk, we propose a Bayesian method for shape-restricted regression using a spectral analysis of Gaussian process priors for the regression function. The proposed model directly enforces shape restrictions on the derivatives of the regression function. The smoothing prior distribution for the spectral coefficients incorporates hyper-parameters that control the smoothness of the function and the tradeoff between the data and the prior distribution. We contrast our approach with existing Bayesian shape-restricted regression models for dealing with regression functions with monotonicity and concavity. We also propose models for U-shaped and S-shaped functions that facilitate the estimation of the extrema and inflection points. We modify the basic model with a spike-and-slab prior that improves the model when the true function is on the boundary of the constraint space. The posterior distributions of the proposed models are consistent. We also examine Bayesian hypothesis testing for shape restrictions and discuss its potential and limitations. Further, we illustrate the empirical performance of the proposed models with synthetic and real data and compare them with existing Bayesian methods.

## Panpan Zhang; University of Connecticut

### Joint distribution of nodes of different outdegrees and the degree profile in preferential dynamic attachment circuits

#### September 14, 2016

We investigate the joint distribution of nodes of different outdegrees and the degree profile in preferential dynamic attachment circuits. In particular, we study the asymptotic distribution of the number of nodes of outdegree 0 (terminal nodes) and outdegree 1 in a very large circuit. The expectation and variance of the number of those two types of nodes are both linear with respect to the age of the circuit. We show via martingale methods that the numbers of nodes of outdegree 0 and 1 asymptotically follow a bivariate normal distribution. We also study the exact distribution of the degree of a node as the circuit ages via a series of Pólya-Eggenberger urn models with “hiccups” in between. The exact expectation and variance of the degree of nodes are determined by recurrence methods. Phase transitions of these degrees are discussed briefly. This is joint work with Hosam M. Mahmoud.

## Robert E. Kass; Carnegie Mellon University

### Statistical Thinking in Neuroscience

#### September 21, 2016

Experimenters are typically adept at applying standard statistical techniques, while computational neuroscientists are capable of formulating mathematically sophisticated data analytic methods to attack novel problems in data analysis. Yet, in many situations, statisticians proceed differently than those without formal training in statistics. What is different about the way statisticians approach problems? I will give you my thoughts on this subject, and will illustrate with examples, including the problem of neural synchrony detection across a network of interacting spiking neurons. I will conclude with some related comments on scientific reproducibility, illustrating them with an experiment in which brain signals were used to run a robotic device.

## Yazhen Wang; University of Wisconsin-Madison

### Quantum Computation and Statistics

#### September 28, 2016

Quantum computation and quantum information are of great current interest in fields such as computer science, physics, engineering, chemistry and mathematical sciences. They will likely lead to a new wave of technological innovations in communication, computation and cryptography. As the theory of quantum physics is fundamentally stochastic, randomness and uncertainty are deeply rooted in quantum computation and quantum information. Thus, statistics can play an important role in quantum computation, which in turn may offer great potential to revolutionize statistical computing and inference. This talk will first give a brief introduction to quantum computation and quantum information and then present my recent work on (i) quantum tomography and its connection with matrix completion and compressed sensing, (ii) annealing-based quantum computing and its relationship with Markov chain Monte Carlo simulations, and (iii) statistical analysis of quantum annealing for large-scale quantum computing data.

## Neal Thomas; Pfizer

### Using meta-analyses to guide statistical methodology for clinical dose response studies

#### October 5, 2016

Two meta-analyses of dose response data will be presented. One is based on data from a single sponsor (Pfizer), and the other is based on published industry-wide data. The dosing designs actually implemented are described, along with common patterns of dose response. The analyses demonstrate that a single concise parametric model describes most clinical dose response data well. It also provides an empirical basis for prior distributions for some of the model parameters that are useful for both the design and analysis of future studies. An example of how this information can be important when interpreting data will be given. The approach described is contrasted with much of current statistical practice, and its broader implications will be noted.

## Kung-Sik Chan; University of Iowa

### Inference for Threshold Diffusions

#### October 12, 2016

The threshold diffusion model assumes the underlying diffusion process to have a piece-wise linear drift term and a piece-wise smooth diffusion term, which is useful for analyzing nonlinear continuous-time processes. In practice, the functional form of the diffusion term is often unknown. We develop a quasi-likelihood approach for testing and estimating a threshold diffusion model, by employing a constant working diffusion term, which amounts to a least squares approach. Large-sample properties of the proposed methods are derived under mild regularity conditions. Unlike the discrete-time case, the threshold estimate admits a closed-form asymptotic distribution. We apply the threshold model to examine the nonlinearity in the term structure of a long time series of US interest rates.

## Kun Chen; University of Connecticut

### On Large-scale Predictive Modeling of Mixed and Incomplete Outcomes

#### October 19, 2016

Multivariate outcomes together with multivariate features of possibly high dimensionality are routinely produced in various fields. In many real-world problems, the collected outcomes are of mixed types, including continuous measurements, binary indicators and counts, and the data may also be subject to substantial missing values. Regardless of their types, these mixed outcomes are often interrelated, representing diverse views of the same underlying data generation mechanism. As such, integrative multivariate modeling can be beneficial. We develop a mixed-outcome reduced rank regression, which effectively enables information sharing among all the prediction tasks. Our approach integrates mixed and partially observed outcomes belonging to the exponential dispersion family, by assuming that all the outcomes are associated through a shared low-dimensional subspace spanned by the high-dimensional features. A general regularized estimation criterion is proposed, and a unified algorithm with a convergence guarantee is developed for optimization. We establish a non-asymptotic performance bound for the proposed estimators in the context of mixed outcomes from the exponential family and under a general sampling scheme of missingness. The effectiveness of our approach is demonstrated by simulation studies and an application to predicting health-related outcomes in longitudinal studies of aging. Other strategies for large-scale prediction, including sequential feature extraction and mixture modeling, will also be discussed.

This event will be held at UMass Amherst, in the Lederle Graduate Research Center (LGRT) room 1634. Tea will be served before the colloquium, and pizza afterward.

## Sumona Mondal; Clarkson University

### Sample Size Determination for Power Analysis Using Hierarchical Designs

#### October 28, 2016

In this talk we will discuss how to determine sample sizes for hierarchical mixed-effects models. Both subject-level and site-level randomizations will be considered in this power analysis. We will show how site-level randomization needs special consideration when determining the sample sizes required to achieve the desired power. The impact of dropouts, the number of measurement time points, and variance components will be considered as well in this analysis. Results will be illustrated with a dataset from psychiatry.

Coffee will be served at 10:30 am in AUST 323B.

## Ben Shaby; Penn State University

### Spatial Extreme Value Analysis for Fire Risk Assessment

#### November 2, 2016

Wildfires have the potential to inflict huge losses of life, infrastructure, and habitat. I will describe two projects related to extreme fire risk in a particularly vulnerable region in California. In describing extreme fire conditions, the salient characteristic is that one or more relevant environmental variables are in the far tail of their distribution. One would like to understand the tail in order to make informed policy decisions regarding, for example, fire risk mitigation. One difficulty is that, by definition, few observations of rare events are available. Furthermore, extremes of environmental processes almost always manifest dependence in time, space, or both. Stochastic process models for analyzing such structures exist, but they are difficult to work with directly because they have intractable likelihoods. I will discuss alternative representations that build dependence in extremes using latent variables.

## Heping Zhang; Yale University School of Public Health

### Statistical Strategies in Analyzing Data with Unequal Prior Knowledge

#### November 9, 2016

The advent of technologies including high throughput genotyping and computer information technologies has produced ever larger and more diverse databases that are potentially information rich. This creates the need to develop statistical strategies that have a sound mathematical foundation and are computationally feasible and reliable. In statistics, we commonly deal with relationships between variables using correlation and regression models. With diverse databases, the quality of the variables may vary and we may know more about some variables than others. I will present some ideas on how to conduct statistical inference with unequal prior knowledge. Specifically, how do we define correlation between two sets of random variables conditional on a third set of random variables, and how do we select predictors when we have information from sources other than the databases with raw data? I will address some mathematical and computational challenges in order to answer these questions. Analysis of real genomic data will be presented to support the proposed methods and highlight remaining challenges.

## David Banks; Duke University

### Statistical Issues with Agent-Based Models

#### November 18, 2016

Agent-based models have become a ubiquitous tool in many disciplines, but too little is known about their statistical properties. This talk reviews the work that has been done in this area, and describes two strategies for improving model fitting and inference. It also attempts to place agent-based modeling within the span of modern Bayesian inference.

Coffee will be served at 10:30 am in AUST 326.

## Nitis Mukhopadhyay; University of Connecticut

### Walking on a Thin Bridge Linking Teaching-Research-Teaching Excites and Rewards Me

#### November 30, 2016

A link between teaching and research excites me. My recent research on statistical inference, linear models, and applied probability has often originated from teaching. In this presentation, I will highlight some of those encounters. As time permits, I will touch upon selected topics involving covariance, correlation, independence, Student’s t-distribution, multivariate (non-) normality, sufficiency, information, and ancillarity.

## Jim Booth; Cornell University

### Table counting and exact conditional inference for contingency tables

#### December 7, 2016

I will review exact conditional inference in the context of loglinear models for contingency tables. The feasibility of exact inference often depends on the ability to enumerate a reference set determined by sufficient statistics, which impose linear constraints on the contingency table counts. A double-saddlepoint approximation is proposed for determining the number of tables with counts satisfying these linear constraints. Computation of the approximation involves fitting a generalized linear model for geometric responses, which can be accomplished almost instantaneously using the iterated weighted least squares algorithm. The approximation is far superior to other analytical approximations that have been proposed, and is shown to be highly accurate in a range of examples, including some for which analytical approximations were previously unavailable. A similar approximation, based on a logistic regression model, is proposed for tables consisting of only zeros and ones. A higher-order adjustment to the basic double-saddlepoint approximation further improves accuracy in almost all cases.