**Fall 2019**

All colloquia will be held at 4PM in AUST 344, unless otherwise noted. Coffee will be served at 3:30PM in AUST 326.

**Information about past colloquia is available here.**

| Date | Speaker | Title | Location |
|---|---|---|---|
| Wednesday, August 28 | Yichuan Zhao, Georgia State University | Rank-based Estimating Equation With Non-ignorable Missing Responses | 11AM in AUST 313. Coffee at 10:30AM in AUST 326. |
| Wednesday, September 4 | Tianying Wang, Columbia University | Integrated Quantile Rank Test (iQRAT) For Gene-level Associations In Sequencing Studies | 4PM in AUST 344. Coffee at 3:30PM in AUST 326. |
| Monday, September 9 | Michael Lavine, US Army Research Office | Suboptimal Is The Best | 3:35PM in AUST 163. Coffee at 3:00PM in AUST 326. |
| Wednesday, September 11 | Ivair Ramos Silva, Federal University, Brazil | On The Correspondence Between Frequentist And Bayesian Tests | 4PM in AUST 344. Coffee at 3:30PM in AUST 326. |
| Wednesday, September 18 | Patrick J. Cantwell, U.S. Census Bureau | Statistical Methods at the U.S. Census Bureau: From Simple Statistical Theory To Complex Practical Application (Recipient of the 2019 UConn Statistics Department Distinguished Alumni Award) | 4PM in Gentry Bldg, Rm. 131. Coffee at 3:15PM-3:45PM in AUST 326. |
| Wednesday, September 25 | Suman Majumdar, Department of Statistics, University of Connecticut | On Asymptotic Standard Normality Of The Two Sample Pivot | 4PM in AUST 344. Coffee at 3:30PM in AUST 326. |
| Wednesday, October 2 | Eric Loken, Department of Educational Psychology, University of Connecticut | Statistical Intuitions And The Reproducibility Crisis In Science | 4PM in AUST 344. Coffee at 3:30PM in AUST 326. |
| Wednesday, October 9 | Forrest Crawford, Yale University | Causal Inference Under Spillover And Contagion: Structural Versus Agnostic Methods | 4PM in AUST 344. Coffee at 3:30PM in AUST 326. |
| Wednesday, October 16 | Hui Zou, University of Minnesota | A Nearly Condition-free Fast Algorithm For Gaussian Graphical Model Recovery | 4PM in AUST 344. Coffee at 3:30PM in AUST 326. |
| Wednesday, October 23 | Min Shu, University of Connecticut | Real-Time Prediction of Bitcoin Bubble Crashes | 4PM in AUST 344. Coffee at 3:30PM in AUST 326. |
| Wednesday, October 30 | Guanyu Hu, University of Connecticut | Bayesian Spatial Homogeneity Pursuit of Income Distribution (Joint UConn/UMass Statistics Colloquium) | 4PM in UMASS LGRT Building, Rm. 1623. Pizza will be served after the colloquium. |
| Friday, November 1 | Liqun Wang, Department of Statistics, University of Manitoba | Variable Selection And Estimation In Generalized Linear Models With Measurement Error | 11AM in AUST 344. Coffee at 10:30AM in AUST 326. |
| Wednesday, November 6 | Julio Castrillon, Boston University | Large Scale Kriging: A High Performance Multi-Level Computational Mathematics Approach | 4PM in AUST 344. Coffee at 3:30PM in AUST 326. |
| Wednesday, November 13 | William Evan Johnson, Boston University | Addressing Unwanted Heterogeneity In Genomic Data: Applications In RNA-sequencing And Prediction | 4PM in AUST 344. Coffee at 3:30PM in AUST 326. |
| Wednesday, November 20 | Moulinath Banerjee, University of Michigan | Communication-efficient Integrative Regression In High Dimensions | 4PM in AUST 344. Coffee at 3:30PM in AUST 326. |
| Friday, November 22 | Brian Macdonald, Director of Sports Analytics, ESPN | A Bayesian Hierarchical Regression-based Metric For NBA Players | 11AM in AUST 344. Coffee at 10:30AM in AUST 326. |
| Wednesday, December 4 | Subhashis Ghosal, North Carolina State University | Posterior Contraction and Credible Sets For Filaments Of Regression Functions | 4PM in AUST 344. Coffee at 3:30PM in AUST 326. |

## Yichuan Zhao; Georgia State University

### Rank-based estimating equation with non-ignorable missing responses

#### August 28, 2019 at 11AM in AUST 313

In this talk, a general regression model with responses missing not at random is considered. From a rank based estimating equation, a rank based estimator of the regression parameter is derived. Based on this estimator's asymptotic normality property, a consistent sandwich estimator of its corresponding asymptotic covariance matrix is obtained. In order to overcome the over-coverage issue of the normal approximation procedure, the empirical likelihood based on the rank-based gradient function is defined, and its asymptotic distribution is established. Extensive simulation experiments under different settings of error distributions with different response probabilities are considered, and the simulation results show that the proposed empirical likelihood approach has better performance in terms of coverage probability and average length of confidence intervals for the regression parameters compared with the normal approximation approach and its least-squares counterpart. A data example is provided to illustrate the proposed methods.

## Tianying Wang; Columbia University

### Integrated Quantile Rank Test (iQRAT) for gene-level associations in Sequencing Studies

#### September 4, 2019 at 4PM in AUST 344

Sequence-based association studies often evaluate the group-wise effects of rare and common genetic variants within a gene on a phenotype of interest. Many such approaches have been proposed, such as the widely used burden and sequence kernel association tests. These approaches focus on identifying genetic effects on the phenotypic mean. As the genetic associations can be complex, we propose here an efficient rank test to investigate the genetic effects across the entire distribution of a phenotype. The proposed test generalizes the classical quantile-specific rank-score test by integrating the rank-score test statistics over quantile levels while incorporating the Cauchy combination test scheme and Fisher's method to maximize the power. We show that the resulting test complements the mean-based analysis and improves efficiency and robustness. Using simulation studies and real Metabochip data on lipid traits, we investigated the performance of the new test in comparison with the burden tests and sequence kernel association tests in multiple scenarios.

## Michael Lavine, US Army Research Office

### Suboptimal is the best

#### September 9, 2019

Many statistics problems are framed as optimization. That is, we write down a target function f(θ) and find the input value θ∗ that maximizes it. Our thesis is that we are often better served by finding the entire set of θ’s that come close to maximizing f, viz., Θ∗ ≡ {θ : f(θ) ≥ f(θ∗) − ε}. This talk will

Explain why (briefly);

Show a few examples of what can be gained by finding Θ∗; and

Show one possible approach to finding Θ∗.
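As an illustration of the set Θ∗ defined in this abstract, here is a minimal sketch that scans a grid of candidate values and keeps every θ whose target value is within a tolerance of the best value found. The function name `near_optimal_set`, the quadratic target, the grid, and the tolerance `eps` are all hypothetical choices made for the example; this is not the approach presented in the talk.

```python
def near_optimal_set(f, candidates, eps):
    """Return (theta_star, members): the best candidate on the grid and
    all candidates whose value is within eps of the best value."""
    values = [(theta, f(theta)) for theta in candidates]
    theta_star, f_star = max(values, key=lambda pair: pair[1])
    members = [theta for theta, value in values if value >= f_star - eps]
    return theta_star, members


# Hypothetical target: a concave quadratic with its peak at theta = 2.
def f(theta):
    return -(theta - 2.0) ** 2


grid = [i / 10 for i in range(0, 41)]  # 0.0, 0.1, ..., 4.0
theta_star, theta_set = near_optimal_set(f, grid, eps=0.25)
print(theta_star, min(theta_set), max(theta_set))
```

On this toy target the near-optimal set is the interval of grid points from 1.5 to 2.5, which conveys more about the flatness of f around its maximum than the single maximizer does.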

## Ivair Ramos Silva; Federal University, Brazil

### On the Correspondence between Frequentist and Bayesian Tests

#### September 11, 2019

The confrontation between adherents of the Bayesian and the frequentist schools has endured for decades. However, a reconciling theory for hypothesis testing is emerging. This presentation is inspired by the work of Silva (2018), who shows that one can always calibrate Bayesian and frequentist tests in order to present the same decision rule for any hypothesis test problem.

## Patrick J. Cantwell; U.S. Census Bureau

**Recipient of the 2019 UConn Statistics Department Distinguished Alumni Award**

### Statistical Methods at the U.S. Census Bureau: From Simple Statistical Theory to Complex Practical Application

#### September 18, 2019

The U.S. Census Bureau has employed dozens of statistics graduates from the University of Connecticut over the years. At the Bureau, staff develop complex theoretical models and implement them in the course of our work on surveys and censuses. However, we often begin with simple ideas and find interesting–sometimes complex–applications to solve problems. In this presentation, we address three questions. For each, we briefly describe a statistical application based on simple statistical concepts.

The questions: (1) In a time when data intruders have access to sophisticated software and huge databases of personal information, how can we ensure the confidentiality of individuals’ responses to the census and our surveys? The randomized response method suggests ways to guarantee confidentiality in the presence of any external threats. (2) Can we measure how well the U.S. decennial census “covers” the population of the United States? A statistical procedure developed in the 19th Century provides the starting point and became the topic of a controversial Supreme Court case. (3) How can we design a survey that produces high quality estimates of the current unemployment rate as well as the change in the rate from the previous month? Practical considerations and basic concepts of statistical covariance provide guidance on effective survey designs and estimation procedures to precisely measure the unemployment rate, a major indicator and driver of the stock market.

## Suman Majumdar; Department of Statistics, University of Connecticut

### On Asymptotic Standard Normality of the Two Sample Pivot

#### September 25, 2019

The large sample solution to the problem of comparing the means of two (possibly heteroscedastic) populations, based on two random samples from the populations, hinges on the pivot underpinning the construction of the confidence interval and the test statistic being asymptotically standard Normal. We regularly use this well-known solution if the two samples are independent and the sample sizes are large. However, to establish the asymptotic standard Normality of the two sample pivot, existing results in the literature seem to assume, above and beyond the cross sample independence of the two samples, that the ratio of the sample sizes converges to a finite positive number. This restriction on the asymptotic behavior of the ratio of the sample sizes is impossible to verify in practical applications and carries the risk of rendering the theoretical justification of the large sample approximation invalid even in moderately unbalanced designs. Our results show that neither the restriction on the asymptotic behavior of the ratio of the sample sizes nor the assumption of cross sample independence is necessary for the asymptotic standard Normality of the two sample pivot. Convergence of the joint distribution of the standardized sample means to a spherically symmetric distribution on the plane, which has to be the bivariate standard Normal distribution, implies the asymptotic standard Normality of the two sample pivot, with the passage of the sample sizes to infinity being completely unrestricted. Finally, the two random samples we work with can be considered to be a truncation of an underlying infinite sequence of random vectors, with truncation in each coordinate occurring at a different stage.
As long as this infinite sequence consists of independent (not necessarily identically distributed) elements, Cesàro convergence of the sequence of cross sample correlation coefficients to zero is equivalent to both the asymptotic standard Normality of the two sample pivot and the asymptotic bivariate standard Normality of the standardized sample means.
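The abstract does not write the pivot out. Assuming it is the usual studentized difference of sample means with unpooled variance estimates, (x̄ − ȳ − δ) / sqrt(s₁²/m + s₂²/n), a minimal sketch of the computation looks like this; the function name `two_sample_pivot` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_pivot(x, y, delta=0.0):
    """Studentized difference of sample means with unpooled variances.

    delta is the hypothesized difference of the population means.
    Assumed form of the pivot; the abstract does not state it explicitly.
    """
    m, n = len(x), len(y)
    standard_error = sqrt(variance(x) / m + variance(y) / n)
    return (mean(x) - mean(y) - delta) / standard_error


# Hypothetical unbalanced samples.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6]
t = two_sample_pivot(x, y)
print(t)
```

The large sample procedure compares this quantity with standard Normal quantiles; the abstract concerns the conditions under which that comparison is justified.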

## Eric Loken; Department of Educational Psychology, University of Connecticut

### Statistical Intuitions and the Reproducibility Crisis in Science

#### October 2, 2019

Science is responding well to the so-called reproducibility crisis with positive improvements in methodology and transparency. Another area for improvement is awareness of statistical issues impacting inference. We explore how some problematic intuitions about measurement, statistical power, multiple analyses, and levels of analysis can affect the interpretation of research results, perhaps leading to mistaken claims.

Eric Loken is in the Neag School of Education. He studies advanced statistical models including hierarchical models, measurement models, factor and mixture models, and their applications in health and education research. He works extensively in educational measurement with applications to large scale testing. Recent work has addressed issues surrounding statistical inference, and the relationship to failures to replicate research results.

## Forrest W. Crawford; Biostatistics, Statistics & Data Science, Operations, Ecology & Evolutionary Biology, Yale University

### Causal Inference Under Spillover and Contagion: Structural Versus Agnostic Methods

#### October 9, 2019

Two competing paradigms dominate statistical and econometric approaches to estimating the effects of interventions in interconnected/interacting groups under spillover or interference between experimental units. “Mechanistic” or “structural” models capture dynamic features of the process by which outcomes are generated, permitting inferences with real-world interpretations and detailed predictions. “Agnostic”, “design-based”, or “reduced form” approaches, often based on notions of randomization, refrain from specifying the full joint distribution of the data, and provide inferences that are robust to model mis-specification. Statisticians, economists, epidemiologists, and other scientists often disagree about which of these paradigms is superior for studies of interventions among potentially interacting individuals, with competing claims about model realism, bias, and credibility of inferences. In this presentation, I review methods for estimating the causal effect of an individualistic treatment under spillover, with special attention to the case of contagion, whereby units can transmit their outcome to others in a way that depends on their treatment. I define a formal structural model of contagion, and ask what causal features agnostic or reduced-form estimates recover. I exhibit analytically and by simulation the circumstances under which coefficients in a marginal regression model imply an effect whose direction is opposite that of the true individualistic treatment effect. Furthermore, I show that widely recommended randomization designs and estimators may provide misleading inferences about the direct effect of an intervention when outcomes are contagious. These ideas are illustrated in three empirical examples: transmission of tuberculosis, product adoption, and peer recruitment in social networks.

Bio: Forrest W. Crawford PhD is Associate Professor, Department of Biostatistics, Department of Statistics & Data Science, Yale School of Management (Operations), and Department of Ecology & Evolutionary Biology, Yale University. He is affiliated with the Center for Interdisciplinary Research on AIDS, the Institute for Network Science, the Computational Biology and Bioinformatics program, and the Public Health Modeling concentration. He is the recipient of the NIH Director’s New Innovator Award and a Yale Center for Clinical Investigation Scholar Award. His research interests include causal inference, networks, graphs, stochastic processes, and optimization for applications in epidemiology, public health, and social science.

## Hui Zou; School of Statistics, University of Minnesota

### A nearly condition-free fast algorithm for Gaussian graphical model recovery

#### October 16, 2019

Many methods have been proposed for estimating Gaussian graphical models. The most popular ones are the graphical lasso and neighborhood selection, because the two are computationally very efficient and have some theoretical guarantees. However, their theory for graph recovery requires some very stringent structure assumptions (a.k.a. the irrepresentable condition). We argue that replacing the lasso penalty in these two methods with a non-convex penalty does not fundamentally remove the theoretical limitation, because another structure condition is required. As an alternative, we propose a new algorithm for graph recovery that is very fast and easy to implement and enjoys strong theoretical properties under basic sparsity assumptions.

## Min Shu; Department of Statistics, University of Connecticut

### Real-Time Prediction of Bitcoin Bubble Crashes

#### October 23, 2019

In the past decade, Bitcoin has become an emerging asset class well known to most people because of its extraordinary return potential in phases of extreme price growth and its unpredictable massive crashes. Due to the sharp changes and large volatility of the Bitcoin price, conventional financial bubble detection methods do not provide an effective identification of Bitcoin bubbles. We propose an adaptive multilevel time series detection methodology based on the Log Periodic Power Law Singularity (LPPLS) model to effectively predict the bubble crashes of Bitcoin. Our analysis shows that the LPPLS confidence indicator based on this new method is an outstanding instrument in effectively detecting the bubbles and accurately forecasting the bubble crashes, even if a bubble exists only for a short time. In addition, we discover that the short-term LPPLS confidence indicator, which is greatly affected by the extreme fluctuations of the Bitcoin price, can provide some useful insights into the bubble status on a shorter time scale, and the long-term LPPLS confidence indicator has a stable performance in terms of effectively monitoring the bubble status on a longer time scale. The adaptive multilevel time series detection methodology can provide real-time detection of bubbles and advance forecasts of crashes to warn of the imminent risk in not only the cryptocurrency market but also other financial markets. Joint work with Prof. Wei Zhu, The State University of New York at Stony Brook.

## Guanyu Hu; Postdoctoral Researcher, Department of Statistics, University of Connecticut

**Joint UConn/UMass Statistics Colloquium**

### Bayesian Spatial Homogeneity Pursuit of Income Distribution

#### October 30, 2019

Income distribution describes how a nation’s total wealth is distributed amongst its population. The Lorenz curve is a well-known functional representation of income distribution. Knowing the spatial homogeneity of income distribution among different regions is important for regional economics researchers. In the context of spatial functional data clustering, an efficient Bayesian method to capture the homogeneity of spatial functional data, based on a Markov random field constrained mixture of finite mixtures, is proposed for the Lorenz curves of different regions. The posterior is sampled through an efficient Markov chain Monte Carlo (MCMC) algorithm to simultaneously infer the posterior distribution on the number of clusters and the clustering configuration of spatial functional data. Extensive simulation studies are carried out to show the effectiveness of our proposed methods. The usefulness of our model is illustrated by an application to state level income distribution from the American Community Survey Public Use Microdata Sample (PUMS) data. This is joint work with Junxian Geng, Yishu Xue, and Huiyan Sang.
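For readers unfamiliar with the Lorenz curve mentioned in this abstract, the following minimal sketch computes its empirical version: the cumulative share of total income held by the poorest fraction of the population. The function name `lorenz_curve` and the income values are hypothetical, and the sketch shows only the curve itself, not the Bayesian clustering method of the talk.

```python
def lorenz_curve(incomes):
    """Return the empirical Lorenz curve as (population share, income share)
    points, starting at (0, 0) and ending at (1, 1)."""
    ordered = sorted(incomes)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, income in enumerate(ordered, start=1):
        running += income
        points.append((i / n, running / total))
    return points


# Hypothetical incomes for four households.
curve = lorenz_curve([40, 10, 30, 20])
for population_share, income_share in curve:
    print(population_share, income_share)
```

A perfectly equal distribution would put every point on the diagonal; the further the curve sags below it, the more unequal the distribution.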

## Liqun Wang; Department of Statistics, University of Manitoba

### Variable selection and estimation in generalized linear models with measurement error

#### November 1, 2019

We study the variable selection and estimation problems in linear and generalized linear models when some of the predictors are measured with error. We demonstrate how measurement error impacts the selection results and propose regularized instrumental variable methods to correct for the measurement error effects. The proposed methods are consistent in selection and estimation, and we derive their asymptotic distributions under general conditions. We also investigate the performance of the methods through Monte Carlo simulations and compare them with the naive method that ignores measurement error. Finally, the proposed method is applied to a real dataset. This is joint work with Lin Xue.

## Julio E. Castrillon; Department of Mathematics and Statistics, Boston University

### Large Scale Kriging: A High Performance Multi-Level Computational Mathematics Approach

#### November 6, 2019

Large scale kriging problems usually become numerically expensive and unstable to solve as the number of observations increases. In this talk we introduce techniques from Computational Applied Mathematics (CAM), Partial Differential Equations (PDEs), and High Performance Computing (HPC) to efficiently estimate the covariance function parameters and compute the best unbiased predictor with high accuracy. Our approach is based on multi-level spaces that have been successful for solving PDEs. The first advantage is that the estimation problem is decoupled and the covariance parameters are efficiently and accurately solved. In addition, the covariance matrix of the multi-level spaces exhibits fast decay and is much better conditioned than the original covariance matrix. Furthermore, we show that the prediction problem can be remapped into a numerically stable form without any loss of accuracy. We demonstrate our approach on test problems of up to 512,000 observations with a Matern covariance function and flexible placements of the observations. Many of these test examples are numerically unstable and hard to solve.

## W. Evan Johnson, PhD; Associate Professor of Medicine & Biostatistics; Associate Chief, Division of Computational Biomedicine, Boston University School of Medicine

### Addressing unwanted heterogeneity in genomic data: applications in RNA-sequencing and prediction

#### November 13, 2019

The presence of batch effects in genomic data may unfavorably impact a broad set of applications, including differential expression detection and the accuracy of genomic prediction models. In practice, batch effects are usually addressed by specifically designed software such as ComBat, which merges the data from different batches, estimates batch effects, and removes them from the data. However, these established methods are not sufficient to address all challenges and scenarios in batch effect adjustment, including RNA-sequencing, single-cell RNA-sequencing, prediction problems, and cases where one batch must serve as a reference (e.g. biomarker training). This presentation will discuss these applications and present novel methods for improved handling of batch effects in these contexts. Software tools and solutions will also be presented, and methods/tools will be illustrated using examples from tuberculosis and cancer research.

## Moulinath Banerjee; Department of Statistics, University of Michigan

### Communication-Efficient Integrative Regression in High Dimensions

#### November 20, 2019

We consider the task of meta-analysis in high-dimensional settings in which the data sources we wish to integrate are similar, but non-identical. To borrow strength across such heterogeneous data sources, we introduce a global parameter, based on robustness considerations, that remains sparse even in the presence of outlier data sources. We also propose a one-shot estimator of the global parameter that preserves the anonymity of the data sources and converges at a rate that depends on the size of the combined dataset. Finally, we demonstrate the benefits of our approach on a large-scale drug treatment dataset.

This is joint work with Subha Maity and Yuekai Sun.

## Brian Macdonald; Director of Sports Analytics, ESPN

### A Bayesian hierarchical regression-based metric for NBA players

#### November 22, 2019

We present a Bayesian hierarchical regression model that estimates the value of box score statistics and player coefficients simultaneously, and provides an estimate of a player’s contribution to his team’s on-court performance. We discuss how our approach differs from other regression-based metrics, provide visualizations of those differences over time as a way to highlight the characteristics of each, and discuss how this approach could be used in hockey, soccer, football, or eSports.

Brian Macdonald is currently the Director of Sports Analytics in the Stats & Information Group at ESPN. He was previously the Director of Hockey Analytics with the Florida Panthers Hockey Club, and an Associate Professor in the Department of Mathematical Sciences at West Point. He received a Bachelor of Science in Electrical Engineering from Lafayette College, Easton, PA, and a Master of Arts and a Ph.D. in Mathematics from Johns Hopkins University, Baltimore, MD.

## Subhashis Ghosal; Distinguished Professor, Department of Statistics, North Carolina State University

### Posterior Contraction and Credible Sets for Filaments of Regression Functions

#### December 4, 2019

The filament of a smooth function f consists of local maximizers of f when moving in a certain direction. The filament is an important geometrical feature of the surface of the graph of a function. It is also considered as an important lower dimensional summary in analyzing multivariate data. There have been some recent theoretical studies on estimating filaments of a density function using a nonparametric kernel density estimator. In this talk, we consider a Bayesian approach and concentrate on the nonparametric regression problem. We study the posterior contraction rates for filaments using a finite random series of tensor products of B-splines prior on the regression function.

Compared with the kernel method, this has the advantage that the bias can be better controlled when the function is smoother, which allows obtaining better rates. Under an isotropic Hölder smoothness condition, we obtain the posterior contraction rate for the filament under two different metrics: a distance of separation along an integral curve, and the Hausdorff distance between sets. Moreover, we construct credible sets for the filament having optimal size with sufficient frequentist coverage. We study the performance of our proposed method through a simulation study and apply it to a dataset about California earthquakes to assess the fault-line of the maximum local earthquake intensity.

Based on joint work with my former graduate student, Dr. Wei Li, Assistant Professor, Syracuse University, New York.