**Fall 2017**

All colloquia will be held at 4pm in AUST 105, unless otherwise noted. Coffee will be served at 3:30pm in room 326.

Information about past colloquia is available here.

The colloquium is organized by Professor Xiaojing Wang.

## Joseph Glaz; University of Connecticut

### Multiple Window Scan Statistics for Detecting a Local Change in Variance for Normal Data

#### September 6, 2017

In this talk, research in the area of scan statistics will be reviewed. Recent results on detecting a local change in variance in one-dimensional normal data will be discussed, and numerical results evaluating the performance of these scan statistics will be presented. The two-dimensional case will also be briefly discussed. Future research and applications in this active area will also be mentioned.
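The abstract does not spell out the statistics themselves. As a rough illustration only, here is a minimal sketch of one simple construction, assuming centered data that are iid N(0, 1) under the null: for each window length w, scan the moving-window sum of squares, standardize its maximum with the chi-square mean w and variance 2w, and keep the largest standardized value across window lengths. The function names and this particular way of combining windows are our own assumptions, not the speaker's method.

```python
import math


def scan_sum_of_squares(x, w):
    """Largest moving-window sum of squares over windows of length w.

    Returns (max_sum, start_index). Assumes x is centered (known mean 0),
    so a large windowed sum of squares hints at a local increase in variance.
    """
    if not 1 <= w <= len(x):
        raise ValueError("window length must satisfy 1 <= w <= len(x)")
    s = sum(v * v for v in x[:w])
    best, best_start = s, 0
    for i in range(w, len(x)):
        # Slide the window one step: add the new point, drop the oldest one.
        s += x[i] * x[i] - x[i - w] * x[i - w]
        if s > best:
            best, best_start = s, i - w + 1
    return best, best_start


def multiple_window_scan(x, windows):
    """Combine several window lengths by standardizing each scan maximum.

    Under the assumed null (iid N(0, 1)), a window sum of squares is
    chi-square with w degrees of freedom (mean w, variance 2w).
    Returns (standardized_stat, window_length, start_index).
    """
    best = None
    for w in windows:
        s, start = scan_sum_of_squares(x, w)
        z = (s - w) / math.sqrt(2 * w)
        if best is None or z > best[0]:
            best = (z, w, start)
    return best


if __name__ == "__main__":
    # Toy sequence with a burst of larger values in positions 50..59.
    data = [0.5] * 50 + [3.0] * 10 + [0.5] * 50
    print(multiple_window_scan(data, [5, 10, 20]))
```

Calibrating such a statistic (critical values, p-values) requires the distribution of the maximum over all windows, which this sketch does not attempt.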

## Zhiyi Chi; University of Connecticut

### Exact sampling for infinitely divisible distributions and Lévy processes

#### September 13, 2017

Infinitely divisible (i.d.) distributions have many applications. Unfortunately, many of them are specified as sums of infinitely many jumps and have no closed-form expressions, which makes them difficult to sample exactly. I will show that, for a rather wide range of i.d. distributions with finite variation, this difficulty can be overcome by combining an integral series expansion of their probability densities with rejection sampling.
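The integral series expansion is specific to the talk and is not reproduced here. The sketch below shows only the generic accept/reject step that exact rejection sampling relies on, with a toy target chosen purely for illustration: the Gamma(2, 1) density f(x) = x e^{-x} (an infinitely divisible law that happens to have a closed form), an Exponential proposal g with rate 1/2, and the envelope constant M = 4/e, which satisfies f(x) ≤ M g(x) for all x ≥ 0. All function names are hypothetical.

```python
import math
import random


def rejection_sample(target_pdf, proposal_pdf, proposal_draw, bound, rng):
    """Draw one exact sample from target_pdf by rejection.

    Requires target_pdf(x) <= bound * proposal_pdf(x) for every x in the
    support; accepted draws then follow target_pdf exactly.
    """
    while True:
        x = proposal_draw(rng)
        # Accept with probability f(x) / (bound * g(x)).
        if rng.random() * bound * proposal_pdf(x) <= target_pdf(x):
            return x


def gamma2_pdf(x):
    """Toy target: Gamma(shape=2, rate=1) density."""
    return x * math.exp(-x)


def exp_half_pdf(x):
    """Proposal: Exponential density with rate 1/2."""
    return 0.5 * math.exp(-0.5 * x)


def exp_half_draw(rng):
    return rng.expovariate(0.5)


# sup_x f(x)/g(x) = sup_x 2x e^{-x/2} is attained at x = 2, giving 4/e.
ENVELOPE = 4 / math.e

if __name__ == "__main__":
    rng = random.Random(0)
    draws = [rejection_sample(gamma2_pdf, exp_half_pdf, exp_half_draw, ENVELOPE, rng)
             for _ in range(10000)]
    print(sum(draws) / len(draws))  # should be close to the Gamma(2, 1) mean, 2
```

Since both densities integrate to one, the expected number of proposals per accepted draw equals the envelope constant (here about 1.47), so a tight bound matters in practice.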

If time permits, I will also briefly discuss exact sampling of the first passage event of Lévy processes. The idea is to embed a process into a “carrier” process whose first passage event can be sampled exactly, and then to extract the part belonging to the former from the data sampled for the carrier. This part will be explained mostly with pictures rather than technical formulas.

## Jungbin Hwang; Department of Economics, University of Connecticut

### Should We Go One Step Further? An Accurate Comparison of One-step and Two-step Procedures in a Generalized Method of Moments Framework

#### September 20, 2017

According to conventional asymptotic theory, the two-step Generalized Method of Moments (GMM) estimator and test perform at least as well as the one-step estimator and test in large samples. Elegant and convenient as it is, the conventional asymptotic theory completely ignores the estimation uncertainty in the weighting matrix, and as a result it may not reflect finite-sample situations well. In this paper, we employ the fixed-smoothing asymptotic theory, which accounts for this estimation uncertainty, and compare the performance of the one-step and two-step procedures in this more accurate asymptotic framework. We show that the two-step procedure outperforms the one-step procedure only when the benefit of using the optimal weighting matrix outweighs the cost of estimating it. This qualitative message applies to both the asymptotic variance comparison and the power comparison of the associated tests. A Monte Carlo study lends support to our asymptotic results.

## Shuangge Ma; Department of Biostatistics, Yale University

### Integrating multidimensional omics data for cancer prognosis

#### September 27, 2017

Prognosis is of essential interest in cancer research. Multiple types of omics measurements, including mRNA gene expression, methylation, copy number variation, SNP, and others, have been implicated in cancer prognosis. The analysis of multidimensional omics data is challenging because of the high data dimensionality and, more importantly, because of the interconnections both between different units of the same type of measurement and between different types of omics measurements. In our study, we have developed novel regularization-based methods, effectively integrated multidimensional data, and constructed prognosis models. It is shown that integrating multidimensional data can lead to biological discoveries missed by the analysis of one-dimensional data, as well as to superior prognosis models.

## Haoda Fu; Biometrics and Advanced Analytics, Eli Lilly and Company

### Individualized Treatment Recommendation (ITR) for Survival Outcomes

#### October 4, 2017

ITR is a method that recommends treatment based on individual patient characteristics to maximize clinical benefit. Over the past few years, we have developed and published methods on this topic with various applications, including comprehensive search algorithms, tree methods, benefit-risk algorithms, and multiple-treatment and multiple ordinal treatment algorithms. In this talk, we propose a new ITR method to handle survival outcomes for multiple treatments. This new model enjoys the following practical and theoretical features:

- Instead of fitting the data, our method directly searches for the optimal treatment policy, which improves efficiency.
- To adjust for censoring, we propose a doubly robust estimator. Our method requires only that either the censoring model or the survival model be correct, but not both. When both are correct, our method enjoys better efficiency.
- Our method handles multiple treatments with intuitive geometric explanations.
- Our method is Fisher consistent even when either the censoring model or the survival model (but not both) is misspecified.

## Yang Li; School of Statistics, Renmin University of China

### Model Confidence Bounds for Variable Selection

#### October 6, 2017

In this article, we introduce the concept of model confidence bounds (MCBs) for variable selection in the context of nested models. Similar to the endpoints of the familiar confidence interval for parameter estimation, the MCBs identify two nested models (the upper and lower confidence bound models) containing the true model at a given level of confidence. Instead of trusting a single model selected by a given model selection method, the MCBs propose a group of nested models as candidates, and the MCBs’ width and composition enable the practitioner to assess the overall model selection uncertainty. A new graphical tool, the model uncertainty curve (MUC), is introduced to visualize the variability of model selection and to compare different model selection procedures. The MCBs methodology is implemented by a fast bootstrap algorithm that is shown to yield the correct asymptotic coverage under rather general conditions. Our Monte Carlo simulations and a real data example confirm the validity and illustrate the advantages of the proposed method.
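To make the definition concrete, here is a simplified sketch under our own assumptions: a variable-selection method has already been run on B bootstrap resamples, giving one selected set per replicate; candidate lower and upper bound models are built from a single ranking of variables by bootstrap selection frequency; and the coverage of a nested pair is the fraction of bootstrap models M_b with lower ⊆ M_b ⊆ upper. The best coverage achievable at each width (size of upper minus size of lower) traces out a curve in the spirit of the MUC. This is an illustration, not the authors' bootstrap algorithm, and all function names are hypothetical.

```python
from collections import Counter


def rank_by_frequency(boot_models, p):
    """Order variables 0..p-1 by bootstrap selection frequency (ties by index)."""
    freq = Counter(v for m in boot_models for v in m)
    return sorted(range(p), key=lambda v: (-freq[v], v))


def best_pair_for_width(ranked, boot_models, width):
    """Best nested pair (lower, upper) with |upper| - |lower| == width.

    Coverage is the fraction of bootstrap models sandwiched between them.
    """
    best = None
    for k in range(len(ranked) - width + 1):
        lower = set(ranked[:k])
        upper = set(ranked[:k + width])
        cov = sum(lower <= m <= upper for m in boot_models) / len(boot_models)
        if best is None or cov > best[2]:
            best = (lower, upper, cov)
    return best


def model_confidence_bounds(boot_models, p, alpha=0.05):
    """Narrowest nested pair whose bootstrap coverage reaches 1 - alpha."""
    ranked = rank_by_frequency(boot_models, p)
    for width in range(p + 1):
        lower, upper, cov = best_pair_for_width(ranked, boot_models, width)
        if cov >= 1 - alpha:
            return lower, upper, cov
    # Not reached when every bootstrap model is a subset of range(p):
    # width p pairs the empty model with the full model, covering everything.
    raise ValueError("bootstrap models must be subsets of range(p)")


def uncertainty_curve(boot_models, p):
    """Best coverage for each width 0..p, a rough analogue of the MUC."""
    ranked = rank_by_frequency(boot_models, p)
    return [best_pair_for_width(ranked, boot_models, w)[2] for w in range(p + 1)]


if __name__ == "__main__":
    # Hypothetical selections from five bootstrap replicates over p = 4 variables.
    boot = [{0, 1}, {0, 1}, {0, 1, 2}, {0}, {0, 1}]
    print(model_confidence_bounds(boot, p=4, alpha=0.25))
    print(uncertainty_curve(boot, p=4))
```

Restricting candidates to prefixes of one frequency ranking keeps the search to O(p^2) pairs; a fuller treatment could search more broadly at higher cost.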

## Yuping Zhang; Department of Statistics, University of Connecticut

### A statistical framework for data integration through graphical models with application to cancer genomics

#### October 11, 2017

Recent advances in high-throughput biotechnologies have generated unprecedented types and amounts of data for biomedical research. Multiple types of genomic data are increasingly available within and across studies. In this talk, we will focus on the problem of discovering regulatory relationships among heterogeneous genomic variables from biological conditions with potentially shared regulation mechanisms. We will address statistical issues in data integration and present a new statistical learning method for integrating diverse genomics data. The performance of our method will be demonstrated through simulations and applications to real cancer data. This is joint work with Zhengqing Ouyang (The Jackson Laboratory for Genomic Medicine) and Hongyu Zhao (Yale University).

## Zongming Ma; Department of Statistics, University of Pennsylvania

### Optimal hypothesis testing for stochastic block models with growing degrees

#### October 18, 2017

In this talk, we discuss optimal hypothesis testing for distinguishing a stochastic block model from an Erdős-Rényi random graph. We derive central limit theorems for a collection of linear spectral statistics under both the null and local alternatives. In addition, we show that linear spectral statistics based on Chebyshev polynomials can be used to approximate signed cycles of growing lengths, which in turn determine the likelihood ratio test asymptotically when the graph size and the average degree grow to infinity together. Therefore, sharp asymptotically optimal power for this testing problem can be achieved in polynomial time.

## Néhémy Lim; Department of Statistics, University of Connecticut

### Balancing Statistical and Computational Precision for Efficient Variable Selection

#### October 25, 2017

Driven by advances in technology, large and high-dimensional data have become the rule rather than the exception. Approaches that allow for variable selection with such data are thus highly sought after, particularly since standard methods, such as cross-validated Lasso, can be computationally intractable and, in any case, lack theoretical guarantees. In this paper, we propose a novel approach to variable selection in regression. Consisting of simple optimization steps and tests, it is computationally more efficient than existing methods and, therefore, suited even for very large data sets. Moreover, in contrast to standard methods, it is equipped with sharp statistical and computational guarantees. We thus expect that our algorithm can help leverage the increasing volume of data in biology, public health, astronomy, economics, and other fields.

## Bin Zou; Department of Mathematics, University of Connecticut

### Optimal investment with transaction costs under cumulative prospect theory in discrete time

#### November 1, 2017

We study optimal investment problems under the framework of cumulative prospect theory (CPT). A CPT investor makes investment decisions in a single-period financial market with transaction costs. The objective is to seek the optimal investment strategy that maximizes the prospect value of the investor’s final wealth. We obtain the optimal investment strategy explicitly in two examples. An economic analysis is conducted to investigate the impact of the transaction costs and risk aversion on the optimal investment strategy.

## Daniel Sussman; Mathematics and Statistics Department, Boston University

### Multiple Network Inference: From Joint Embeddings to Graph Matching

#### November 8, 2017

Statistical theory, computational methods, and empirical evidence abound for the study of individual networks. However, extending these ideas to the multiple-network framework remains a relatively under-explored area. Individuals today interact with each other through numerous modalities, including online social networks, telecommunications, face-to-face interactions, financial transactions, and the sharing and distribution of goods and services. Individually, these networks may hide important activities that are only revealed when the networks are studied jointly. In this talk, we’ll explore statistical and computational methods to study multiple networks, including a tool to borrow strength across networks via joint embeddings and a tool to confront the challenges of entity resolution across networks via graph matching.

## Zhigen Zhao; Department of Statistical Science, Temple University

### Nonparametric Empirical Bayes Estimator For Simultaneous Variances

#### November 15, 2017

Shrinkage estimation has proven very useful when facing a large number of mean parameters to be estimated. In modern applications, we also face the problem of estimating a large number of variances simultaneously. There have been a few attempts to introduce shrinkage variance estimators using a parametric empirical Bayes approach.

In this paper, we construct a nonparametric estimation of simultaneous variances (NESV). Namely, we take the f-modeling approach and assume an arbitrary prior on the variances. Under an invariant loss function, the resulting Bayes decision estimator relies only on the marginal cumulative distribution function, which can be reliably estimated using the empirical distribution function.

We apply the proposed NESV to construct confidence intervals for the (selected) mean parameters. It is shown that the NESV-based intervals are the shortest among all intervals that guarantee a desired coverage probability. Through two real data analyses, we further show that the NESV-based intervals lead to the smallest number of discordant parameters, a favorable property in light of the current “replication crisis”.

## Amy Willis; Department of Biostatistics, University of Washington

### Confidence sets for phylogenetic trees

#### November 29, 2017

Phylogenetic trees represent evolutionary histories and have many important applications in biology, anthropology and criminology. The branching structure of the tree encodes the order of evolutionary divergence, and the branch lengths denote the time between divergence events. The target of interest in phylogenetic tree inference is high-dimensional, but the real challenge is that both the discrete (tree topology) and continuous (branch lengths) components need to be estimated. While decomposing inference on the topology and branch lengths has been historically popular, the mathematical and algorithmic developments of the last 15 years have provided a new framework for holistically treating uncertainty in tree inference. I will discuss how we can leverage these developments to construct a confidence set for the Fréchet mean of a distribution with support on the space of phylogenetic trees. The sets have good coverage and are efficient to compute. I will conclude by applying the procedure to revisit an HIV forensics investigation, and to assess our confidence in the geographical origins of the Zika virus.

## Christopher Glynn; Peter T. Paul School of Business and Economics, University of New Hampshire

### Dynamics of homelessness in urban America

#### December 6, 2017

The relationship between housing costs and homelessness has important implications for the way that city and county governments respond to increasing homeless populations. Though many analyses in the public policy literature have examined inter-community variation in homelessness rates to identify causal mechanisms of homelessness, few studies have examined time-varying homeless counts within the same community. To examine trends in homeless population counts in the 25 largest U.S. metropolitan areas, we develop a dynamic Bayesian hierarchical model for time-varying homeless count data. Particular care is given to modeling uncertainty in the homeless count generating and measurement processes, and a critical distinction is made between the counted number of homeless and the true size of the homeless population. For each metro under study, we investigate the relationship between increases in the Zillow Rent Index and increases in the homeless population. Sensitivity of inference to potential improvements in the accuracy of point-in-time counts is explored, and evidence is presented that the inferred increase in the rate of homelessness from 2011 to 2016 depends on prior beliefs about the accuracy of homeless counts. A main finding of the study is that the relationship between homelessness and rental costs is strongest in New York, Los Angeles, Washington, D.C., and Seattle.