University of Connecticut


Fall 2017 

All colloquia will be held at 4pm in AUST 105, unless otherwise noted. Coffee will be served at 3:30pm in room 326.

Information about past colloquia is available here.

Date | Speaker | Title
Wednesday, September 6 | Joseph Glaz, University of Connecticut | Multiple Window Scan Statistics for Detecting a Local Change in Variance for Normal Data
Wednesday, September 13 | Zhiyi Chi, University of Connecticut | Exact sampling for infinitely divisible distributions and Levy processes
Wednesday, September 20 | Brian Hobbs, The University of Texas MD Anderson Cancer Center | POSTPONED
Wednesday, September 20 | Jungbin Hwang, University of Connecticut | Should We Go One Step Further? An Accurate Comparison of One-step and Two-step Procedures in a Generalized Method of Moments Framework
Wednesday, September 27 | Shuangge Ma, Yale University | Integrating multidimensional omics data for cancer prognosis
Wednesday, October 4 | Haoda Fu, Eli Lilly | Individualized Treatment Recommendation (ITR) for Survival Outcomes
Friday, October 6 | Yang Li, Renmin University of China | Model Confidence Bounds for Variable Selection (11 am, coffee served at 10:30 am)
Wednesday, October 11 | Yuping Zhang, University of Connecticut | TBD
Wednesday, October 18 | Zongming Ma, University of Pennsylvania | TBD
Wednesday, October 25 | TBD | TBD
Wednesday, November 1 | TBD | TBD
Wednesday, November 8 | Daniel Lewis Sussman, Boston University | TBD
Wednesday, November 15 | |
Wednesday, November 22 | None | Thanksgiving Week
Wednesday, November 29 | Amy Willis, University of Washington | Confidence sets for phylogenetic trees
Wednesday, December 6 | Kelly Zou, Pfizer | Real-World Evidence in the Era of Big Data

Colloquium is organized by Professor Xiaojing Wang.

Joseph Glaz; University of Connecticut

Multiple Window Scan Statistics for Detecting a Local Change in Variance for Normal Data

September 6, 2017

In this talk, research in the area of scan statistics will be reviewed. Recent results on detecting a local change in variance will be discussed for one-dimensional normal data. Numerical results evaluating the performance of these scan statistics will be presented. The two-dimensional case will also be briefly discussed. Future research directions and applications in this active area will also be mentioned.

Zhiyi Chi; University of Connecticut

Exact sampling for infinitely divisible distributions and Levy processes

September 13, 2017

Infinitely divisible (i.d.) distributions have many applications. Unfortunately, many of them are specified via sums of infinitely many jumps and have no closed-form expressions, making them difficult to sample exactly. I will show that for a rather wide range of i.d. distributions with finite variation, this difficulty can be overcome by utilizing an integral series expansion of their probability densities and rejection sampling.

If time permits, I will also briefly discuss exact sampling of first passage events of Levy processes. The idea is to embed a process into a “carrier” process whose first passage event can be sampled exactly and then extract the part belonging to the former from the data sampled for the carrier. This part will be mostly explained by pictures instead of technical formulas.
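
As a rough illustration of the rejection sampling step mentioned in this abstract, here is a minimal generic sketch. It is not the speaker's method: the target density (a Beta(2, 2) density), the uniform proposal, and the names `target_density` and `rejection_sample` are assumptions chosen only to show how accept/reject yields exact draws when the density can be evaluated and bounded.

```python
import random


def target_density(x):
    """Hypothetical target: the Beta(2, 2) density 6x(1 - x) on [0, 1]."""
    return 6.0 * x * (1.0 - x)


def rejection_sample(n, bound=1.5, seed=0):
    """Draw n exact samples from target_density with a Uniform(0, 1) proposal.

    bound must satisfy target_density(x) <= bound on [0, 1];
    the maximum of 6x(1 - x) is 1.5, attained at x = 0.5.
    """
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.random()  # candidate drawn from the proposal
        u = rng.random()  # independent uniform for the accept/reject step
        if u * bound <= target_density(x):
            samples.append(x)  # accepted candidates follow the target exactly
    return samples


samples = rejection_sample(10000)
mean = sum(samples) / len(samples)
print(f"sample mean: {mean:.3f} (the Beta(2, 2) mean is 0.5)")
```

The accepted draws are exact rather than approximate, which is the property the abstract emphasizes; the harder part in the i.d. setting, obtaining a density expression that can be evaluated and bounded, is not shown here.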

Jungbin Hwang; Department of Economics, University of Connecticut

Should We Go One Step Further?
An Accurate Comparison of One-step and Two-step Procedures in a Generalized Method of Moments Framework

September 20, 2017

According to the conventional asymptotic theory, the two-step Generalized Method of Moments (GMM) estimator and test perform at least as well as the one-step estimator and test in large samples. The conventional asymptotic theory, as elegant and convenient as it is, completely ignores the estimation uncertainty in the weighting matrix, and as a result it may not reflect finite sample situations well. In this paper, we employ the fixed-smoothing asymptotic theory that accounts for the estimation uncertainty, and compare the performance of the one-step and two-step procedures in this more accurate asymptotic framework. We show that the two-step procedure outperforms the one-step procedure only when the benefit of using the optimal weighting matrix outweighs the cost of estimating it. This qualitative message applies to both the asymptotic variance comparison and the power comparison of the associated tests. A Monte Carlo study lends support to our asymptotic results.

Shuangge Ma; Department of Biostatistics, Yale University

Integrating multidimensional omics data for cancer prognosis

September 27, 2017

Prognosis is of essential interest in cancer research. Multiple types of omics measurements (including mRNA gene expression, methylation, copy number variation, SNP, and others) have been implicated in cancer prognosis. The analysis of multidimensional omics data is challenging because of the high data dimensionality and, more importantly, because of the interconnections between different units of the same type of measurement and between different types of omics measurements. In our study, we have developed novel regularization-based methods, effectively integrated multidimensional data, and constructed prognosis models. It is shown that integrating multidimensional data can lead to biological discoveries missed by the analysis of one-dimensional data and to superior prognosis models.

Haoda Fu; Biometrics and Advanced Analytics, Eli Lilly and Company

Individualized Treatment Recommendation (ITR) for Survival Outcomes

October 4, 2017

ITR is a method to recommend treatment based on individual patient characteristics to maximize clinical benefit. Over the past few years, we have developed and published methods on this topic with various applications, including comprehensive search algorithms, tree methods, benefit-risk algorithms, and multiple treatment and multiple ordinal treatment algorithms. In this talk, we propose a new ITR method to handle survival outcomes for multiple treatments. This new model enjoys the following practical and theoretical features:

  • Instead of fitting the data, our method directly searches for the optimal treatment policy, which improves efficiency
  • To adjust for censoring, we propose a doubly robust estimator. Our method requires only that either the censoring model or the survival model be correct, not both. When both are correct, our method enjoys better efficiency
  • Our method handles multiple treatments with intuitive geometric explanations
  • Our method is Fisher consistent even when either the censoring model or the survival model (but not both) is misspecified.

Yang Li; School of Statistics, Renmin University of China

Model Confidence Bounds for Variable Selection

October 6, 2017

In this article, we introduce the concept of model confidence bounds (MCBs) for variable selection in the context of nested models. Similarly to the endpoints in the familiar confidence interval for parameter estimation, the MCBs identify two nested models (upper and lower confidence bound models) containing the true model at a given level of confidence. Instead of trusting a single selected model obtained from a given model selection method, the MCBs propose a group of nested models as candidates, and the MCBs’ width and composition enable the practitioner to assess the overall model selection uncertainty. A new graphical tool, the model uncertainty curve (MUC), is introduced to visualize the variability of model selection and to compare different model selection procedures. The MCBs methodology is implemented by a fast bootstrap algorithm that is shown to yield the correct asymptotic coverage under rather general conditions. Our Monte Carlo simulations and a real data example confirm the validity and illustrate the advantages of the proposed method.
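
To make the idea of lower and upper bound models concrete, here is a minimal sketch under stated assumptions; it is not the speaker's algorithm. It assumes the models selected on each bootstrap resample are already available as sets of variable names, ranks variables by how often they were selected, and returns the narrowest nested pair whose bootstrap coverage reaches 1 - alpha. The function name `model_confidence_bounds` and the toy input are hypothetical.

```python
from collections import Counter


def model_confidence_bounds(boot_models, alpha=0.05):
    """Return (lower, upper, coverage) for the narrowest nested pair of models.

    boot_models: list of sets, each holding the variables selected on one
    bootstrap resample (assumed to be computed elsewhere).
    Coverage is the fraction of bootstrap models m with lower <= m <= upper.
    """
    n = len(boot_models)
    counts = Counter(v for m in boot_models for v in set(m))
    # Rank variables by selection frequency; ties are broken by name.
    ranked = sorted(counts, key=lambda v: (-counts[v], v))
    p = len(ranked)
    for width in range(p + 1):  # try the narrowest bounds first
        best = None
        for j in range(p - width + 1):
            lower = set(ranked[:j])
            upper = set(ranked[:j + width])
            covered = sum(1 for m in boot_models if lower <= set(m) <= upper)
            coverage = covered / n
            if coverage >= 1 - alpha and (best is None or coverage > best[2]):
                best = (lower, upper, coverage)
        if best is not None:
            return best
    return None


# Toy bootstrap output: 10 resamples over three candidate variables.
boot_models = [{"x1", "x2"}] * 7 + [{"x1", "x2", "x3"}] * 2 + [{"x1"}]
lower, upper, coverage = model_confidence_bounds(boot_models, alpha=0.25)
print(sorted(lower), sorted(upper), coverage)
```

In this toy input no single model reaches 75% coverage, so the search widens by one variable and reports a lower bound model and an upper bound model that differ in one variable; that width is the kind of selection uncertainty the abstract describes.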

Kelly Zou; Pfizer Inc.

Real-World Evidence in the Era of Big Data

November 15, 2017

Given the desire to enhance the effectiveness and efficiency of health care systems, it is important to understand and evaluate the risk factors for disease progression, treatment patterns such as medication use, and health care utilization such as hospitalization. Statistical analyses via observational studies and data mining may help evaluate patients’ diagnostic and prognostic outcomes, as well as inform policies to improve patient outcomes and to control costs. In the era of big data, real-world longitudinal patient-level databases containing the insurance claims of commercially insured adults, electronic health records, or cross-sectional surveys provide useful insights for such analyses. Within the healthcare industry, rapid queries to inform development and commercialization strategies, as well as pre-specified non-interventional observational studies, are commonly performed. In addition, pragmatic studies are increasingly being conducted to examine health-related outcomes. In this presentation, selected published examples of real-world data analyses are presented. Results typically suggest that paying attention to patient comorbidities and pre-index or at-index health care service utilization may help identify higher-risk patients and unmet treatment needs. Finally, fruitful collaborative opportunities exist across sectors, including academia, industry, and government.

Amy Willis; Department of Biostatistics, University of Washington

Confidence sets for phylogenetic trees

November 29, 2017

Phylogenetic trees represent evolutionary histories and have many important applications in biology, anthropology and criminology. The branching structure of the tree encodes the order of evolutionary divergence, and the branch lengths denote the time between divergence events. The target of interest in phylogenetic tree inference is high-dimensional, but the real challenge is that both the discrete (tree topology) and continuous (branch lengths) components need to be estimated. While decomposing inference on the topology and branch lengths has been historically popular, the mathematical and algorithmic developments of the last 15 years have provided a new framework for holistically treating uncertainty in tree inference. I will discuss how we can leverage these developments to construct a confidence set for the Fréchet mean of a distribution with support on the space of phylogenetic trees. The sets have good coverage and are efficient to compute. I will conclude by applying the procedure to revisit an HIV forensics investigation, and to assess our confidence in the geographical origins of the Zika virus.