Colloquia

Spring 2019

All colloquia will be held at 4pm in AUST 108, unless otherwise noted. Coffee will be served at 3:30pm in AUST 326.

Information about past colloquia is available here.

Date | Speaker | Title | Location
Wednesday, January 23 | Youngdeok Hwang, Sungkyunkwan University | Statistical Estimation of Air Pollution Through Integration of Physical Knowledge | 4PM in BPB 130 (coffee at 3:30 in AUST 326)
Friday, January 25 | Yao Zheng, Purdue University | Finite Time Analysis of Vector Autoregressive Models under Linear Restrictions | 11AM in ROWE 122 (coffee at 10:30 in AUST 326)
Monday, January 28 | Wen Zhou, Colorado State University | Estimation and Inference of Heteroskedasticity Models with Latent Semiparametric Factors for Multivariate Time Series | 11AM in ROWE 122 (coffee at 10:30 in AUST 326)
Monday, February 4 | Nicholas Henderson, Johns Hopkins University | Estimating heterogeneous treatment effects with censored data via fully nonparametric Bayesian accelerated failure time models | 11AM in ROWE 122
Wednesday, February 13 | Jun Yan, University of Connecticut | Generalized scale-change models for recurrent event processes under informative censoring | 4PM in AUST 108
Wednesday, February 20 | Joseph Cappelleri, Pfizer | Advancing Interpretation of Patient-Reported Outcomes | 4PM in AUST 108
Wednesday, February 27 | Stephanie Hicks, Johns Hopkins University | Making data science accessible world-wide in the Johns Hopkins Data Science Lab | 4PM in AUST 108
Wednesday, March 6 | Erin Conlon, University of Massachusetts | Parallel Markov chain Monte Carlo for Bayesian hierarchical models with big data, in two stages | 4PM in AUST 108
Wednesday, March 13 | Victoria Pena, CUNY | TBA | 4PM in AUST 108
Wednesday, March 27 | Linglong Kong, University of Alberta | TBA | 4PM in AUST 108
Wednesday, April 3 | Donald Berry, MD Anderson | TBA | 4PM in CHEM A203
Wednesday, April 10 | Patrick Flaherty, University of Massachusetts | TBA | 4PM in AUST 108
Friday, April 12 | Debanjan Bhattacharjee, Utah Valley University | TBA | 11AM, location TBD
Wednesday, April 17 | Bhramar Mukherjee, University of Michigan | TBA | 4PM in AUST 108
Wednesday, April 24 | Dipankar Bandyopadhyay, Virginia Commonwealth University | TBA | 4PM in AUST 108
Wednesday, May 1 | Karthik Bharath, Nottingham University | TBA | 4PM in AUST 108

The colloquium series is organized by Professor Yuwen Gu.


Youngdeok Hwang; Sungkyunkwan University

Statistical Estimation of Air Pollution Through Integration of Physical Knowledge

January 23, 2019

Air pollution is driven by non-local dynamics, in which the air quality at a site is determined by the transport of pollutants from distant emission sources by atmospheric processes. To understand the underlying nature of pollution generation, it is crucial to employ physical knowledge to account for the transport of pollutants by the wind. In this talk, I will discuss methods for estimating pollution emissions from the area of interest through the use of physical knowledge and observed data. The proposed methods use a statistical approach to utilize large-scale data from a numerical weather prediction model, while integrating the dynamics of the physical processes into the model. Some extensions and related problems are also discussed.
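
As a rough illustration of the kind of integration described above (not the speaker's actual method), the sketch below treats a numerical weather model as supplying a linear source-receptor transport matrix and recovers emission rates from receptor observations by non-negative least squares; all names, shapes, and values are made-up placeholders.

```python
# Minimal sketch, not the speaker's method: estimate emission rates q at
# candidate source locations from receptor observations y, assuming a linear
# source-receptor transport matrix T derived from a numerical weather model.
# Every array name, shape, and value here is an illustrative assumption.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

n_receptors, n_sources = 50, 10
T = rng.uniform(0.0, 1.0, size=(n_receptors, n_sources))  # transport weights (from physics)
q_true = rng.uniform(0.0, 5.0, size=n_sources)             # unknown emission rates
y = T @ q_true + rng.normal(scale=0.5, size=n_receptors)   # observed concentrations

# Non-negative least squares keeps the estimated emissions physically meaningful.
q_hat, _ = nnls(T, y)
print("estimated:", np.round(q_hat, 2))
print("true:     ", np.round(q_true, 2))
```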


Yao Zheng; Purdue University

Finite Time Analysis of Vector Autoregressive Models under Linear Restrictions

January 25, 2019

We develop a unified finite-time theory for the OLS estimation of possibly unstable and even slightly explosive VAR models under linear restrictions, with the applicable region ρ(A) ≤ 1 + c/T, where ρ(A) is the spectral radius of the transition matrix A in the VAR(1) representation, T is the time horizon, and c > 0 is a universal constant. This linear restriction framework encompasses various existing models in the literature, such as banded/network VAR models. We show that the restrictions reduce the error bounds through not only the reduced dimensionality but also a scale factor that resembles the asymptotic covariance matrix of the estimator in the fixed-dimensional setup; as long as the model is correctly specified, this scale factor is decreasing in the number of restrictions. Our analysis reveals that the phase transition between the slow and fast error rate regimes is determined by the smallest singular value of A, a measure of the least excitable mode of the system. Minimax lower bounds are also derived across the different regimes. The developed finite-time theory not only bridges the theoretical gap between stable and unstable regimes but also precisely characterizes the effect of the restrictions and their interplay with other model parameters. Simulations support our theoretical results in both small and large samples.
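
For intuition about the setting, the sketch below simulates a nearly-unit-root VAR(1) with a banded transition matrix and computes the ordinary least squares estimate of A. It shows only the unrestricted estimator, not the restricted estimator analyzed in the talk, and all dimensions and values are illustrative.

```python
# Illustrative sketch only: simulate a VAR(1) y_t = A y_{t-1} + e_t with
# spectral radius close to one, and compute the (unrestricted) OLS estimate
# of A. The restricted estimator in the talk projects onto a linear
# subspace; here we only show the unrestricted fit for intuition.
import numpy as np

rng = np.random.default_rng(1)
d, T = 5, 500
A = np.diag(np.full(d, 0.98)) + np.diag(np.full(d - 1, 0.1), k=1)  # banded; rho(A) = 0.98
print("spectral radius:", max(abs(np.linalg.eigvals(A))))

y = np.zeros((T + 1, d))
for t in range(1, T + 1):
    y[t] = A @ y[t - 1] + rng.normal(size=d)

X, Y = y[:-1], y[1:]
A_ols = np.linalg.lstsq(X, Y, rcond=None)[0].T   # OLS solves Y ≈ X A', so transpose back
print("max abs estimation error:", np.abs(A_ols - A).max())
```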


Wen Zhou; Colorado State University

Estimation and Inference of Heteroskedasticity Models with Latent Semiparametric Factors for Multivariate Time Series

January 28, 2019

This paper considers estimation and inference for a flexible heteroskedasticity model for multivariate time series, which employs semiparametric latent factors to simultaneously account for heteroskedasticity and contemporaneous correlations. Specifically, the heteroskedasticity is modeled by the product of unobserved stationary factor processes and subject-specific covariate effects. Serving as the loadings, the covariate effects are further modeled through additive models. We propose a two-step estimation procedure. First, the latent factor processes and their nonparametric loadings are estimated via projection-based methods. The regression coefficients are then estimated by generalized least squares. Theoretical validity of the two-step procedure is documented. By carefully examining the convergence rates for estimating the latent factor processes and their loadings, we further study the asymptotic properties of the estimated regression coefficients; in particular, we establish their asymptotic normality. The proposed regression coefficient estimator is also shown to be asymptotically efficient, which leads to a more efficient confidence set for the regression coefficients. Using a comprehensive simulation study, we demonstrate the finite sample performance of the proposed procedure, and the numerical results corroborate our theoretical findings. Finally, we illustrate the use of our proposal through an application to air quality data.
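
The following toy sketch mimics the two-step logic in a deliberately simplified setting: a panel regression whose error variance has a one-factor structure, with the latent variance factor recovered from OLS residuals by a rank-one SVD (a stand-in for the projection-based estimators in the talk) and the regression coefficients then re-estimated by feasible generalized least squares. Every modeling choice here is a placeholder, not the authors' procedure.

```python
# Toy two-step sketch (placeholder, not the talk's procedure).
# Step 1: recover the latent variance factor from OLS residuals via SVD.
# Step 2: re-estimate the regression coefficients by feasible GLS.
import numpy as np

rng = np.random.default_rng(2)
n_series, T, p = 20, 300, 2
beta = np.array([1.0, -0.5])

X = rng.normal(size=(n_series, T, p))
f = np.sin(np.linspace(0, 6 * np.pi, T))          # latent stationary factor
loadings = rng.uniform(0.5, 1.5, size=n_series)   # subject-specific effects
sigma2 = np.exp(np.outer(loadings, f))            # n_series x T variances
y = X @ beta + np.sqrt(sigma2) * rng.normal(size=(n_series, T))

# Step 1: pooled OLS, then a rank-one fit to de-meaned log squared residuals.
Xf, yf = X.reshape(-1, p), y.reshape(-1)
b_ols = np.linalg.lstsq(Xf, yf, rcond=None)[0]
resid = y - X @ b_ols
logr2 = np.log(resid**2 + 1e-8)
U, s, Vt = np.linalg.svd(logr2 - logr2.mean())
sigma2_hat = np.exp(s[0] * np.outer(U[:, 0], Vt[0]))   # fitted variance surface, up to scale

# Step 2: feasible GLS with the fitted variances as weights
# (the overall scale of the weights does not affect the point estimate).
w = 1.0 / np.sqrt(sigma2_hat.reshape(-1))
b_gls = np.linalg.lstsq(Xf * w[:, None], yf * w, rcond=None)[0]
print("OLS:", b_ols.round(3), "GLS:", b_gls.round(3), "truth:", beta)
```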


Nicholas Henderson; Johns Hopkins University

Estimating heterogeneous treatment effects with censored data via fully nonparametric Bayesian accelerated failure time models

February 4, 2019

Individuals often respond differently to identical treatments, and characterizing such variability in treatment response is an important aim in the practice of personalized medicine. In this article, we describe a nonparametric accelerated failure time model that can be used to analyze heterogeneous treatment effects (HTE) when patient outcomes are time-to-event. By utilizing Bayesian additive regression trees and a mean-constrained Dirichlet process mixture model, our approach offers a flexible model for the regression function while placing few restrictions on the baseline hazard. Our nonparametric method leads to natural estimates of individual treatment effects and has the flexibility to address many major goals of HTE assessment. Moreover, our method requires little user input in terms of model specification for treatment-covariate interactions or tuning parameter selection. Our procedure shows strong predictive performance while also exhibiting good frequentist properties in terms of parameter coverage and mitigation of spurious findings of HTE. We illustrate the merits of our proposed approach with a detailed analysis of two large clinical trials for the prevention and treatment of congestive heart failure using an angiotensin-converting enzyme inhibitor. The analysis revealed considerable evidence for the presence of HTE in both trials, as demonstrated by substantial estimated variation in treatment effect and by high proportions of patients exhibiting strong evidence of having treatment effects that differ from the overall treatment effect.
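
To make the notion of an individualized treatment effect concrete, the toy sketch below fits a plain linear accelerated failure time surface with a treatment-covariate interaction, ignoring censoring entirely, and summarizes the effect for a new patient from approximate posterior draws. BART and the Dirichlet process mixture would replace both the regression surface and the error model in the actual method; all numbers are simulated.

```python
# Toy illustration, not the speaker's model: in an AFT model
# log(T) = m(x, trt) + error, the individual treatment effect at covariates x
# can be summarized as m(x, 1) - m(x, 0) on the log-time scale.
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.normal(size=n)
trt = rng.integers(0, 2, size=n)
log_t = 1.0 + 0.5 * x + (0.3 - 0.4 * x) * trt + rng.normal(scale=0.7, size=n)

# Linear design with a treatment-covariate interaction (no censoring handled).
D = np.column_stack([np.ones(n), x, trt, x * trt])
coef, res, *_ = np.linalg.lstsq(D, log_t, rcond=None)
sigma2 = res[0] / (n - D.shape[1])
cov = sigma2 * np.linalg.inv(D.T @ D)

# Draw from an approximate posterior and summarize the ITE for a new patient.
draws = rng.multivariate_normal(coef, cov, size=2000)
x_new = 1.5
ite_draws = draws[:, 2] + draws[:, 3] * x_new      # effect on log survival time
print("posterior mean ITE:", ite_draws.mean().round(3),
      "95% interval:", np.quantile(ite_draws, [0.025, 0.975]).round(3))
```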


Jun Yan; University of Connecticut

Generalized scale-change models for recurrent event processes under informative censoring

February 13, 2019

Two major challenges arise in regression analyses of recurrent event data: first, popular existing models, such as Cox-type models, may not fully capture the covariate effects on the underlying recurrent event process; second, the censoring time can remain informative about the risk of experiencing recurrent events even after accounting for covariates. We tackle both challenges with a general class of semiparametric scale-change models that allow a scale-change covariate effect as well as a multiplicative covariate effect. The proposed model is flexible and nests several existing models, including the popular proportional rates model, the accelerated mean model, and the accelerated rate model. Moreover, it accommodates informative censoring through a subject-level latent frailty whose distribution is left unspecified. A robust estimation approach is proposed that requires no parametric assumptions on the distribution of the frailty or of the recurrent event process. The asymptotic properties of the resulting estimator are established, with the asymptotic variance estimated through a novel resampling approach. As a byproduct, the structure of the model provides a model selection approach among the submodels via hypothesis testing of the model parameters. Numerical studies show that the proposed estimator and the model selection procedure perform well under both noninformative and informative censoring scenarios. The methods are applied to data from two transplant cohorts to study the risk of infections after transplantation.
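
For orientation, one common way to write such a scale-change rate model is shown below; the notation is illustrative and may not match the talk exactly.

```latex
% General scale-change rate model for the recurrent event process
% (illustrative notation):
\[
  \lambda(t \mid X) \;=\; \lambda_0\!\bigl(t\, e^{X^\top \alpha}\bigr)\, e^{X^\top \beta}, \qquad t \ge 0.
\]
% Nested submodels:
%   \alpha = 0     : proportional rates model
%   \beta  = 0     : accelerated rate model
%   \alpha = \beta : accelerated mean model
```

Testing whether the two coefficient vectors are zero or equal to each other is what turns selection among these nested submodels into hypothesis tests on the model parameters, as described in the abstract.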


Joseph Cappelleri; Pfizer

Advancing Interpretation of Patient-Reported Outcomes

February 20, 2019

A patient-reported outcome is any report on the status of a patient’s health condition that comes directly from the patient. Clear and meaningful interpretation of patient-reported outcome scores is fundamental to their use, as such scores can be valuable in designing studies, evaluating interventions, educating consumers, and informing health policy makers involved with regulatory, reimbursement, and advisory agencies. Interpretation of patient-reported outcome scores, however, is often not well understood because of insufficient data or a lack of experience or clinical understanding to draw on.

This presentation provides an updated review of two broad approaches – anchor-based and distribution-based – aimed at enriching the understanding and meaning of patient-reported outcome scores. Anchor-based approaches use a measure (external to the targeted patient-reported outcome of interest) that is readily interpretable and correlated with the targeted patient-reported outcome. Examples include percentages based on thresholds, criterion-group interpretation, content-based interpretation, and the clinically important difference. Distribution-based approaches rely strictly on the distribution of the data. Examples include effect size, probability of relative benefit, and cumulative distribution functions. Applications are based on real-life and simulated examples.
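
As a small worked example of the distribution-based quantities mentioned above, the sketch below computes an effect size for the mean change in a simulated patient-reported outcome score and the proportion of patients whose change exceeds a few thresholds (one point on an empirical cumulative distribution view). The data and thresholds are placeholders, not values from the talk.

```python
# Quick sketch of two distribution-based interpretation aids:
# an effect size for the mean change in a PRO score, and the proportion of
# patients improving by at least a given threshold. Simulated data only.
import numpy as np

rng = np.random.default_rng(4)
baseline = rng.normal(50, 10, size=200)
followup = baseline + rng.normal(4, 8, size=200)   # PRO scores after treatment
change = followup - baseline

effect_size = change.mean() / baseline.std(ddof=1)  # mean change / baseline SD
print(f"effect size: {effect_size:.2f}")

# Proportion of patients improving by at least each (hypothetical) threshold.
for threshold in (0, 5, 10):
    prop = np.mean(change >= threshold)
    print(f"P(change >= {threshold}): {prop:.2f}")
```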


Stephanie Hicks; Johns Hopkins University

Making data science accessible world-wide in the Johns Hopkins Data Science Lab

February 27, 2019

In this talk, I will introduce the Johns Hopkins Data Science Lab: who we are, what our goals are, and the types of projects we are working on to make data science accessible world-wide. Then, I will discuss projects I have focused on related to data science education. Despite unprecedented and growing interest in data science on campuses, there are few courses and course materials that provide meaningful opportunities for students to learn about real-world challenges. Most courses provide unrealistically clean data sets that neatly fit the assumptions of the methods being taught. The result is that students are left unable to effectively analyze data and solve real-world challenges outside of the classroom. To address this problem, I am leveraging an idea from Nolan and Speed (1999), who argued that the solution is to teach courses through in-depth case studies derived from interesting scientific questions with nontrivial solutions that leave room for different analyses of the data. I will share a set of general principles and offer a detailed guide derived from my successful experience developing and teaching graduate-level, introductory data science courses centered entirely on case studies. Furthermore, I will present Open Case Studies, an educational resource of case studies that educators can use in the classroom to teach students how to effectively derive knowledge from data arising from real-world challenges.


Erin Conlon; University of Massachusetts

Parallel Markov chain Monte Carlo for Bayesian hierarchical models with big data, in two stages

March 6, 2019

Due to the recent growth of big data sets, new Bayesian Markov chain Monte Carlo (MCMC) parallel computing methods have been created. These methods divide large data sets by observations into subsets. However, many Bayesian hierarchical models have only a small number of parameters that are common to the full data set, with the majority of parameters being group specific. Therefore, techniques that split the full data set by groups rather than by observations are a more natural analysis approach.

Here, we adapt and extend such a two-stage Bayesian hierarchical modeling method. In stage 1, each group is evaluated independently in parallel; the stage 1 posteriors are then used as proposal distributions in stage 2, where the full model is estimated. We illustrate our approach using both simulated and real data sets, with both three-level and four-level models. Our results show considerable increases in MCMC efficiency and large reductions in computation time compared to the full data analysis.
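
A deliberately simplified toy of the two-stage idea (not the authors' implementation) is sketched below for a normal hierarchical model: stage 1 samples each group's posterior independently under a flat prior, which is the embarrassingly parallel step, and stage 2 fits the full model by proposing each group-level parameter from its stage 1 draws, so that the Metropolis-Hastings ratio reduces to the hierarchical prior ratio. The model, priors, and tuning are all placeholders.

```python
# Toy two-stage sketch: groups g have data y_gi ~ N(theta_g, 1) with
# theta_g ~ N(mu, tau^2) and a flat prior on mu (tau treated as known).
import numpy as np

rng = np.random.default_rng(5)
G, n, tau = 8, 50, 1.0
mu_true = 2.0
theta_true = rng.normal(mu_true, tau, size=G)
y = [rng.normal(t, 1.0, size=n) for t in theta_true]

# Stage 1: per-group posterior under a flat prior is N(ybar_g, 1/n);
# these draws could be produced independently on separate workers.
stage1 = [rng.normal(np.mean(yg), 1.0 / np.sqrt(n), size=4000) for yg in y]

# Stage 2: Gibbs update for mu, independence-MH for each theta_g using the
# stage 1 draws as proposals (proposal density cancels the likelihood,
# leaving only the hierarchical prior ratio).
theta = np.array([s.mean() for s in stage1])
mu_draws = []
for it in range(4000):
    mu = rng.normal(theta.mean(), tau / np.sqrt(G))        # mu | theta, flat prior on mu
    for g in range(G):
        prop = rng.choice(stage1[g])
        log_ratio = (-(prop - mu) ** 2 + (theta[g] - mu) ** 2) / (2 * tau**2)
        if np.log(rng.uniform()) < log_ratio:
            theta[g] = prop
    mu_draws.append(mu)

print("posterior mean of mu:", np.round(np.mean(mu_draws[1000:]), 3), "truth:", mu_true)
```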