All colloquia will be held at 4pm in AUST 108, unless otherwise noted. Coffee will be served at 3:30pm in room 326.
Information about past colloquia is available here.
|Wednesday, January 17||None||N/A||N/A|
|Wednesday, January 24||Jessica Cisewski, Yale University||A preferential attachment model for the stellar initial mass function via approximate Bayesian computation||AUST 108|
|Wednesday, January 31||Michael Jordan, UC Berkeley (NESS Colloquium; sponsored jointly by NESS and the UConn Departments of Statistics and CSE)||On Computational Thinking, Inferential Thinking and Data Science||MONT 104|
|Wednesday, February 7||Lucas Janson, Harvard University||Using Knockoffs to find important variables with statistical guarantees||AUST 108|
|Wednesday, February 14||Fei Miao, University of Connecticut||Data-Driven Dynamic Robust Resource Allocation for Efficient Transportation||AUST 108|
|Wednesday, February 21||Fei Wang, Cornell University||Is Your Data Cheating You? Towards Explainable AI in Medicine with Knowledge Empowerment||AUST 108|
|Wednesday, February 28||Kelly Zou, Pfizer||Real-World Evidence in the Era of Big Data||AUST 108|
|Wednesday, March 7||Nalini Ravishanker, University of Connecticut||TBD||AUST 108|
|Wednesday, March 21||Mengyang Gu, Johns Hopkins University||TBD||AUST 108|
|Wednesday, March 28||TBD||TBD||TBD|
|Wednesday, April 4||Paul Albert, NIH/NCI||TBD||AUST 108|
|Wednesday, April 11||Renée Moore, Emory University||TBD||AUST 108|
|Wednesday, April 18||Brian Hobbs, Cleveland Clinic||TBD||AUST 108|
|Wednesday, April 25||TBD||TBD||TBD|
Colloquium is organized by Professor Xiaojing Wang.
Jessi Cisewski-Kehe; Yale University
A preferential attachment model for the stellar initial mass function via approximate Bayesian computation
January 24, 2018
Explicitly specifying a likelihood function is becoming increasingly difficult for many problems in astronomy. Astronomers often specify a simpler approximate likelihood – leaving out important aspects of a more realistic model. Estimation of a stellar initial mass function (IMF) is one such example. The stellar IMF is the mass distribution of stars initially formed in a particular volume of space, but is typically not directly observable due to stellar evolution and other disruptions of a cluster. Several difficulties associated with specifying a realistic likelihood function for the stellar IMF will be addressed in this talk.
Approximate Bayesian computation (ABC) provides a framework for performing inference in cases where the likelihood is not available. I will introduce ABC, and demonstrate its merit through a simplified IMF model where a likelihood function is specified and exact posteriors are available. To aid in capturing the dependence structure of the data, a new formation model for stellar clusters using a preferential attachment framework will be presented. The proposed formation model, along with ABC, provides a new mode of analysis of the IMF.
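The ABC idea sketched above — draw parameters from the prior, simulate data, and keep the draws whose summary statistics match the observed data — can be illustrated with a toy rejection sampler. The normal model, prior, and tolerance below are hypothetical illustrations, not the speaker's IMF or preferential attachment model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: infer the mean of a normal distribution with known sd = 1.
true_mu = 2.0
observed = rng.normal(true_mu, 1.0, size=200)

def summary(x):
    # Summary statistic: the sample mean.
    return x.mean()

def abc_rejection(observed, n_draws=20000, eps=0.05):
    """Generic ABC rejection sampler (illustrative only)."""
    accepted = []
    s_obs = summary(observed)
    for _ in range(n_draws):
        mu = rng.normal(0.0, 5.0)                       # draw from the prior
        sim = rng.normal(mu, 1.0, size=len(observed))   # simulate data under mu
        if abs(summary(sim) - s_obs) < eps:             # keep close matches
            accepted.append(mu)
    return np.array(accepted)

posterior = abc_rejection(observed)
```

The accepted draws approximate the posterior without ever evaluating a likelihood; shrinking `eps` sharpens the approximation at the cost of fewer acceptances.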
Michael I. Jordan; UC Berkeley
On Computational Thinking, Inferential Thinking and Data Science
January 31, 2018
The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the inferential and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in Data Science is apparent from their sharply divergent nature at an elementary level—in computer science, the growth of the number of data points is a source of “complexity” that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of “simplicity” in that inferences are generally stronger and asymptotic results can be invoked. On a formal level, the gap is made evident by the lack of a role for computational concepts such as “runtime” in core statistical theory and the lack of a role for statistical concepts such as “risk” in core computational theory. I present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and including a surprising cameo role for symplectic geometry.
Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California, Berkeley.
His research interests bridge the computational, statistical, cognitive and biological sciences, and have focused in recent years on Bayesian nonparametric analysis, probabilistic graphical models, spectral methods, kernel machines and applications to problems in distributed computing systems, natural language processing, signal processing and statistical genetics. Prof. Jordan is a member of the National Academy of Sciences, a member of the National Academy of Engineering and a member of the American Academy of Arts and Sciences. He is a Fellow of the American Association for the Advancement of Science. He has been named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. He received the IJCAI Research Excellence Award in 2016, the David E. Rumelhart Prize in 2015 and the ACM/AAAI Allen Newell Award in 2009. He is a Fellow of the AAAI, ACM, ASA, CSS, IEEE, IMS, ISBA and SIAM.
Lucas Janson; Harvard University
Using Knockoffs to find important variables with statistical guarantees
February 7, 2018
Many contemporary large-scale applications, from genomics to advertising, involve linking a response of interest to a large set of potential explanatory variables in a nonlinear fashion, such as when the response is binary. Although this modeling problem has been extensively studied, it remains unclear how to effectively select important variables while controlling the fraction of false discoveries, even in high-dimensional logistic regression, not to mention general high-dimensional nonlinear models. To address such a practical problem, we propose a new framework of model-X knockoffs, which reads from a different perspective the knockoff procedure (Barber and Candès, 2015) originally designed for controlling the false discovery rate in linear models. Model-X knockoffs can deal with arbitrary (and unknown) conditional models and any dimensions, including when the number of explanatory variables p exceeds the sample size n. Our approach requires the design matrix be random (independent and identically distributed rows) with a known distribution for the explanatory variables, although we show preliminary evidence that our procedure is robust to unknown/estimated distributions. As we require no knowledge/assumptions about the conditional distribution of the response, we effectively shift the burden of knowledge from the response to the explanatory variables, in contrast to the canonical model-based approach which assumes a parametric model for the response but very little about the explanatory variables. To our knowledge, no other procedure solves the controlled variable selection problem in such generality, but in the restricted settings where competitors exist, we demonstrate the superior power of knockoffs through simulations. Finally, we apply our procedure to data from a case-control study of Crohn’s disease in the United Kingdom, making twice as many discoveries as the original analysis of the same data.
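As a rough illustration of the knockoff selection rule (not the full model-X construction from the talk), the sketch below uses i.i.d. N(0,1) features — a special case in which an independent copy of the design matrix is a valid knockoff matrix — together with marginal-correlation feature statistics and the knockoff+ threshold of Barber and Candès (2015). All dimensions and the signal pattern are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 500, 50, 10            # samples, features, true signals

# With mutually independent features of known distribution, an
# independent copy of X satisfies the model-X knockoff exchangeability.
X = rng.normal(size=(n, p))
X_knock = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:k] = 1.0
y = X @ beta + rng.normal(size=n)

# Feature statistics: original vs. knockoff absolute marginal correlations.
# Large positive W_j suggests feature j matters; nulls are symmetric about 0.
W = np.abs(X.T @ y) - np.abs(X_knock.T @ y)

def knockoff_threshold(W, q=0.2):
    """Knockoff+ threshold: smallest t with estimated FDP <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf

t = knockoff_threshold(W)
selected = np.where(W >= t)[0]
```

The symmetry of the null statistics is what lets the count of large negative `W` values estimate the number of false positives among the large positive ones.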
Fei Miao; University of Connecticut
Data-Driven Dynamic Robust Resource Allocation for Efficient Transportation
February 14, 2018
Ubiquitous sensing in smart cities enables large-scale, multi-source data to be collected in real time, which poses several challenges and requires a paradigm shift to capture the complexity and dynamics of these systems. Data-driven cyber-physical systems (CPSs) integrating machine learning, optimization, and control are highly desirable for this paradigm shift, since existing model-based techniques for CPSs have become inadequate. For instance, how to identify and analyze the dynamic interplay between urban-scale phenomena (such as mobility demand and supply) from data, and how to take actions to improve system-level service efficiency, remain challenging problems in transportation systems. In this talk, we present a data-driven dynamic robust resource allocation framework to match supply to spatio-temporally uncertain demand while seeking to reduce total resource allocation cost. First, we present a receding horizon control framework that incorporates large-scale historical and real-time sensing data into demand prediction and dispatch decisions under practical constraints. However, demand prediction error is not negligible and affects the system's performance. Therefore, with spatio-temporal demand uncertainty models constructed from data, we then develop two computationally tractable robust resource allocation methods that provide probabilistic guarantees on the system's worst-case and expected performance. As a case study, we evaluate the proposed framework using real taxi operational data. Lastly, I will provide an overview of my research that uses knowledge of system dynamics to guarantee security and resiliency properties of CPSs. I will introduce my work on coding schemes for detecting stealthy data injection attacks, and on stochastic game schemes for resilient control strategies.
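The receding horizon loop described above — at each step, forecast demand over a short horizon, solve an allocation problem, apply only the first decision, then repeat with fresh data — can be sketched generically. The region counts, demand pattern, and proportional rebalancing rule below are hypothetical stand-ins for the robust optimization solved in the talk's framework:

```python
import numpy as np

rng = np.random.default_rng(2)
regions, horizon, steps = 4, 3, 10
supply = np.array([25.0, 25.0, 25.0, 25.0])   # vehicles per region, fleet of 100

def predict_demand(horizon):
    # Hypothetical forecast: a fixed base pattern plus noise, clipped at zero.
    base = np.array([10.0, 20.0, 5.0, 15.0])
    return np.maximum(0.0, base + rng.normal(0.0, 2.0, size=(horizon, regions)))

def allocate(supply, demand_forecast):
    """One receding-horizon step: move supply toward first-period demand,
    scaled so the total fleet size is conserved (a greedy stand-in for
    the constrained optimization in the talk)."""
    target = demand_forecast[0]
    target = target / target.sum() * supply.sum()   # feasible: conserve fleet
    return supply + 0.5 * (target - supply)         # partial rebalance

for t in range(steps):
    forecast = predict_demand(horizon)
    supply = allocate(supply, forecast)   # apply only the first decision
```

Re-solving at every step is what lets the controller absorb forecast error: each decision uses the newest demand observations rather than a stale plan.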
Fei Wang; Weill Cornell Medical College
Is Your Data Cheating You? Towards Explainable AI in Medicine with Knowledge Empowerment
February 21, 2018
With the arrival of the big data era, more and more data in different real-world applications are becoming readily available. Artificial Intelligence (AI), which aims at providing computers with the capability of learning from data like humans, is becoming ubiquitous. Many sophisticated AI models, such as deep learning, have become very popular. However, the success of these methods usually requires a huge amount of data, while in medicine it is often costly or even impossible to collect data at such a scale. When only limited data samples are available, existing AI methodologies can easily overfit and thus be misled by the data. In this talk, I will present some of the research from my lab on how to armor AI algorithms with domain knowledge so that they can more effectively discover genuine insights from data. Specifically, I will talk about how to enhance data-driven algorithms with knowledge, along with concrete examples of knowledge acquisition and integration. I will also present examples of how these techniques can be used in real-world medical problems.
Kelly Zou; Pfizer Inc
Real-World Evidence in the Era of Big Data
February 28, 2018
Given the desire to enhance the effectiveness and efficiency of health care systems, it is important to understand and evaluate the risk factors for disease progression, treatment patterns such as medication use, and utilization such as hospitalization. Statistical analyses via observational studies and data mining may help evaluate patients' diagnostic and prognostic outcomes, as well as inform policies to improve patient outcomes and control costs. In the era of big data, real-world longitudinal patient-level databases containing the insurance claims of commercially insured adults, electronic health records, or cross-sectional surveys provide useful inputs for such analyses. Within the healthcare industry, rapid queries to inform development and commercialization strategies, as well as pre-specified non-interventional observational studies, are commonly performed. In addition, pragmatic studies are increasingly being conducted to examine health-related outcomes. In this presentation, selected published examples of real-world data analyses are illustrated. Results typically suggest that paying attention to patient comorbidities and to pre-index or at-index health care service utilization may help identify patients at higher risk and with unmet treatment needs. Finally, fruitful collaborative opportunities exist across sectors, among academia, industry, and government.