AN EIGENMODEL FOR DYNAMIC MULTILAYER NETWORKS
Network (or graph) data is at the heart of many modern data science problems: disease transmission, community dynamics on social media, international relations, and others. In this talk, I will elaborate on my research in statistical inference for complex time-varying networks. I will focus on dynamic multilayer networks, which frequently represent the structure of multiple co-evolving relations. Despite their prevalence, statistical models are not well-developed for this network type. Here, I propose a new latent space model for dynamic multilayer networks. The key feature of this model is its ability to identify common time-varying structures shared by all layers while also accounting for layer-wise variation and degree heterogeneity. I establish the identifiability of the model’s parameters and develop a structured mean-field variational inference approach to estimate the model’s posterior, which scales to networks previously intractable to dynamic latent space models. I apply the model to two real-world problems: discerning regional conflicts in a data set of international relations and quantifying infectious disease spread throughout a school based on students’ daily contact patterns.
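The core of such a latent space eigenmodel can be illustrated with a small simulation. This is a generic sketch, not the speaker's actual model or code: the dimensions, scales, and logistic link below are invented for illustration. Edge log-odds combine node-specific degree effects with a layer-weighted inner product of shared latent positions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_layers = 30, 2, 3                     # nodes, latent dimension, layers

U = rng.normal(scale=0.8, size=(n, d))        # latent positions shared across layers
delta = rng.normal(scale=0.5, size=n)         # node-specific degree effects
Lam = [np.diag(rng.choice([-1.0, 1.0], d)) for _ in range(n_layers)]  # layer weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

adjacency = []
for k in range(n_layers):
    # log-odds: degree heterogeneity plus layer-weighted inner product of positions
    logits = delta[:, None] + delta[None, :] + U @ Lam[k] @ U.T
    P = sigmoid(logits)
    A = (rng.random((n, n)) < P).astype(int)
    A = np.triu(A, 1)
    A = A + A.T                               # undirected, no self-loops
    adjacency.append(A)
```

In a dynamic version, U would additionally evolve over time; the sketch shows only the multilayer structure.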
Bio: Joshua D. Loyal is a PhD candidate in the Department of Statistics at the University of Illinois at Urbana-Champaign advised by Professors Yuguo Chen and Ruoqing Zhu. He received an M.S. in Physics from Yale University and a B.S. in Physics and Mathematics from Duke University. From 2015 to 2018, he was a Data Scientist at DataRobot, a start-up in Boston aimed at building an automated machine learning platform. His research interests include statistical network analysis, Bayesian inference, machine learning, data science, and statistical computing.
Event address for attendees: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=m99d9131d0704a05dd909a4fc4205787e
Call-in option: US Toll +1-415-655-0002, Access code: 2624 168 7099
Meeting number (access code): 2624 168 7099
Meeting password: 4T6DiXDH3vs
Date and time: Wednesday, January 5, 2022, 4:00 pm EST
SEQUENTIAL DECISION MAKING: NONCONVEXITY AND NONSTATIONARITY
Numerous statistical problems, including dynamic matrix sensing and completion and online reinforcement learning, can be formulated as nonconvex optimization problems whose objective function changes over time. In this work, we propose and analyze stochastic zeroth-order optimization algorithms in an online learning setting for nonconvex functions in a nonstationary environment. We propose nonstationary versions of regret measures based on first-order and second-order optimal solutions and establish sub-linear bounds on these regret measures. The main takeaway from this work is that one can track a statistically favorable solution, i.e., a stationary point or local minimum of the underlying nonconvex objective function of a statistical learning problem, even in a nonstationary environment. For regret measures based on first-order optimal solutions, we provide regret bounds for the stochastic gradient descent algorithm. For regret measures based on second-order optimal solutions, we analyze a stochastic cubic-regularized Newton’s method. We establish the regret bounds in the zeroth-order oracle setting, where one has access only to noisy evaluations of the objective function. We illustrate our results through simulations as well as several learning problems.
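The two-point zeroth-order gradient estimator that underlies such results can be sketched on a toy drifting quadratic. This is a generic illustration, not the speaker's algorithm: the drift, noise level, smoothing radius, and step size are all invented.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, mu, eta = 5, 400, 0.1, 0.05    # dimension, rounds, smoothing radius, step size

def target(t):
    # slowly drifting minimizer: the nonstationary environment
    return np.sin(0.01 * t) * np.ones(d)

def f(x, t):
    # noisy zeroth-order oracle: only function values are observed
    return np.sum((x - target(t)) ** 2) + 0.01 * rng.normal()

x = np.zeros(d)
tracking_error = []
for t in range(T):
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    # two-point zeroth-order gradient estimate along the random direction u
    g = d * (f(x + mu * u, t) - f(x - mu * u, t)) / (2 * mu) * u
    x = x - eta * g
    tracking_error.append(np.linalg.norm(x - target(t)))
```

The iterate never sees gradients, yet its distance to the moving minimizer stays bounded, which is the "tracking" phenomenon the regret bounds formalize.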
Bio: Dr. Abhishek Roy is currently a postdoctoral researcher in the Department of Statistics at the University of California, Davis. He works primarily with Prof. Krishnakumar Balasubramanian. He finished his Ph.D. in Electrical and Computer Engineering from the University of California, Davis in June 2020, advised by Prof. Prasant Mohapatra. Prior to this, he received a Bachelor of Technology (Hons.) in Electronics and Electrical Communication Engineering in 2013 from the Indian Institute of Technology, Kharagpur. His research interests include non-convex optimization, uncertainty quantification, Markov Chain Monte Carlo (MCMC) sampling, generalization properties of deep networks and robust learning from dependent data.
Event address for attendees: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=m1f81783e31b989095882bfe4cf1903e4
Call-in option: US Toll +1-415-655-0002, Access code: 2624 725 9067
Meeting number (access code): 2624 725 9067
Meeting password: HXqVJxtH533
Date and time: Monday, January 10, 2022, 4:00 pm EST
SURVIVAL ANALYSIS VIA ORDINARY DIFFERENTIAL EQUATIONS
Survival analysis is an extensively studied branch of statistics with wide applications in various fields. Despite rich literature on survival analysis, the growing scale and complexity of modern data create new challenges that existing statistical models and estimation methods cannot meet. In the first part of this talk, I will introduce a novel and unified ordinary differential equation (ODE) framework for survival analysis. I will show that this ODE framework allows flexible modeling and enables a computationally and statistically efficient procedure for estimation and inference. In particular, the proposed estimation procedure is scalable, easy-to-implement, and applicable to a wide range of survival models. In the second part, I will present how the proposed ODE framework can be used to address the intrinsic optimization challenge in deep learning survival analysis, so as to accommodate data in diverse formats.
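The basic idea of casting survival quantities as ODE solutions can be illustrated on a Weibull model, where the cumulative hazard solves dΛ/dt = λ(t) and the survival function is S(t) = exp(-Λ(t)). This is a minimal numerical sketch of that correspondence, not the speaker's estimation and inference framework; the Weibull shape and step size are arbitrary choices.

```python
import numpy as np

shape = 1.5
hazard = lambda t: shape * t ** (shape - 1)   # Weibull hazard with scale 1

# solve dLambda/dt = hazard(t) by forward Euler, then S(t) = exp(-Lambda(t))
ts = np.linspace(0.0, 2.0, 2001)
Lam = np.zeros_like(ts)
for i in range(1, len(ts)):
    dt = ts[i] - ts[i - 1]
    Lam[i] = Lam[i - 1] + dt * hazard(ts[i - 1])

S = np.exp(-Lam)
S_exact = np.exp(-ts ** shape)                # closed-form Weibull survival, for comparison
```

Swapping in any nonnegative hazard function changes the model without changing the solver, which hints at why an ODE formulation gives a unified treatment of many survival models.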
Bio: Weijing Tang is a PhD candidate in the Department of Statistics at the University of Michigan, advised by Prof. Ji Zhu. Her research interests include statistical machine learning, survival analysis, and statistical network analysis. She has received the ASA Nonparametric Statistics 2020 Student Paper Award, the ENAR 2021 Distinguished Student Paper Award, and the ASA Statistical Learning and Data Science Section 2021 Student Paper Award for her research work. Weijing is also enthusiastic about interdisciplinary research on applying statistical machine learning to help solve healthcare problems. Prior to the University of Michigan, Weijing received her BSc in Mathematics at Tsinghua University in 2016.
Personal Website: https://sites.google.com/umich.edu/weijingtang/
Event address for attendees: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=m85c129bb9ff5263d6702227b1bb72c41
Call-in option: US Toll +1-415-655-0002, Access code: 2623 533 7778
Meeting number (access code): 2623 533 7778
Meeting password: M6WkPFATU76
Date and time: Tuesday, January 11, 2022, 4:00 pm EST
COMPLEX STRUCTURE DISCOVERY AND RANDOMIZED FIELD EXPERIMENTS ON LARGE-SCALE SOCIAL AND POLITICAL NETWORKS
Social and political networks at many scales—from interpersonal networks of friends to international networks of countries—are a central theme of computational social science. Modern methods of data science that can contend with the complexity of networked data have the potential to break ground on long-standing questions of critical relevance to public policy. In this talk, I will present two lines of work on 1) estimating the causal effects of friend-to-friend mobilization in US elections, and 2) inferring complex latent structure in dyadic event data of country-to-country interactions. In the first part, I will discuss recent work using large-scale digital field experiments on the mobile app Outvote to estimate the causal effects of friend-to-friend texting on voter turnout in the 2018 and 2020 US elections. This work is among the first to rigorously assess the effectiveness of friend-to-friend “get out the vote” tactics, which political campaigns have increasingly embraced in recent elections. I will discuss the statistical challenges inherent to randomizing interactions between friends with a “light touch” design and will describe the methodology we developed to identify and precisely estimate causal effects despite these impediments. In the second part of this talk, I will discuss work on inferring complex latent structure in dyadic event data sets of international relations that contain millions of micro-records of the form “country i took action a to country j at time t”. The models we developed for this purpose blend elements of tensor decomposition and dynamical systems and are tailored to the challenging properties of high-dimensional discrete data. They reliably surface interpretable complex structure in dyadic event data while yielding tractable schemes for efficient posterior inference. At the end of the talk, I will briefly sketch a vision for the future of both lines of work.
Bio: Dr. Aaron Schein is a postdoctoral fellow in the Data Science Institute at Columbia. He received his PhD in Computer Science from UMass Amherst in 2019 and an MA in Linguistics and BA in Political Science also from UMass. His research develops statistical models and computational methods to analyze modern large-scale data in political science, sociology, and genetics, among other fields in the social and natural sciences.
Event address for attendees: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=mc548e008cbfd60ff59da44c50395784c
Call-in option: US Toll +1-415-655-0002, Access code: 2623 472 9476
Meeting number (access code): 2623 472 9476
Meeting password: dPW6XfmhX32
Date and time: Thursday, January 13, 2022, 4:00 pm EST
FAST APPROXIMATE BAYESBAG MODEL SELECTION VIA TAYLOR EXPANSIONS
In recent years, BayesBag has emerged as an effective remedy for the brittleness of Bayesian model selection under model misspecification. However, computing BayesBag can be prohibitively expensive for large datasets. In this talk, I propose a fast approximation of BayesBag model selection based on Taylor approximations of the log marginal likelihood, which can achieve results comparable to BayesBag in a fraction of the computation time. I provide concrete bounds on the approximation error and establish that it converges to zero asymptotically as the dataset grows. I demonstrate the utility of this approach using simulations, as well as model selection problems arising in business, neuroscience, and forensics.
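Plain BayesBag model selection (without the Taylor-approximation speed-up that is the talk's contribution) can be sketched in a conjugate Gaussian example where the log marginal likelihood is available in closed form: bag each model's log marginal likelihood over bootstrap resamples and pick the model with the larger bagged value. The data-generating mean, prior, and bootstrap count below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(loc=0.7, scale=1.0, size=n)    # data generated with a nonzero mean

def lml_null(x):
    # M0: x_i ~ N(0, 1); the marginal likelihood is just the likelihood
    return -0.5 * len(x) * np.log(2 * np.pi) - 0.5 * np.sum(x ** 2)

def lml_mean(x):
    # M1: x_i ~ N(theta, 1), theta ~ N(0, 1); closed-form marginal with Sigma = I + J
    n = len(x)
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n)
            - 0.5 * (np.sum(x ** 2) - np.sum(x) ** 2 / (1 + n)))

def bayesbag(lml, x, n_boot=50):
    # bagged criterion: average log marginal likelihood over bootstrap resamples
    return np.mean([lml(rng.choice(x, size=len(x), replace=True))
                    for _ in range(n_boot)])

bagged_null, bagged_mean = bayesbag(lml_null, x), bayesbag(lml_mean, x)
```

The expense the talk targets is visible even here: each bootstrap replicate requires a full marginal likelihood evaluation, which is what the Taylor expansion avoids recomputing.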
Bio: Dr. Spencer is currently a postdoctoral researcher in the Department of Biostatistics at Harvard School of Public Health. He works with Jeff Miller developing robust Bayesian methodology for biomedical applications, including biostatistical analysis of X-linked Dystonia Parkinsonism. He holds a joint PhD in Statistics and Machine Learning from Carnegie Mellon University, an MSc in Statistics from the University of British Columbia, and a BScH in Mathematics and Statistics from Acadia University. His methodological interests include robust Bayesian inference, statistical network modeling, and Monte Carlo methods. In terms of applications, he has worked in forensic science, medicine, neuroscience, and education.
Event address for attendees: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=m42ba1a802da77bc11b7b614d96442319
Call-in option: US Toll +1-415-655-0002, Access code: 2623 477 7100
Meeting number (access code): 2623 477 7100
Meeting password: Kqdw7g3nc2q
Date and time: Tuesday, January 18, 2022, 4:00 pm EST
NEW DIRECTIONS IN BAYESIAN SHRINKAGE FOR SPARSE, STRUCTURED DATA
Sparse signal recovery remains an important challenge in large scale data analysis, and global-local (G-L) shrinkage priors have undergone an explosive development in the last decade in both theory and methodology. These developments have established the G-L priors as the state-of-the-art Bayesian tool for sparse signal recovery, as well as a default choice for non-linear problems. In the first half of my talk, I will survey the recent advances in this area, focusing on the optimality and performance of G-L priors for both continuous and discrete data. In the second half, I will discuss several recent developments, including designing a shrinkage prior to handle bi-level sparsity in regression and handling sparse compositional data, routinely observed in microbiomics. I will discuss the methodological challenges associated with each of these problems and propose new prior distributions, specially designed to handle such structured data. I will provide some theoretical support for the proposed methods and show improved performance in simulation settings and in applications to environmetrics and microbiome data.
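A canonical global-local prior is the horseshoe: local scales lambda_j ~ C+(0,1) and coefficients beta_j | lambda_j ~ N(0, tau^2 * lambda_j^2). A minimal sketch of draws from this prior shows the characteristic global-local behavior; tau and the dimension are arbitrary choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(3)
p, tau = 10_000, 0.1

# horseshoe draws: half-Cauchy local scales, Gaussian coefficients given the scales
lam = np.abs(rng.standard_cauchy(p))          # lambda_j ~ C+(0, 1)
beta = rng.normal(scale=tau * lam)            # beta_j | lambda_j ~ N(0, tau^2 lambda_j^2)

# the global-local structure yields many near-zero draws and a few very large ones
frac_small = np.mean(np.abs(beta) < 0.05)
largest = np.max(np.abs(beta))
```

The small global scale tau pulls the bulk of coefficients toward zero, while the heavy-tailed local scales let individual signals escape the shrinkage, which is exactly the sparse-recovery behavior the G-L theory quantifies.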
Bio: Dr. Jyotishka Datta is an assistant professor of Statistics at Virginia Tech. Prior to this, he was an assistant professor in the Department of Mathematical Sciences at the University of Arkansas Fayetteville from 2016 to 2020. His research interests span Bayesian methodology and theory for structured high-dimensional data. He has contributed to the areas of multiple testing, shrinkage estimation, sparse signal recovery, nonparametric Bayes, bioinformatics, and default Bayes. Recent applications include next-gen sequencing studies, auditory neuroscience, ecology, and crime forecasting.
Event address for attendees: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=me31b8a0431bf9b696aeb789b65c011ae
Call-in option: US Toll +1-415-655-0002, Access code: 2624 142 0587
Meeting number (access code): 2624 142 0587
Meeting password: arQdqzVJ895
Date and time: Wednesday, February 2, 2022, 4:00 pm EST
JIAN HUANG, PROFESSOR, DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE, DEPARTMENT OF BIOSTATISTICS, UNIVERSITY OF IOWA
A DEEP GENERATIVE APPROACH TO LEARNING A CONDITIONAL DISTRIBUTION
Conditional distribution is a fundamental quantity in statistics and machine learning that provides a full description of the relationship between a response and a predictor. There is a vast literature on conditional density estimation. A common feature of the existing methods is that they seek to estimate the functional form of the conditional density. We propose a deep generative approach to learning a conditional distribution by estimating a conditional generator, so that a random sample from the target conditional distribution can be obtained by transforming a sample from a reference distribution. The conditional generator is estimated nonparametrically using neural networks by matching appropriate joint distributions. The proposed generative approach has several advantages over the classical methods for conditional density estimation: (a) there is no restriction on the dimensionality of the response or predictor, (b) it can handle both continuous and discrete predictors and responses, and (c) estimates of summary measures of the underlying conditional distribution are easy to obtain by Monte Carlo. We conduct numerical experiments to validate the proposed method and use several benchmark datasets, including the California housing, MNIST, and CelebA datasets, to illustrate its applications in conditional sample generation, uncertainty assessment of prediction, visualization of multivariate data, image generation, and image reconstruction.
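The sampling-based workflow that a conditional generator enables can be sketched with a hand-specified toy generator standing in for the learned neural network. Everything below is invented for illustration: the "generator" simply pushes reference noise to the known law N(2x, 1), so Monte Carlo summaries can be checked against the truth.

```python
import numpy as np

rng = np.random.default_rng(4)

def generator(x, eta):
    # toy conditional generator: transforms reference noise eta into Y | X = x
    # (stands in for the learned neural network; true law here is N(2x, 1))
    return 2.0 * x + eta

# Monte Carlo summaries of the conditional distribution at x = 1.5
x0, m = 1.5, 100_000
samples = generator(x0, rng.normal(size=m))

cond_mean = samples.mean()             # estimates E[Y | X = x0] = 3.0
cond_q90 = np.quantile(samples, 0.9)   # estimates the conditional 90% quantile
```

Point (c) of the abstract is visible here: once a generator is available, any summary (mean, quantiles, prediction intervals) is one Monte Carlo pass, with no density formula required.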
Bio: Dr. Jian Huang is Professor in the Department of Statistics and Actuarial Science and the Department of Biostatistics at the University of Iowa. His research interests include semiparametric models, statistical genetics, survival analysis, and analysis of high-dimensional data. Dr. Huang holds a PhD degree in Statistics from the University of Washington and is Fellow of the Institute of Mathematical Statistics and the American Statistical Association.
Event address for attendees: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=m67fdbc93e605d126918ed812a46cce78
Call-in option: US Toll +1-415-655-0002, Access code: 2624 850 1391
Meeting number (access code): 2624 850 1391
Meeting password: K4DkitBwm32
Date and time: Wednesday, February 9, 2022, 4:00 pm EST, 1-hour duration
A GRAPHICAL MULTI-FIDELITY GAUSSIAN PROCESS MODEL, WITH APPLICATION TO EMULATION OF EXPENSIVE COMPUTER SIMULATIONS
With advances in scientific computing and mathematical modeling, complex phenomena can now be reliably simulated. Such simulations can however be very time-intensive, requiring millions of CPU hours to perform. One solution is multi-fidelity emulation, which uses data of varying accuracies (or fidelities) to train an efficient predictive model (or emulator) for the expensive simulator. In complex problems, simulation data with different fidelities are often connected scientifically via a directed acyclic graph (DAG), which is difficult to integrate within existing multi-fidelity emulator models. We thus propose a new Graphical Multi-fidelity Gaussian process (GMGP) model, which embeds this DAG (capturing scientific dependencies) within a Gaussian process framework. We show that the GMGP has desirable modeling traits via two Markov properties, and admits a scalable formulation for recursive computation of the posterior predictive distribution along sub-graphs. We also present an experimental design framework over the DAG given an experimental budget, and propose a nonlinear extension of the GMGP model via deep Gaussian processes. The advantages of the GMGP model are then demonstrated via a suite of numerical experiments and an application to emulation of heavy-ion collisions, which can be used to study the conditions of matter in the Universe shortly after the Big Bang.
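The simplest special case of a fidelity DAG is a two-node chain with the classical autoregressive link f_hi(x) = rho * f_lo(x) + delta(x). The sketch below deliberately replaces the Gaussian processes of the GMGP with least squares and linear interpolation to expose just the recursive structure; the simulators and designs are invented.

```python
import numpy as np

f_lo = lambda x: np.sin(2 * np.pi * x)                  # cheap, biased simulator
f_hi = lambda x: 1.8 * np.sin(2 * np.pi * x) + 0.3 * x  # expensive simulator

# many cheap runs, few expensive runs on a nested design
x_lo = np.linspace(0.0, 1.0, 41)
x_hi = x_lo[::8]                                        # expensive runs at a subset
y_lo, y_hi = f_lo(x_lo), f_hi(x_hi)

# autoregressive link: f_hi(x) = rho * f_lo(x) + delta(x)
# estimate rho by least squares and the discrepancy delta by interpolation
lo_at_hi = f_lo(x_hi)
rho = np.sum(lo_at_hi * y_hi) / np.sum(lo_at_hi ** 2)
delta = y_hi - rho * lo_at_hi
delta_interp = np.interp(x_lo, x_hi, delta)

emulated_hi = rho * y_lo + delta_interp                 # prediction at all cheap inputs
err = np.max(np.abs(emulated_hi - f_hi(x_lo)))
```

A general DAG chains such links recursively along its edges, which is the computation the GMGP's Markov properties make tractable.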
Bio: Dr. Simon Mak is an Assistant Professor in the Department of Statistical Science at Duke University. Prior to Duke, he was a Postdoctoral Fellow at the Stewart School of Industrial & Systems Engineering at Georgia Tech. His research involves integrating domain knowledge (e.g., scientific theories, mechanistic models, financial principles) as prior information for statistical inference and prediction. This gives a holistic framework for interpretable statistical learning, providing a principled way for scientists to validate theories from data, and for statisticians to integrate scientific knowledge. His research tackles methodological, theoretical, and algorithmic challenges in this integration. This involves building probabilistic models on complex objects (e.g., functions, manifolds, networks), and developing efficient algorithms and data collection methods for model training.
Event address for attendees: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=m76b929101a728c8bd1b67291fd7fe551
Call-in option: US Toll +1-415-655-0002, Access code: 2621 407 1875
Meeting number (access code): 2621 407 1875
Meeting password: Mj2hDTTb6k5
Date and time: Wednesday, February 16, 2022, 4:00 pm EST, 1-hour duration
COMPUTATIONALLY EFFICIENT BAYESIAN UNIT-LEVEL MODELING OF NON-GAUSSIAN SURVEY DATA UNDER INFORMATIVE SAMPLING
Statistical estimates from survey samples have traditionally been obtained via design-based estimators. In many cases, these estimators tend to work well for quantities such as population totals or means, but can fall short as sample sizes become small. In today’s “information age,” there is a strong demand for more granular estimates. To meet this demand, using a Bayesian pseudo-likelihood, we propose a computationally efficient unit-level modeling approach for non-Gaussian data collected under informative sampling designs. Specifically, we focus on binary and multinomial data. Our approach is both multivariate and multi-scale, incorporating spatial dependence at the area-level. We illustrate our approach through an empirical simulation study and through a motivating application to health insurance estimates using the American Community Survey.
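The pseudo-likelihood idea can be sketched in its simplest case: a Bernoulli proportion under an informative design, where maximizing the survey-weighted log-likelihood reduces to a weighted mean. The population size, true proportion, and inclusion probabilities below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# finite population: binary outcome, inclusion probability depends on the outcome
N = 100_000
y_pop = rng.random(N) < 0.30                  # true population proportion 0.30
incl = np.where(y_pop, 0.02, 0.005)           # informative design: y = 1 oversampled
sampled = rng.random(N) < incl
y, w = y_pop[sampled].astype(float), 1.0 / incl[sampled]

naive = y.mean()                              # badly biased under informative sampling
# pseudo-likelihood point estimate: maximize sum_i w_i * log f(y_i; p);
# for a Bernoulli model this is the inverse-probability-weighted mean
pseudo = np.sum(w * y) / np.sum(w)
```

The full Bayesian approach in the talk raises this weighted likelihood to build a pseudo-posterior and adds multivariate structure and spatial dependence; the sketch shows only why the weights are needed at all.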
Bio: Dr. Parker is currently an assistant professor in the Department of Statistics at the University of California, Santa Cruz. He obtained his Ph.D. in Statistics at the University of Missouri, where he was a recipient of the U.S. Census Bureau Dissertation Fellowship, and a recipient of the University of Missouri Population, Education and Health Center Interdisciplinary Doctoral Fellowship. His dissertation work was focused on Bayesian methods for modeling non-Gaussian unit-level survey data under informative sampling, with an emphasis on application to small area estimation. He is broadly interested in modeling dependent data (time-series, spatial, functional, etc.) for a variety of applications including official statistics, social sciences, and ecology. He is also interested in integration of modern machine learning and data science techniques to help improve statistical models.
Event address for attendees: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=m445c052227a1c14c8131cf00525c6bf4
Call-in option: US Toll +1-415-655-0002, Access code: 2620 915 8662
Meeting number (access code): 2620 915 8662
Meeting password: Fq3zC9uTCS3
Date and time: Wednesday, March 2, 2022, 4:00 pm EST, 1-hour duration
BHASWAR B. BHATTACHARYA, ASSISTANT PROFESSOR, WHARTON STATISTICS DEPARTMENT, UNIVERSITY OF PENNSYLVANIA
DISTRIBUTION-FREE NONPARAMETRIC INFERENCE BASED ON OPTIMAL TRANSPORT: EFFICIENCY LOWER BOUNDS AND RANK-KERNEL TESTS
The Wilcoxon rank-sum/Mann-Whitney test is one of the most popular distribution-free procedures for testing the equality of two univariate probability distributions. One of the main reasons for its popularity can be attributed to the remarkable result of Hodges and Lehmann (1956), which shows that the asymptotic relative efficiency of Wilcoxon’s test with respect to Student’s t-test, under location alternatives, never falls below 0.864, despite the former being exactly distribution-free in finite samples. Even more striking is the result of Chernoff and Savage (1958), which shows that the efficiency of a Gaussian score transformed Wilcoxon’s test, against the t-test, is lower bounded by 1. In this talk we will discuss multivariate versions of these celebrated results, by considering distribution-free analogues of the Hotelling T² test based on optimal transport. The proposed tests are consistent against a general class of alternatives and satisfy Hodges-Lehmann and Chernoff-Savage-type efficiency lower bounds over various natural families of multivariate distributions, despite being entirely agnostic to the underlying data generating mechanism. Analogous results for independence testing will also be presented. Finally, we will discuss how optimal transport based multivariate ranks can be used to obtain distribution-free kernel two-sample tests, which are universally consistent, computationally efficient, and have non-trivial asymptotic efficiency. (Based on joint work with Nabarun Deb and Bodhisattva Sen.)
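Empirical optimal-transport ranks can be sketched by assigning sample points to a fixed grid through a linear assignment (discrete Monge) problem. The uniform grid and Gaussian sample below are illustrative choices, not the authors' exact construction.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(6)
n = 100
X = rng.normal(size=(n, 2))                 # multivariate sample

# fixed grid of "rank" locations in the unit square
g = int(np.sqrt(n))
grid = np.array([[(i + 0.5) / g, (j + 0.5) / g] for i in range(g) for j in range(g)])

# empirical OT map: assign each observation to exactly one grid point,
# minimizing the total squared distance
cost = ((X[:, None, :] - grid[None, :, :]) ** 2).sum(-1)
rows, cols = linear_sum_assignment(cost)
ranks = grid[cols]                          # ranks[i] is the multivariate rank of X[i]
```

Because the assignment is a bijection onto a fixed grid, the joint law of the ranks does not depend on the underlying distribution, which is the source of the distribution-free guarantees in the talk.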
Bio: Dr. Bhattacharya is an Assistant Professor in the Department of Statistics at the Wharton School, University of Pennsylvania. He received his Ph.D. from the Department of Statistics at Stanford University in 2016, under the supervision of Persi Diaconis. Prior to that, he received his Bachelor and Master degrees in Statistics from the Indian Statistical Institute, Kolkata. His research interests include Nonparametric Statistics, Combinatorial Probability, and Discrete and Computational Geometry.
Event address for attendees: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=me3cd1bd12c1532dd8ae4973b217cb0dc
Call-in option: US Toll +1-415-655-0002, Access code: 2623 916 6207
Meeting number (access code): 2623 916 6207
Meeting password: QzMU5JVsB23
Date and time: Wednesday, March 9, 2022, 4:00 pm EST, 1-hour duration
UMASS/UCONN JOINT STATISTICS COLLOQUIUM
NAITEE TING, FELLOW OF AMERICAN STATISTICAL ASSOCIATION, DIRECTOR IN THE DEPARTMENT OF BIOSTATISTICS AND DATA SCIENCES AT BOEHRINGER-INGELHEIM PHARMACEUTICALS
CASE STUDY – CLINICAL DEVELOPMENT OF AN ANTI-INFLAMMATORY DRUG
Pain management has long been a medical challenge. Chronic pain is generally defined as pain lasting more than three months. Depending on the source of pain, it can be classified as neuropathic pain, inflammatory pain, cancer pain, and others. This presentation is a case study of developing a new drug for the management of inflammatory pain. The drug was discovered in the 1980s. It was developed for three indications: acute pain (using a dental pain model), osteoarthritis (OA), and rheumatoid arthritis (RA). The drug showed clear efficacy in all three indications. Unfortunately, after more than 10 years of clinical development, an adverse event that emerged late in development was considered potentially toxic to patients. The drug was not marketed because its benefits did not outweigh the associated risks.
Bio: Naitee Ting is a Fellow of the American Statistical Association (ASA). He is currently a Director in the Department of Biostatistics and Data Sciences at Boehringer-Ingelheim Pharmaceuticals Inc. (BI). He joined BI in September of 2009, and before joining BI, he was at Pfizer Inc. for 22 years (1987-2009). Naitee received his Ph.D. in 1987 from Colorado State University (major in Statistics). He has an M.S. degree from Mississippi State University (1979, Statistics) and a B.S. degree from the College of Chinese Culture (1976, Forestry) in Taipei, Taiwan.
Naitee has published articles in Technometrics, Statistics in Medicine, Drug Information Journal, Journal of Statistical Planning and Inference, Journal of Biopharmaceutical Statistics, Biometrical Journal, Statistics and Probability Letters, and Journal of Statistical Computation and Simulation. His book “Dose Finding in Drug Development” was published in 2006 by Springer, and is considered the leading reference in the field of dose response clinical trials. The book “Fundamental Concepts for New Clinical Trialists”, co-authored with Scott Evans, was published by CRC in 2015. Another book, “Phase II Clinical Development of New Drugs”, co-authored with Chen, Ho, and Cappelleri, was published in 2017 (Springer). Naitee is an adjunct professor at Columbia University, the University of Connecticut, and Colorado State University. Naitee has been an active member of both the ASA and the International Chinese Statistical Association (ICSA).
Date and time: Wednesday, March 23, 2022, 4:00 pm EDT, 1-hour duration
DATA SCIENCE FOR IMAGE ANALYSIS
Working with collections of images presents unique challenges to the application of data science techniques. In this talk, I will start by presenting some of the theoretical concerns and proposed solutions for working with image data. Then, I will introduce software I have been developing, the Distant Viewing Toolkit, that enables the application of corpus-based techniques to images. Finally, I will show an application of this approach through a case study analyzing two American television shows.
Bio: Dr. Arnold studies massive cultural datasets in order to address new and existing research questions in the humanities and social sciences. He specializes in the application of statistical computing to large text and image corpora. The study of data containing linked text and images, such as newspapers with embedded figures or television shows with associated closed captions, is of particular interest. Research products take on several forms: book length manuscripts, technical reports, new software implementations, and digital projects intended for broad public consumption.
Event address for attendees: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=m8d9940499205efb9017086b472d9b988
Call-in option: US Toll +1-415-655-0002, Access code: 2623 842 6927
Meeting number (access code): 2623 842 6927
Meeting password: ptMvKCCx536
Date and time: Wednesday, March 30, 2022, 4:00 pm EDT, 1-hour duration
LAN LIU, ASSOCIATE PROFESSOR, DIRECTOR OF THE CONSULTING CENTER, SCHOOL OF STATISTICS, UNIVERSITY OF MINNESOTA
THE INNER PARTIAL LEAST SQUARE – A PROBE INTO THE “NECESSARY” DIMENSION REDUCTION
The partial least square (PLS) algorithm retains the combinations of predictors that maximize the covariance with the outcome. The Fisherian interpretation of PLS remained a mystery until Cook et al. (2013) showed that it results in a predictor envelope, which is the smallest reducing subspace of Σ_X that contains the coefficient. This paper is motivated by findings after making a seemingly trivial change to the PLS: what if we change the max in PLS to min? Counterintuitively, this does not calculate the complement of the traditional PLS space. Instead, it results in a new space: the largest reducing subspace of Σ_X that is contained in the coefficient matrix space. We define the modified PLS as the inner PLS and the resulting space as the inner predictor envelope space. Unlike the traditional PLS that removes irrelevant information, the inner PLS incorporates the knowledge that some information is purely relevant. Consequently, the inner PLS algorithm can lead to a more efficient regression estimator than the PLS in certain scenarios; however, it is not the most efficient under the inner predictor envelope model. Therefore, we derive the maximum likelihood estimator and provide a non-Grassmannian optimization technique to compute it. We confirm the efficiency gain of our estimators both in simulations and in real-world data from the China Health and Nutrition Survey.
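The "max" that the talk flips to "min" is the classical PLS criterion: the first direction maximizes Cov(Xw, y) over unit vectors w, which gives w proportional to X'y. A minimal sketch of that baseline follows (the inner PLS itself is the talk's contribution and is not reproduced here; the simulated design is invented).

```python
import numpy as np

rng = np.random.default_rng(10)
n, p = 500, 6

X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.0, 0.0, 0.0, 0.0, 0.0])   # only two active predictors
y = X @ beta + rng.normal(size=n)

Xc, yc = X - X.mean(0), y - y.mean()

# classical PLS: the first direction maximizes Cov(X w, y) over unit vectors w,
# and the maximizer is proportional to X'y
w = Xc.T @ yc
w /= np.linalg.norm(w)

# covariance of the resulting PLS score with the outcome
score_cov = (Xc @ w) @ yc / (n - 1)
```

The direction loads on the active predictors and nearly ignores the inert ones; the inner PLS of the talk instead seeks the directions whose covariance with the outcome is smallest within the coefficient space.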
Bio: Dr. Lan Liu is an Associate Professor of Statistics and Director of the Consulting Center at the University of Minnesota. Her research interests include causal inference, missing data analysis, clinical trials, doubly robust inference, Bayesian analysis, surrogate outcomes, measurement error, mediation analysis, social network, personalized medicine, unmeasured confounder, and statistical consulting. Dr. Liu received her PhD in Biostatistics from the University of North Carolina at Chapel Hill in 2013. Before joining the University of Minnesota, she was a postdoc fellow at Harvard University.
Event address for attendees: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=m5229f1a2b5e7f0ef2dcf9e34260e62aa
Call-in option: US Toll +1-415-655-0002, Access code: 2621 020 9119
Meeting number (access code): 2621 020 9119
Meeting password: eTA6mBBen46
Date and time: Wednesday, April 6, 2022, 4:00 pm EDT, 1-hour duration
POINT PROCESS MODELS FOR SEQUENCE DETECTION IN NEURAL SPIKE TRAINS
Sparse sequences of neural spikes are posited to underlie aspects of working memory, motor production, and learning. Discovering these sequences in an unsupervised manner is a longstanding problem in statistical neuroscience. I will present our recent work using Neyman-Scott processes—a class of doubly stochastic point processes—to model sequences as a set of latent, continuous-time, marked events that produce cascades of neural spikes. Bayesian inference in this model requires integrating over the set of latent events, akin to inference in mixture of finite mixtures (MFM) models and Dirichlet process mixture models (DPMMs). I will show how recent work on MFMs can be adapted to develop a collapsed Gibbs sampling algorithm for Neyman-Scott processes. Finally, I will present an empirical assessment of the model and algorithm on spike-train recordings from songbird HVC and rodent basal ganglia, which suggests novel connections between sequential activity in the brain and the generation of natural behavior.
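A Neyman-Scott process is easy to simulate forward: latent events arrive as a Poisson process, and each triggers a cascade of observed spikes around it. This is a generic sketch of that generative structure (the rates, cascade size, and jitter are invented, and real spike trains would carry per-neuron marks); the hard direction, which the talk addresses, is the reverse inference over the latent events.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 100.0                                   # observation window [0, T]

# latent events: homogeneous Poisson process with rate 0.1 per unit time
n_latent = rng.poisson(0.1 * T)
latent_times = rng.uniform(0, T, size=n_latent)

# each latent event produces a cascade of observed spikes jittered around it
spikes = []
for t0 in latent_times:
    n_off = rng.poisson(20)                 # expected cascade size
    spikes.extend(t0 + rng.normal(scale=0.5, size=n_off))
spikes = np.sort([s for s in spikes if 0 <= s <= T])
```

The "doubly stochastic" label is visible in the two layers of randomness: the latent event times, and the spike cascades conditional on them.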
Bio: Dr. Linderman is an Assistant Professor of Statistics and, by courtesy, Electrical Engineering and Computer Science at Stanford University. He is also an Institute Scholar in the Wu Tsai Neurosciences Institute and a member of Stanford Bio-X and the Stanford AI Lab. Previously, he was a postdoctoral fellow with Liam Paninski and David Blei at Columbia University, and he completed his PhD in Computer Science at Harvard University with Ryan Adams and Leslie Valiant. Dr. Linderman obtained his undergraduate degree in Electrical and Computer Engineering from Cornell University and spent three great years as a software engineer at Microsoft before graduate school.
Dr. Linderman’s research focuses on machine learning, computational neuroscience, and the general question of how computational and statistical methods can help decipher neural computation. He develops rich statistical models for analyzing neural data.
|Event address for attendees:||https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=mae62f87e81d4597b82108acd44bae063|
|Call-in option:||US Toll +1-415-655-0002 Access code: 2621 500 7841|
|Meeting number and password:||Meeting number (access code): 2621 500 7841
Meeting password: BfEngm7Qq33|
|Date and time:||Wednesday, April 13, 2022, 4:00 pm EDT, 1-hour duration|
CHUAN-FA TANG, ASSISTANT PROFESSOR, DEPARTMENT OF MATHEMATICAL SCIENCES, UNIVERSITY OF TEXAS AT DALLAS
TAYLOR’S LAW FOR SEMIVARIANCE AND HIGHER MOMENTS OF HEAVY-TAILED DISTRIBUTIONS
The power law relating the population mean and variance is known as Taylor’s law, proposed by Taylor in 1961. We generalize Taylor’s law from light-tailed distributions to heavy-tailed distributions with infinite mean. Instead of population moments, we consider the power-law relationship between the sample mean and many other sample statistics, such as the sample upper and lower semivariances, the skewness, the kurtosis, and higher moments of a random sample. We show that, as the sample size increases, these sample statistics grow asymptotically in direct proportion to a power of the sample mean. These power laws characterize the asymptotic behavior of commonly used measures of the risk-adjusted performance of investments, such as the Sortino ratio, the Sharpe ratio, the upside potential ratio, and the Farinelli-Tibiletti ratio, when returns follow a heavy-tailed nonnegative distribution. In addition, we find the asymptotic distribution and moments of the number of observations exceeding the sample mean. We propose tail-index estimators based on these scaling laws and on the number of observations exceeding the sample mean, and we compare them with some prior estimators.
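The scaling described in the abstract is easy to observe numerically. The sketch below is an assumed illustration, not the talk's analysis: it draws Pareto samples with tail index `alpha < 1` (so the mean is infinite), computes the sample upper semivariance at increasing sample sizes, and fits the power-law exponent on a log-log scale; the replicate averaging and the heuristic exponent formula in the comment are my own choices for a stable demo.

```python
import numpy as np

rng = np.random.default_rng(42)

alpha = 0.8                      # Pareto tail index < 1 -> infinite mean (assumed)
sizes = [10**3, 10**4, 10**5, 10**6]
reps = 50                        # replicates per size, to stabilize the fit

log_mean, log_semivar = [], []
for n in sizes:
    lm, ls = [], []
    for _ in range(reps):
        x = (1.0 - rng.uniform(size=n)) ** (-1.0 / alpha)  # Pareto(alpha) sample
        m = x.mean()
        above = x[x > m]
        s = np.sum((above - m) ** 2) / n   # sample upper semivariance
        lm.append(np.log(m))
        ls.append(np.log(s))
    log_mean.append(np.mean(lm))
    log_semivar.append(np.mean(ls))

# Taylor's-law exponent: slope of log(semivariance) against log(mean).
# Heuristically, for Pareto(alpha) one expects roughly (2 - alpha)/(1 - alpha),
# since both statistics are dominated by the sample maximum.
b, _ = np.polyfit(log_mean, log_semivar, 1)
print(f"fitted exponent b = {b:.2f}")
```

Because the slope of the fitted line recovers the scaling exponent, relationships like this one are what make the tail-index estimators mentioned at the end of the abstract possible.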
Bio: Dr. Chuan-Fa Tang is an Assistant Professor in the Department of Mathematical Sciences at the University of Texas at Dallas. His research interests include order-restricted inference, shape-constrained inference, empirical processes, empirical likelihood, survival analysis, mathematical statistics, image processing, kernel smoothing, and model selection. Dr. Tang received his PhD in Statistics from the University of South Carolina in 2017.
|Event address for attendees:||https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=m93906b00ba170111c152cdb5ce9ee0f3|
|Call-in option:||US Toll +1-415-655-0002 Access code: 2624 258 0310|
|Meeting number and password:||Meeting number (access code): 2624 258 0310
Meeting password: MsC65M2X6U2|
|Date and time:||Wednesday, April 20, 2022, 4:00 pm EDT, 1-hour duration|
YIFEI SUN, ASSISTANT PROFESSOR, DEPARTMENT OF BIOSTATISTICS, COLUMBIA UNIVERSITY
DYNAMIC RISK PREDICTION TRIGGERED BY INTERMEDIATE EVENTS USING SURVIVAL TREE ENSEMBLES
With the availability of massive amounts of data from electronic health records and registry databases, incorporating time-varying patient information to improve risk prediction has attracted great attention. To exploit the growing amount of predictor information over time, we develop a unified framework for landmark prediction using survival tree ensembles, where an updated prediction can be performed when new information becomes available. Compared to conventional landmark prediction with fixed landmark times, our methods allow the landmark times to be subject-specific and triggered by an intermediate clinical event. Moreover, the nonparametric approach circumvents the thorny issue of model incompatibility at different landmark times. In our framework, both the longitudinal predictors and the event time outcome are subject to right censoring, and thus existing tree-based approaches cannot be directly applied. To tackle the analytical challenges, we propose a risk-set-based ensemble procedure by averaging martingale estimating equations from individual trees. Extensive simulation studies are conducted to evaluate the performance of our methods. The methods are applied to the Cystic Fibrosis Foundation Patient Registry (CFFPR) data to perform dynamic prediction of lung disease in cystic fibrosis patients and to identify important prognostic factors.
Bio: Dr. Yifei Sun is an Assistant Professor in the Department of Biostatistics at Columbia University. She completed her PhD in Biostatistics at Johns Hopkins University in 2015. Dr. Sun’s methodological interests lie in statistical learning for survival and longitudinal data analysis, with applications in medicine and epidemiology such as electronic health records and precision medicine.
|Event address for attendees:||https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=m8bb6c92432c2eaf47454ab9476bc40e6|
|Call-in option:||US Toll +1-415-655-0002 Access code: 2620 182 8603|
|Meeting number and password:||Meeting number (access code): 2620 182 8603
Meeting password: hrErWV2zq33|
|Date and time:||Wednesday, April 27, 2022, 4:00 pm EDT, 1-hour duration|