Past Workshops – Fall 2017 Semester

Information about past workshops is available here.

Exploratory Data Analysis and Visualization in R


Exploratory Data Analysis (EDA) is a critical first step of data analysis. Here are some reasons why we should use EDA:

– Detecting mistakes and data cleaning.
– Shedding light on the preliminary selection of appropriate analysis methods.
– Exploring relationships among predictors and outcome variables.

This workshop will introduce several useful exploratory data analysis methods and visualization tools in R. Participants can apply these methods and tools using an insurance claim dataset we provide. No experience with coding or the R language is required.

Date: Wednesday, November 1, 2017, 4:00PM – 5:30PM

Applied Survey Data Analysis: Methods and Implementation in R/SPSS


Many clients have come to us for help with data collected from questionnaires and surveys, which reminds us of the importance and usefulness of this topic: the analysis of survey data. We’ll start with data cleaning, then briefly discuss several kinds of missing values and imputation methods for them, such as hot deck imputation, predictive mean matching, and multiple imputation. Next, we’ll show methods for checking the reliability and validity of survey information, including definitions, different evaluation methods, and remedies for poor reliability. The last part covers inference from survey data, including how to check whether a result can be generalized to the population, and some useful categorical data analysis methods. For each part, we’ll show how to use R/SAS to carry out these calculations, illustrated with real survey data.

Date: Wednesday, November 8, 2017, 4:00PM – 5:30PM

Variable Selection with Demos in R


The standard linear model is commonly used to describe the relationship between a response and a set of variables (predictors). It is often the case that some or many of the variables used in a linear model are in fact not associated with the response. Including such irrelevant variables adds unnecessary complexity to the resulting model, making it more difficult to interpret. In this workshop, we will cover two types of variable selection approaches, subset selection and shrinkage, which can yield better prediction accuracy and model interpretability. Various examples with demos in R will be provided to give a more concrete idea of when and how to apply each method.

Date: Wednesday, November 15, 2017, 4:00PM – 5:30PM