Modeling and Representation of Data

  • Required
  • IDS 702
  • Credits: 3

Statistical models are necessary for analyzing the multivariate, often large, datasets commonly encountered in data science. In this course, you will learn the general workflow for building statistical models and using them to answer inferential questions. You will learn several parametric models, such as generalized linear models, models for multilevel data, and time series models. You will also learn to handle messy data, including data with missing values, erroneous values, or outliers, and, if time permits, data with non-standard distributions. You will be able to assess model fit, validate model assumptions, and, more generally, check whether a proposed statistical model is appropriate for a given dataset. We will also cover a brief introduction to causal inference under the potential outcomes framework. Time permitting, we may also briefly cover nonparametric models such as classification and regression trees.