MIDS Electives
A key component of the MIDS program is the opportunity for students to take electives from across the university to help tailor their educational experience at Duke to their personal career interests. All MIDS students are required to complete four electives during their time at Duke to graduate (please be sure to read the fine print below), but may take more.
To count towards your elective requirement, all electives must be relevant to your career interests as a data scientist. But because MIDS is an interdisciplinary program and our students have very diverse interests, we recognize that “relevant to your career” may mean very different things to different students.
With that in mind, it can be helpful to think of electives as being divided into two categories: courses that are obviously relevant and do not require special permission, and courses for which we ask that you submit a written justification for why a given course is important for your career goals.
Clearly Data Science Relevant Courses
Most electives taken by MIDS students fall into the category of “obviously data science relevant,” such as:
- Math courses on linear algebra, statistics, algorithms, or probability
- Statistics courses on statistical modeling or data analysis
- ECE courses on machine learning, deep learning, software engineering, or computer vision
- Computer science courses on programming, algorithms, databases, or machine learning
- Nicholas School courses on geospatial analysis (GIS)
- Biostatistics courses on programming, data analysis, causal inference, or AI in health
- Sociology courses on network analysis
- Political science courses on quantitative survey methods, Bayesian statistics, or causal inference
- Up to two courses in Innovation and Entrepreneurship (I&E). Those interested in taking more than two courses from I&E to meet elective requirements should speak with program staff.
- Up to two courses on data science or technology-related ethics and/or public policy. Those interested in taking more than two ethics/public policy courses to meet elective requirements should speak with program staff.
Other Electives
If you are interested in counting any other courses towards your MIDS elective requirement — or are interested in counting more than two classes from the specially noted categories in the list above — then please submit a one-page memo to the MIDS program staff detailing how you feel the requested courses would support your career or educational goals.
To be clear, our goal in asking that you submit justification for other courses is not to discourage you from taking other courses as MIDS electives. Rather, our goal is to ensure that we are able to advise students interested in less traditional selections appropriately, and that you are giving your decisions appropriate consideration.
Details of MIDS Elective Requirement
All MIDS students are required to take at least four (4) semester-long electives, each worth at least 2 credits, and they must complete a minimum of 12 credits of electives overall. Courses MUST be graded (that is, not taken Credit/No Credit and not audited) in order to count as an elective.
One exception to this rule is that students may count two 1.5-credit IDS-listed mini-courses in place of one full-credit elective. This policy applies automatically only to IDS-listed mini-courses. 1.5-credit mini-courses listed in other departments may also be eligible, depending on their content; if you wish to count a mini-course from another department towards your electives, please speak to the MIDS staff before taking the course.
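If you want to check the arithmetic of a planned schedule, the rule above can be sketched in a few lines of Python. This is an unofficial illustration, not a MIDS tool: the `Course` class, the `check_electives` function, and the course names are all hypothetical. The sketch also makes two assumptions that the policy text does not spell out: each pair of IDS-listed mini-courses counts as one elective worth 3 credits, and an unpaired mini-course counts for nothing. It does not check relevance, the two-course limits on I&E or ethics/public policy courses, or mini-courses from other departments, all of which need a conversation with program staff.

```python
from dataclasses import dataclass


@dataclass
class Course:
    name: str              # hypothetical label, for illustration only
    credits: float
    graded: bool = True    # False for Credit/No Credit or audited courses
    ids_mini: bool = False # True for a 1.5-credit IDS-listed mini-course


def check_electives(courses):
    """Summarize a plan against the 4-elective / 12-credit rule (unofficial sketch)."""
    # Only graded courses can count as electives.
    graded = [c for c in courses if c.graded]

    # Full electives: not mini-courses, and worth at least 2 credits each.
    full = [c for c in graded if not c.ids_mini and c.credits >= 2]

    # ASSUMPTION: every two IDS mini-courses stand in for one elective,
    # and a leftover unpaired mini-course is ignored.
    minis = [c for c in graded if c.ids_mini and c.credits == 1.5]
    mini_pairs = len(minis) // 2

    elective_count = len(full) + mini_pairs
    total_credits = sum(c.credits for c in full) + 3.0 * mini_pairs

    return {
        "elective_count": elective_count,
        "total_credits": total_credits,
        "meets_requirement": elective_count >= 4 and total_credits >= 12,
    }


# Example plan: three 3-credit electives plus two IDS mini-courses.
plan = [
    Course("Elective A", 3),
    Course("Elective B", 3),
    Course("Elective C", 3),
    Course("IDS mini-course 1", 1.5, ids_mini=True),
    Course("IDS mini-course 2", 1.5, ids_mini=True),
]
print(check_electives(plan))
```

Note that four 2-credit electives would satisfy the course count but fall short of the 12-credit minimum, so both conditions need to be checked together.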
Most Popular Electives
ECE 661: Computer Engineering Machine Learning and Deep Neural Nets
This course examines various computer engineering methods commonly performed in developing machine learning and deep neural network models. The focus of the course is on how to improve the training and inference performance in terms of model accuracy, size, runtime, etc. Techniques that are widely investigated and adopted in industrial companies and academic communities will be discussed and practiced. Programming practices on these techniques are designed with heavy utilization of the PyTorch package. Prerequisites: Computer Science 201 or ECE 551D or ECE 751D.
STA 602: Bayesian Statistical Modeling and Data Analysis
Principles of data analysis and modern statistical modeling. Exploratory data analysis. Introduction to Bayesian inference, prior and posterior distributions, hierarchical models, model checking and selection, missing data, introduction to stochastic simulation by Markov chain Monte Carlo using a higher level statistical language such as R or Matlab. Applications drawn from various disciplines. Not open to undergraduate students or students who have taken Statistical Science 360. Recommended prerequisite: Statistical Science 611 or the following: Statistical Science 210 and (Statistical Science 230 or 240L) and (Mathematics 202, 202D, 212, or 222) and (Mathematics 216, 218, or 221, any of which may be taken concurrently).
ECE 685D: Intro to Deep Learning
Provides an introduction to the machine learning technique called deep learning or deep neural networks. A focus will be the mathematical formulations of deep networks and an explanation of how these networks can be structured and ‘learned’ from big data. Discussion section covers practical applications, programming, and modern implementation practices. Example code and assignments will be given in Python with heavy utilization of the PyTorch (or TensorFlow) package. The course and a project will cover various applications including image classification, text analysis, object detection, etc. Prerequisite: ECE 580, ECE 681, ECE 682D, Statistical Science 561D, or Computer Science 571D.
BIOSTAT 823: Statistical Programming for Big Data
This course will extend the foundation laid in Software Tools for Data Science to allow for efficient computing involving very large data sets. This course will explore the use of appropriate algorithms and data structures for intensive computations and massive data sets, improving computational performance by use of native code compilation, use of parallel computing to accelerate intensive computations, and use of distributed computing to process massive data sets. Prerequisite: BIOSTAT 821 or permission of the Director of Graduate Studies. Credits: 2
ECE 590-1: Theory and Practice of Algorithms
This course ties the mathematical theory of algorithms and graphs to their practical implementations. Students will learn about the mathematical structures that form the foundations for the behavior and analysis of algorithms from a variety of domains, with a particular emphasis on graphs. Students will also tie that theory to practice by writing code to implement those algorithms, and comparing experimentally observed runtimes to those projected by the mathematical theory.
MATH 641: Probability
Designed to be a sequel to Statistical Science 711. The five basic topics are: martingales; Markov chains from an advanced viewpoint; ergodic theory; Brownian motion and its applications to random walks, Donsker’s theorem, and the law of the iterated logarithm; and multidimensional Brownian motion and its connection to PDEs. For those who have not had 711, we will prove the law of large numbers using martingales and obtain versions of the central limit theorem from Donsker’s theorem. Course requires a knowledge of measure theory. Prerequisite: Statistical Science 711 or Mathematics 631.
STA 663L: Statistical Computing and Computation
Statistical modeling and machine learning involving large data sets and challenging computation. Data pipelines and databases, big data tools, sequential algorithms and subsampling methods for massive data sets, efficient programming for multi-core and cluster machines, including topics drawn from GPU programming, cloud computing, Map/Reduce and general tools of distributed computing environments. Intense use of statistical and data manipulation software will be required. Data from areas such as astronomy, genomics, finance, social media, networks, neuroscience.
BIOSTAT 821: Software Tools for Data Science
A data scientist needs to master several different tools to obtain, process, analyze, visualize and interpret large biomedical data sets such as electronic health records, medical images, and genomic sequences. It is also critical that the data scientist masters the best practices associated with using these tools, so that the results are robust and reproducible. The course covers foundational tools that will allow students to assemble a data science toolkit, including the Unix shell, text editors, regular expressions, relational and NoSQL databases, and the Python programming language for data munging, visualization and machine learning. Best practices that students will learn include the Findable, Accessible, Interoperable and Reusable (FAIR) practices for data stewardship, as well as reproducible analysis with literate programming, version control and containerization.
Prerequisite: Permission of the Director of Graduate Studies.
ECE 551D: Programming, Data Structures, and Algorithms in C++
Students learn to program in C and C++ with coverage of data structures (linked lists, binary trees, hash tables, graphs), Abstract Data Types (Stacks, Queues, Maps, Sets), and algorithms (sorting, graph search, minimal spanning tree). Efficiency of these structures and algorithms is compared via Big-O analysis. Brief coverage of concurrent (multi-threaded) programming. Emphasis is placed on defensive coding, and use of standard UNIX development tools in preparation for students’ entry into real world software development jobs.