Why Interdisciplinary Data Science?

The Duke University Master in Interdisciplinary Data Science (MIDS) is home for creative problem-solvers who want to use data strategically to advance society. We are cultivating a new type of quantitative thought leader who uses disruptive computational strategies to generate innovation and new insights.

MIDS combines rigorous computational and technical training with field knowledge and repeated practice in critical thinking, teamwork, communication, and collaborative leadership to produce data scientists who can add value to any field.

All fields need data scientists

You want to use data to advance the industry or field you are most passionate about.

Effective data scientists need depth and breadth

Data scientists who have proficiency in diverse ways of thinking and deep content expertise innovate most effectively.

World problems require data scientists with diverse backgrounds

Data analysis and creative engineering need to be integrated with nuanced domain knowledge, collaborative leadership, and effective management skills to harness data’s full potential.

There has never been a better time to be a data scientist at Duke, and I'm delighted to be able to partner with departments to deliver the interdisciplinary curriculum that's a Duke signature.

Robert Calderbank, Director, iiD

I'm thrilled about the network of partnerships around the university that will distinguish this degree as one that allows students to take data science into many different domains.

Tom Nechyba, Director, SSRI

Program Overview

Jana Schaich Borg
Faculty Director, MIDS

Duke MIDS is a two-year program designed to help meet the need for knowledgeable data scientists who can answer important questions with data-backed insights.

All MIDS students complete a set of eight core courses that cover critical topics in statistics, machine learning, database management, data wrangling, data communication, analytical thinking, team management, and ethics. The core courses are designed to fit together as a cohesive set of learning experiences, and ensure that students have repeated practice interpreting and reporting the results of analyses on real data sets.

Accompanying these core courses, students choose a set of approximately eight electives that deepen their expertise in a methodological or domain area.

Students conclude the program with a capstone project, completed over at least one year with mentoring from Duke’s world-class faculty.

Why Duke MIDS?

World-renowned faculty

Work with Duke’s elite faculty in fields across the university, including computer science, statistics, math, economics, political science, sociology, medicine, neuroscience, law, and history.

Personalized pathways

Experience the full range of the data science ecosystem and graduate as an expert in at least one analytical approach or branch of technology—you decide where to pursue breadth vs. depth of knowledge.

Diverse student body

Study and collaborate with talented and passionate students of different ages, backgrounds, and skill sets.

Comprehensive training

Develop quantitative acumen, technical expertise, domain knowledge, leadership skills, and project management experience all in one program.

Collaboration across disciplines

Graduate knowing how to work with any kind of group and how to explain the actionable significance of analyses to any kind of audience.

Critical thinking about real problems

Practice applying data science concepts to contemporary problems throughout the two years of the curriculum, so that you graduate being able to think through any computational challenge logically and methodically.

MIDS Courses

DATA COLLECTION METHODS IN SURVEY RESEARCH

UNC/Odum Institute and Duke/SSRI
Davis Library 219 / Gross Hall 230E
Tuesdays 2:00 – 4:45 PM
Instructor: Doug Currivan

Soci 760 (UNC) / IDS 690-01 (Duke)

Start date: Fall 2017


This course examines the effects of key survey design decisions on data quality, concentrating on the impact of modes of data collection on coverage error, nonresponse error, and measurement error. The course focuses on advances in computer-assisted methodology and comparisons among various data collection methods (such as telephone versus face-to-face, paper versus computer-assisted, and interviewer-administered versus self-administered). It will also examine the literature on interviewer effects, including literature related to the training and evaluation of interviewers, and review current literature on the reduction of nonresponse and the impact of nonresponse on survey estimates.

The course meets Tuesdays from 2:00 – 4:45 p.m. and alternates between the two campuses. An interactive video connection lets students either attend every session on their home campus or travel between Duke and UNC to attend all sessions in person. The course is taught in a traditional interactive presentation and discussion format.

Year 1 Fall
1st half | 2nd half
Data to Decision
Modeling and Representation of Data
Optional mini-courses or mini-project | Data Wrangling and Introduction to Text Analysis
Elective
(Data Seminar)
Year 1 Spring
1st half | 2nd half
Principles of Machine Learning
Data Management Systems
Data Logic, Visualization, and Storytelling
Data Science Ethics
Elective
(Data Seminar)
Year 2 Fall
1st half | 2nd half
Capstone
Elective
Elective
Elective
(Data Seminar)
Year 2 Spring
1st half | 2nd half
Capstone
Elective
Elective
Elective
(Data Seminar)

Summer Online Review

Over the summer, all students will be given access to online materials that review the concepts of probability, distributions, statistical testing, and linear algebra that are critical to understand before beginning the MIDS core courses.

These materials also help students reach intermediate proficiency in Python and basic proficiency in R before their first courses in the Fall. The goal of this online review is to make sure all students are confident in their foundational statistical and technical skills before they arrive on campus.

Data to Decision

This course will introduce students to the exciting process of using data to make practical decisions. Students will work in small groups to analyze and understand the implications of real, messy data sets.

Teams will design and implement their own analysis plan in order to recommend a strategy for solving problems such as detecting credit card fraud, diagnosing a medical disease, or identifying the type and location of a server failure in a network.

By giving students first-hand experience with the full life cycle of a data analysis project at the beginning of the curriculum, this course will help students see for themselves why the techniques they will learn in other courses in the curriculum are so useful and important.

In order to complete the projects, students will also get a practical survey of probability theory through Bayes’ Theorem, Maximum Likelihood, and Information Measures, and have their first introduction to Binary Classification and Linear Regression.
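As an illustration of the kind of reasoning this involves, the following is a minimal Python sketch of Bayes' Theorem applied to a fraud-detection question. It is not taken from the course materials: the function name and all of the rates are hypothetical, chosen only to show how a low base rate affects the answer.

```python
def posterior_probability(prior, true_positive_rate, false_positive_rate):
    """Return P(condition | positive flag) using Bayes' Theorem."""
    # P(positive flag), by the law of total probability
    evidence = true_positive_rate * prior + false_positive_rate * (1.0 - prior)
    return true_positive_rate * prior / evidence


# Hypothetical numbers: 1% of transactions are fraudulent, and a detector
# flags 95% of fraudulent transactions and 2% of legitimate ones.
p_fraud_given_flag = posterior_probability(
    prior=0.01, true_positive_rate=0.95, false_positive_rate=0.02
)
print(round(p_fraud_given_flag, 3))
```

Under these assumed rates, only about a third of flagged transactions are actually fraudulent, because legitimate transactions vastly outnumber fraudulent ones.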

Data Wrangling and Introduction to Text Analysis

The past decade has witnessed an explosion of textual data produced by websites, social media platforms, digitization of administrative and historical records, and new monitoring technologies.

This course will introduce students to the rich opportunities for using these new forms of data to gain insights. Students will learn how to gain access to textual data through APIs, and how to prepare these data for analysis.
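A rough idea of what "preparing text for analysis" can look like is sketched below in Python, using only the standard library. The sample posts, the tiny stop-word list, and the helper names are all made up for illustration; a real project would typically start from text retrieved through an API and use more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(documents):
    """Count non-stop-word tokens across a collection of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts


# Made-up sample posts standing in for data retrieved from an API
sample_posts = [
    "The server is down again!",
    "Server failure in the network: the team is investigating.",
]
counts = term_counts(sample_posts)
print(counts.most_common(3))
```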

Modeling and Representation of Data

The MIDS summer online review ensures that all students understand how to apply fundamental statistical analysis tools, such as descriptive statistics, ANOVA, regression, and the most commonly used nonparametric statistical tests.

This course moves beyond these fundamental techniques to teach students how to approach problems when assumptions of fundamental statistical analysis tools are broken. Students will learn techniques for dealing with outliers, missing data, and non-normal data distributions.  Students will also learn approaches for drawing causal inferences, and frameworks for modeling highly correlated and structured data, such as generalized linear modeling. 

The lessons learned in this course will help students think critically about the issues that affect the success of models in data science, and serve as the foundation for more in-depth study of techniques for addressing critical issues in real data sets.

Principles of Machine Learning

The course will teach students the principles of writing algorithms that dynamically learn rules from data to forecast future trends or classify future outcomes.

Students will learn the theory behind getting computers to learn from data and improve performance with feedback, and will also continuously practice applying the theory to real problems using a structured framework. 

Topics covered include linear discriminant analysis, k-nearest neighbor classification, decision trees, random forests, clustering, unsupervised learning, and principles of optimization. Students will learn how to correctly train their models using cross-validation and bootstrapping, and assess the success of their models using appropriate metrics. 

Students will also be introduced to regularization concepts and the principles behind deep learning. Students will complete the course able to apply machine learning tools rigorously and prepared to explore specific aspects of machine learning more deeply in elective courses.
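To make the idea of cross-validation concrete, here is a minimal Python sketch that estimates the accuracy of a k-nearest neighbor classifier with 5-fold cross-validation. It is an illustration only, not course code: the helper names are hypothetical, the data are synthetic, and in practice students would more likely use an established machine learning library.

```python
import random
from collections import Counter


def knn_predict(train, point, k=3):
    """Classify `point` by majority vote among its k nearest training rows."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_validated_accuracy(data, n_folds=5, k=3, seed=0):
    """Hold out each fold once, train on the rest, and pool the results."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::n_folds] for i in range(n_folds)]
    correct = 0
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct += sum(knn_predict(train, x, k) == y for x, y in held_out)
    return correct / len(rows)


# Synthetic, well-separated two-class data for illustration only
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(50)]
data += [((rng.gauss(6, 1), rng.gauss(6, 1)), "b") for _ in range(50)]

accuracy = cross_validated_accuracy(data)
print(f"5-fold cross-validated accuracy: {accuracy:.2f}")
```

Because every row is scored by a model that never saw it during training, the pooled accuracy is a fairer estimate of performance on new data than accuracy measured on the training set itself.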

Data Management Systems

The most time-consuming part of a data science project is acquiring the data you need. This course will teach students how to work with and retrieve data from different storage modalities. Students will gain an appreciation for the diversity of the rapidly changing data management landscape, and for the concepts that distinguish one data storage system from another.

Students will learn about normalization theory and relational database systems, practice using ER models and relational schemas, and write complex SQL queries.

Students will also be introduced to MapReduce techniques in Hadoop and Spark, NoSQL databases such as MongoDB, and cloud-based services such as AWS.  Students will complete the course with a firm grasp of the power of data management systems and how they impact the success of data science work.
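As a small illustration of relational schemas and SQL queries, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The two-table schema and its rows are hypothetical, invented for this example; the point is only to show data split across normalized tables and recombined with a join and an aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical normalized schema: customer details are stored once,
# and each order refers to its customer by key.
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Join the tables and aggregate order totals per customer.
query = """
    SELECT c.name, COUNT(o.order_id) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
print(totals)
conn.close()
```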

Data Logic, Visualization, and Storytelling

One of the most critical skills of an effective data scientist is the ability to understand and communicate the implications of a data analysis. In this course, students will learn why this skill matters and will be introduced to the communication principles that will be practiced and honed throughout the rest of the program.

Students will cultivate the ability to think critically and skeptically about the questions they need to answer in a data project and the strategies they are using to answer them. Students will also learn the principles behind effective data visualization and how to implement them in real analyses using Tableau software. 

Finally, students will practice presenting the results of a data analysis to diverse target audiences. The lessons learned in this course will serve as the foundation for workshops students will participate in throughout their second year and Capstone Project.

Data Science Ethics

Data science tools are not morally neutral. Almost all the tools we create in data science influence society in a meaningful way, and almost all the data models we implement make assumptions about society that may or may not represent reality.

This course is designed to help students think explicitly about the larger questions of what it is they are building and analyzing, and to learn how to identify and predict the impact of their work on the greater world. Using contemporary case studies from recent issues in the news or legal system, students will learn about issues such as intellectual copyright, consent, data security, the differences between privacy and confidentiality, the difficulties of anonymization, and bias in AI. The goal of this course is for students to emerge with an appreciation of their social responsibility, and the skills to apply this responsibility to the innovative projects they will be completing throughout their careers.

Data Seminar

This course consists of a series of discussions and panels that give students snapshots of data science projects in progress. Students will hear from speakers from academia, industry, government, and nonprofits who will talk about their experiences and discoveries. Speakers will talk about their career paths and share personal stories about topics like the impacts of stakeholders, the challenges of data collection, the differences between data science in startups and in established companies, and the cultures in different fields. This seminar is meant to help prepare students for the experiences they will have after they leave Duke.

Capstone Project

Capstone Projects are one of the most critical components of the MIDS program. The goal for these year-long Capstones is for students to be integrated into world-class interdisciplinary research projects that can solve real-life problems and be significantly advanced through data science. Capstones will receive detailed oversight from faculty in diverse departments across Duke whose research interests and expertise align with the project.

However, each MIDS student must achieve a specific outcome of interest for an outside party (such as a company, government agency, or nonprofit) as part of the greater research and give a final presentation with an accompanying white paper about the implications of that outcome. To ensure MIDS students complete their projects successfully, they will attend workshops and complete assignments throughout the second year that provide guidance, practice, and feedback about students’ teamwork, project management, communication plan, and overall progress in relation to the project. 

The final deliverables will be evaluated by MIDS core faculty and relevant outside stakeholders on multiple dimensions including students’ ability to communicate effectively to a diverse audience, computational strategy, and creativity.

Who Should Apply?

MIDS is open to all applicants who demonstrate a passion for data analysis, a mastery of analytical reasoning, an aptitude for learning quantitative and technical skills, and compelling academic or professional achievement.

We welcome applicants of any age and background, including (but not limited to) recent college graduates with quantitative majors, database engineers who have been in the IT field for years, government professionals who want to integrate data science into federal or local offices, health care professionals interested in informatics, and journalists who want to incorporate data mining into their investigative skills.

Due to our comprehensive approach, our application process requires applicants with primarily quantitative backgrounds to demonstrate their commitment to excelling in the problem-solving, communication, and team-building aspects of data science. Likewise, applicants without quantitative backgrounds are asked to demonstrate their commitment to learning quantitative concepts and skills quickly through mechanisms like online classes or recommendations from colleagues with strong quantitative track records.

We provide resources for students to review and learn critical concepts and skills before beginning the core courses, so that all students can begin the core courses on a level playing field.

Frequently Asked Questions

Anyone interested in advancing their career or changing career paths by developing interdisciplinary skills is encouraged to apply.

No, we do not have an online program.

We anticipate many of our students will have work experience, but it is not required.

Applicants from a range of academic backgrounds will be considered.

GRE scores are required. If English is not your first language, TOEFL scores are also required. Applicants with a graduate degree are exempt from GRE requirements.

Yes, MIDS is a full-time degree program and qualifies for a visa. International applicants are encouraged to apply as early as possible in order to allow ample time to clear the student visa process. Non-citizens residing in the U.S. are encouraged to apply early as well. Applications can (and should) be submitted in advance of supporting documents, such as recommendation letters, transcripts, and language test scores.

Applicants will apply directly through the Duke Graduate School: https://gradschool.duke.edu. Application is now open for 2018.