Health

Faculty

Andrea Lane is an Assistant Professor of the Practice at the Duke Social Science Research Institute (SSRI), with a secondary appointment in the Department of Statistical Sciences. Her work focuses on the intersection of data science and health, with a particular interest in using electronic health records and community-engaged research to better understand and improve health outcomes. Through both her research and teaching, she helps students see how data science can be applied in meaningful ways to real-world health challenges.

Students

Developing an LLM-Based Tool for Data Cleaning and Analysis
Students: Günel Aghakishiyeva, Rui (Katherine) Tian, Poojitha Balamurugan, Revanth Chowdary Ganga, and Udyan Sachdev
Partner: Duke Department of Anesthesiology | Industries: Health, Research
Our students partnered with Duke Health’s Anesthesiology team to make it easier for clinicians to work with complex patient data. The team built an automated data processing pipeline that transforms raw, messy data into analysis-ready formats, handling tasks like standardization, missing data, and validation. They also developed a user-friendly interface powered by large language models, allowing non-technical users to run analyses and generate insights without needing to code. By streamlining what was once a time-intensive and costly process, the tool reduces data preparation from months to minutes, helping healthcare professionals access reliable insights more quickly and efficiently.

Address Verification for Healthcare Provider Data
Students: Ya-Yun Huang, Ahmed Ibrahim, Ruixin Lou, and Ying-Chih (Emma) Wang
Partner: Orderly Health | Industry: Health
Students worked with Orderly Health to improve the accuracy and efficiency of healthcare provider data. Orderly’s platform helps patients, insurers, and health tech companies find reliable provider information, but verifying details like addresses often required time-consuming manual calls. The team developed a machine learning approach to evaluate the reliability of sources identified by Orderly’s algorithm, using features like website structure, contact information, and external links. After testing multiple models, their solution improved prediction accuracy and introduced confidence scores to help prioritize verification efforts. This work helps reduce costs, streamline data validation, and make accurate healthcare information more accessible.

Alumni

Robert Wan (MIDS ’23) is currently pursuing a PhD in Biostatistics at Emory’s Rollins School of Public Health, where he’s building on his passion for data-driven research in health. Before returning to academia, Robert worked at Mastercard Data & Services as an Associate Manager in analytics consulting, leading teams that supported Fortune 500 clients across retail, consumer goods, and financial services. His work focused on business testing, campaign measurement, and delivering actionable insights to guide strategic decisions. With a strong foundation in data science from Duke and a background in economics and statistics from Northwestern, Robert’s path highlights how MIDS graduates can move between industry and research while applying data science to real-world challenges.

Projects

Health-Focused Capstone Projects
MIDS students also take on capstone projects that explore how data science can improve health outcomes and support better decision-making in medicine and public health.

Social Drivers of Health and Child Well-Being
Students: Emmanuel Ruhamyankaka, Huiying Lin, Xinyi Sheng, and Bei Yu 
Partner: Children’s Health & Discovery Initiative, Duke Department of Pediatrics | Industry: Health
In this project, students examined how community-level factors like housing instability, unemployment, and access to resources shape child well-being and safety. Using electronic health records from Duke-affiliated hospitals alongside Census data, the team mapped cases of child maltreatment to neighborhood characteristics across Durham. Their work uncovered important patterns linking social determinants of health to child outcomes, helping highlight where interventions and resources may be most needed.

Developing a Question Answering Model for Critical Care Medicine
Students: Yun-Chung (Murphy) Liu, Keon Nartey, Suim Park, and Zhan (Bob) Zhang
Partner: Duke Department of Medicine | Industries: Health, Research
This project focused on helping physicians navigate the overwhelming volume of medical research needed to make critical care decisions. Students built a question-answering tool powered by large language models that can retrieve relevant studies and generate clear, concise summaries. Using a knowledge graph and a retrieval-augmented generation (RAG) approach, the model helps synthesize complex and sometimes conflicting research, giving clinicians faster access to the insights they need to make informed decisions.

Courses

Featured Courses in Health Data Science

Health Data Science (STA 598/IDS 597)
This course introduces key statistical concepts through real-world health research. You'll work with a variety of health data sources and learn how to apply methods like survival analysis, survey analysis, and causal inference to answer meaningful questions. The focus is on practical skills: translating research questions into sound analyses, addressing bias, and clearly communicating results that can inform decisions in healthcare and public health.

Bayesian Health Data Science (BIOSTAT 725/STA 725)
This course takes a deeper dive into analyzing biomedical data using Bayesian methods. Working with data like electronic health records, wearables, and imaging, you’ll learn how to model complex problems using approaches like hierarchical modeling and Bayesian machine learning. Along the way, you’ll build computational skills in R and Stan and gain experience tackling challenges like missing data, large datasets, and high-dimensional analysis, preparing you to work on cutting-edge problems in health and biomedical research.