Developing an LLM-based tool for easy data cleaning and analysis for Duke Anesthesiology

Duke Health’s Anesthesiology team needed to analyze large volumes of patient-related data, but most team members had limited programming experience. As a result, they depended heavily on a small pool of statisticians and programmers for data cleaning and processing. This was slow and costly, and a single project could take up to a year. To address this, we developed an automated data processing pipeline that transforms raw, messy data into an analysis-ready format. The pipeline covers format standardization, missing-data handling, duplicate removal, normalization, and data validation.
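The pipeline stages above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code. The column names (patient_id, surgery_date, age, weight_kg), the accepted date formats, median imputation, min-max scaling, and the plausibility rules are all hypothetical choices made for the example.

```python
import statistics
from datetime import datetime

# Hypothetical schema and conventions, chosen only for this illustration.
COLUMNS = ("patient_id", "surgery_date", "age", "weight_kg")
NUMERIC = ("age", "weight_kg")
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")
MISSING = {"", "NA", "N/A", "NULL"}

def parse_date(raw):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            pass
    return None

def to_float(raw):
    """Return a float, or None if the text is not numeric."""
    try:
        return float(raw)
    except ValueError:
        return None

def standardize(row):
    """Step 1: trim text, map missing-value markers to None, parse types."""
    out = {}
    for col in COLUMNS:
        raw = (row.get(col) or "").strip()
        if raw.upper() in MISSING:
            out[col] = None
        elif col == "surgery_date":
            out[col] = parse_date(raw)
        elif col in NUMERIC:
            out[col] = to_float(raw)
        else:
            out[col] = raw
    return out

def clean_pipeline(raw_rows):
    rows = [standardize(r) for r in raw_rows]

    # Step 2: duplicate removal, keeping the first copy of each identical record.
    seen, unique = set(), []
    for r in rows:
        key = tuple(r[c] for c in COLUMNS)
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Step 3: missing data. Drop rows without an ID; fill numeric gaps with the median.
    rows = [r for r in unique if r["patient_id"]]
    for col in NUMERIC:
        observed = [r[col] for r in rows if r[col] is not None]
        fill = statistics.median(observed) if observed else None
        for r in rows:
            if r[col] is None:
                r[col] = fill

    # Step 4: normalization. Min-max scale numeric columns into new *_scaled fields.
    for col in NUMERIC:
        values = [r[col] for r in rows if r[col] is not None]
        lo, hi = (min(values), max(values)) if values else (0.0, 0.0)
        for r in rows:
            v = r[col]
            if v is None:
                r[col + "_scaled"] = None
            else:
                r[col + "_scaled"] = 0.0 if hi == lo else (v - lo) / (hi - lo)

    # Step 5: validation. Flag (but keep) rows that break simple plausibility rules.
    issues = []
    for r in rows:
        if r["surgery_date"] is None:
            issues.append((r["patient_id"], "missing or unparseable surgery_date"))
        if r["age"] is not None and not 0 <= r["age"] <= 120:
            issues.append((r["patient_id"], "age out of range"))
        if r["weight_kg"] is not None and r["weight_kg"] <= 0:
            issues.append((r["patient_id"], "non-positive weight"))
    return rows, issues

raw = [
    {"patient_id": "P1", "surgery_date": "2023-01-05", "age": "54", "weight_kg": "80"},
    {"patient_id": "P1", "surgery_date": "01/05/2023", "age": " 54 ", "weight_kg": "80.0"},
    {"patient_id": "P2", "surgery_date": "2023-02-10", "age": "61", "weight_kg": "NA"},
    {"patient_id": "", "surgery_date": "2023-03-01", "age": "40", "weight_kg": "70"},
    {"patient_id": "P3", "surgery_date": "bad", "age": "150", "weight_kg": "90"},
]
rows, issues = clean_pipeline(raw)
print([r["patient_id"] for r in rows])  # ['P1', 'P2', 'P3']
print(issues)
```

In this sketch, deduplication runs before imputation so that filling a gap cannot make two different records look identical. The two P1 rows collapse into one only because format standardization makes them equal first. The real pipeline may order or implement these steps differently.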

We also built a user-friendly interface, backed by a Large Language Model (LLM), that lets non-technical team members run analyses themselves. This greatly reduced their dependence on technical staff and streamlined the analysis process. The LLM standardizes users’ queries, so healthcare professionals can ask questions without writing code and still get consistent, reliable results. By automating both data cleaning and analysis, we cut preprocessing time from months to minutes, giving the team stable, trustworthy AI-powered insights with minimal effort.
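One possible reading of "query standardization" is sketched below. This is an assumption, not a description of the project's implementation. In this sketch, the LLM only translates a free-text question into a small JSON query, the tool checks that query against a fixed whitelist, and ordinary code computes the answer. All names here (fake_llm, parse_query, run_query, and the allowed operations and columns) are hypothetical. fake_llm is a hard-coded stub standing in for a real model call.

```python
import json
import statistics

# Hypothetical whitelist: the only operations and columns a standardized query may use.
ALLOWED_OPS = {"mean": statistics.mean, "median": statistics.median, "count": len}
ALLOWED_COLUMNS = {"age", "weight_kg"}
QUERY_KEYS = {"operation", "column"}

def fake_llm(question):
    """Stand-in for a real LLM call. A real system would prompt a model to answer
    in this JSON shape; here the reply is hard-coded for the example."""
    return '{"operation": "mean", "column": "age"}'

def parse_query(llm_output):
    """Accept only JSON objects that match the fixed query schema."""
    q = json.loads(llm_output)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(q, dict) or set(q) != QUERY_KEYS:
        raise ValueError(f"query must have exactly the keys {sorted(QUERY_KEYS)}")
    op, col = q["operation"], q["column"]
    if not isinstance(op, str) or op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operation: {op!r}")
    if not isinstance(col, str) or col not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {col!r}")
    return q

def run_query(q, rows):
    """Execute a validated query with ordinary, deterministic code (None values skipped)."""
    values = [r[q["column"]] for r in rows if r.get(q["column"]) is not None]
    return ALLOWED_OPS[q["operation"]](values)

rows = [{"age": 54.0}, {"age": 61.0}, {"age": None}]
query = parse_query(fake_llm("What is the average patient age?"))
print(query, run_query(query, rows))  # {'operation': 'mean', 'column': 'age'} 57.5
```

Keeping the arithmetic outside the model means the same validated query always returns the same number. This is one plausible route to the stable results described above, though the source does not say how the project achieved them.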

Mentor: Andrea Lane