2024 Capstone Projects
As a strategic data initiative with AB InBev, our team was tasked with exploring the use of alternative data to enhance their Business-to-Business (B2B) product recommendation systems. Traditionally, AB InBev relied heavily on internally generated data—such as customer purchase activity—for these predictions. We were brought in to investigate whether novel,...
The Federal Open Market Committee (FOMC) sets the target for the federal funds rate, a critical tool influencing interest rates, inflation, and the broader economy. What if we could use large language models (LLMs) to anticipate their next decision? In partnership with Bank of New York (BNY), our project developed...
Let’s make vacation accessible for everyone–that’s the mission of Becoming rentABLE, and it becomes ours too. Our objective focused on helping Becoming rentABLE to expand their inventory of accessible short-term rental rental across 18 target U.S cities. To achieve this, we analyzed Airbnb text review data and used a RoBERTa-based...
ChessMindAI is a platform focused on improving chess education by combining advanced artificial intelligence with classic gameplay. Our current project explores how AI can assist in writing expert-level commentary for a chess textbook by generating insightful explanations for individual board positions. To achieve this, we use a custom, agentic workflow...
This project analyzes customer satisfaction at Citizens Bank using Net Promoter Score (NPS) survey data, focusing specifically on customers’ likelihood to recommend the bank to others. By combining survey responses with customer account data, transaction history, and banking behaviors, we identified key factors that drive customer satisfaction and loyalty. Our...
Citizens Bank wanted to understand the impact of different marketing channels on sales while accounting for macroeconomic factors and regional presence. To address this, our team developed a Marketing Mix Model (MMM) that categorized banking products based on similar sales trends and built separate models for each category. The approach...
This project is an enterprise-grade log monitoring and anomaly detection system developed to support Data Oceans and their clients in identifying unusual patterns across large-scale transactional and system log data. The solution leverages a combination of unsupervised machine learning models (such as Isolation Forest and K-means) and statistical methods to...
In our capstone project we implemented unsupervised machine learning clustering methods to segment users of the real-estate website Monopolio to provide insights for marketing purposes. Our client, DD360, has a new website that presents real estate opportunities for rent and purchase in Mexico, where users can browse through different neighborhoods...
Duke Health’s Anesthesiology team faced a significant challenge in analyzing vast amounts of patient-related data due to the limited technical skills of their team members. This limitation forced them to rely heavily on scarce statisticians and programmers for data cleaning and processing, a process that was both time-consuming and expensive,...
Critical care providers need to make high-stake clinical decisions on a daily basis. Making research-informed decisions is challenging because of the sheer amount of research results published in the field everyday with potentially conflicting conclusions regarding the same clinical questions. This project aims to develop a medical question answering model,...
Buildings account for approximately 9% of global greenhouse gas emissions each year, yet emissions data for the building sector is often outdated or too low-resolution to support effective policymaking. Climate TRACE—a global nonprofit coalition that independently monitors emissions across 83% of global sources—seeks to fill this gap. As part of...
This project develops a multi-stage framework to support Duke Women’s Soccer recruitment through advanced modeling and a web-service platform. In the first-stage of modeling, detailed game-level player statistics from the Wyscout API are transformed through extensive aggregation and feature selection to form the final set of average, percentage, and versatility...
Subsalt is experiencing high computational costs from generating legally de-identified synthetic data by training multiple generative models across various configurations. Many of these configurations are not utilized, contributing to the inefficiencies. These inefficiencies lead to unnecessary expenses, especially from running costly privacy tests on different models and configurations. To tackle...
Conventional object detection models like YOLO often struggle with occluded objects due to reliance on single viewpoints and limited scene context. In our capstone project, we benchmarked BEVFormer, a transformer-based model which using bird’s-eye view representation, against YOLO on the NuScenes dataset. The NuScenes dataset contains diverse urban scenes with...
Showing 51-67 of 67 results