All Capstone Projects
ChessMindAI is a platform focused on improving chess education by combining advanced artificial intelligence with classic gameplay. Our current project explores how AI can assist in writing expert-level commentary for a chess textbook by generating insightful explanations for individual board positions. To achieve this, we use a custom, agentic workflow...
Buildings account for approximately 9% of global greenhouse gas emissions each year, yet emissions data for the building sector is often outdated or too low-resolution to support effective policymaking. Climate TRACE—a global nonprofit coalition that independently monitors emissions across 83% of global sources—seeks to fill this gap. As part of...
Conventional object detection models like YOLO often struggle with occluded objects because they rely on a single viewpoint and limited scene context. In our capstone project, we benchmarked BEVFormer, a transformer-based model that uses a bird’s-eye-view representation, against YOLO on the NuScenes dataset. The NuScenes dataset contains diverse urban scenes with...
This project is an enterprise-grade log monitoring and anomaly detection system developed to support Data Oceans and their clients in identifying unusual patterns across large-scale transactional and system log data. The solution leverages a combination of unsupervised machine learning models (such as Isolation Forest and K-means) and statistical methods to...
In our capstone project, we applied unsupervised machine learning clustering methods to segment users of the real-estate website Monopolio and provide insights for marketing. Our client, DD360, has a new website that presents real estate opportunities for rent and purchase in Mexico, where users can browse through different neighborhoods...
Critical care providers make high-stakes clinical decisions on a daily basis. Making research-informed decisions is challenging because of the sheer volume of research published in the field every day, with potentially conflicting conclusions about the same clinical questions. This project aims to develop a medical question answering model,...
Duke Health’s Anesthesiology team faced a significant challenge in analyzing vast amounts of patient-related data due to the limited technical skills of their team members. This limitation forced them to rely heavily on scarce statisticians and programmers for data cleaning and processing, a process that was both time-consuming and expensive,...
The Federal Open Market Committee (FOMC) sets the target for the federal funds rate, a critical tool influencing interest rates, inflation, and the broader economy. What if we could use large language models (LLMs) to anticipate their next decision? In partnership with Bank of New York (BNY), our project developed...
This project analyzes customer satisfaction at Citizens Bank using Net Promoter Score (NPS) survey data, focusing specifically on customers’ likelihood to recommend the bank to others. By combining survey responses with customer account data, transaction history, and banking behaviors, we identified key factors that drive customer satisfaction and loyalty. Our...
This project develops a multi-stage framework to support Duke Women’s Soccer recruitment through advanced modeling and a web-service platform. In the first stage of modeling, detailed game-level player statistics from the Wyscout API are transformed through extensive aggregation and feature selection to form the final set of average, percentage, and versatility...
Subsalt incurs high computational costs when generating legally de-identified synthetic data, because it trains multiple generative models across various configurations, many of which go unused. This inefficiency leads to unnecessary expenses, especially from running costly privacy tests on different models and configurations. To tackle...
Let’s make vacation accessible for everyone: that’s the mission of Becoming rentABLE, and it became ours too. Our objective was to help Becoming rentABLE expand its inventory of accessible short-term rentals across 18 target U.S. cities. To achieve this, we analyzed Airbnb text review data and used a RoBERTa-based...
As a strategic data initiative with AB InBev, our team was tasked with exploring the use of alternative data to enhance their Business-to-Business (B2B) product recommendation systems. Traditionally, AB InBev relied heavily on internally generated data—such as customer purchase activity—for these predictions. We were brought in to investigate whether novel,...
Citizens Bank wanted to understand the impact of different marketing channels on sales while accounting for macroeconomic factors and regional presence. To address this, our team developed a Marketing Mix Model (MMM) that categorized banking products based on similar sales trends and built separate models for each category. The approach...
Our client, Orderly, aims to develop an up-to-date healthcare provider database to help patients, health insurance firms, and health tech companies find the right healthcare provider information efficiently. They have already developed an algorithm capable of verifying healthcare provider information by finding relevant online data. Our role is to...
Public project statement not available.
Our project with DataOceans tackles the intricate challenges of data preprocessing for key industries by leveraging artificial intelligence, including machine learning and natural language processing. By implementing an AI-driven system that features error detection and machine learning for header classification, especially employing the...
Our team collaborated with AMECO, a construction service company, to address their challenge of aligning actual profit margins with targeted goals. Based on sales performance over the past years, AMECO has been experiencing a steady decline in profit. The target profit is achieved by selling products at their target profit margins. Therefore,...
This project characterizes the relationship between gender and academic research funding through the examination of the allocation of National Institutes of Health (NIH) R01 grants. Despite providing detailed funding and institution-level information on Principal Investigators (PIs), the NIH does not make individual-level demographic data public, making it challenging to directly...
In our capstone project, we built a pipeline to evaluate and reduce bias in financial classification models using the AI Fairness 360 package developed by IBM. Our client, 2nd Order Solutions (2OS), is a financial services firm that provides banks with machine learning algorithms to decide...
Anesthesiology providers are believed to be at least as likely as the general population to suffer from non-medical opioid use. Currently, there is a lack of data-driven methods to efficiently identify providers in need of support. This project applies machine learning models to electronic health records to detect changes in...
Our client, DataOceans, distributes other companies’ billing statements that contain advertisements for new products. To make the ads more personalized, DataOceans has developed a user interface that allows businesses to define specific criteria for targeting their advertisements. For instance, a business might target married males over the age of 40,...
All animal behavior is based on approach and avoidance motivations. Organisms tend to approach things that are positively valenced or beneficial to them and avoid things that are negatively valenced or harmful to them. The automaticity of these fundamental motivations has been supported extensively by empirical research using various types...
Our project addressed the significant challenge of reducing computational and storage costs associated with the deployment of large language models (LLMs) at Proofpoint. Recognizing the vital role of LLMs in enhancing the detection and mitigation of cyber threats across emails, documents, and other communication channels, our team embarked on a...
It is widely believed that environmental factors such as lack of employment opportunities, inadequate access to food, unstable housing, and household substance use lead to family stressors that negatively impact children. Some of these impacts may come through direct maltreatment by family or community members. This capstone project focused on...
The US Army is responsible for recruiting for its Special Operations Command, a unit of highly specialized personnel drawn from various branches of the larger military (Army, Navy, and Air Force) who carry out challenging and critical missions. The recruiting and hiring process is very time-consuming, and the...
Machine learning models used in production settings are frequently retrained to improve accuracy as new data becomes available. However, retraining causes inconsistencies between the predictions of different model versions, a phenomenon known as prediction churn. This is undesirable for end users and organizations, as it might result...
Although diverse geographical data are available, annotating them for analysis is time-consuming and requires specialist knowledge of the specific application domain, and existing supervised models based on domain-specific labeling have limited applicability. This expensive and time-consuming process results in unequal access to datasets, especially in low-income countries. To overcome these...
The use of Electronic Health Records (EHR) by medical students during their clinical rotations provides a rich source of documentation of the students’ clinical experiences. However, accessing and summarizing those notes through the EHR is cumbersome and inefficient. There is a need to improve the digitization of notes and encounter...
Discrepancies between population and housing estimates and the 2020 Census counts, especially in undercounted or overcounted areas, might lead to poor policy decisions. These disparities, often correlated with various demographic attributes, might increase inequities in resource distribution. To address this issue, we conducted a comparison of current population and housing...
Interpretability is a key aspect of 2nd Order’s solution, which provides financial institutions with explanations they can give customers for certain financial decisions, such as credit card application denials, as mandated under US law. In this project, we evaluated and compared the two most popular explainable prediction algorithms commonly used in...
Systemic Lupus Erythematosus (SLE) patients are among the vulnerable groups of people facing increased COVID-19 vaccine hesitancy tied to socio-economic factors like barriers to healthcare access and lower household income. This project set out to identify socio-economic factors contributing to vaccine hesitancy among SLE patients and model their relationship to...
Climate change, particularly rising temperatures, is destabilizing the ecological balance within the Sargasso Sea and the Costa Rica Thermal Dome, critical areas that require marine conservation measures. In this project, we observed both short- and long-term biological changes in these high-seas locations, detecting seasonal patterns and the consequences...
Underdeveloped data-driven approaches for ranking college teams limit the ability to generate compelling information about team performance, forecast game outcomes, and inform playoff seeding. We developed a probabilistic ranking system for college soccer teams that quantifies uncertainty in team ratings, interprets each team’s rating as the expected number of goals...
Our capstone project aims to develop an accurate and versatile topic modeling system, along with a machine learning pipeline that can automatically assign labels to new articles. The project has both unsupervised and supervised components. In the first semester, we completed the unsupervised part, in which we leveraged natural...
Our client, Spencer Stuart, is a strategic consulting firm that offers leadership and business consulting services to organizational decision makers. Our study aims to assess S&P 500 company performance based on board member team composition. The team composition for each board includes features such as board industry experience, board female...
Digital libraries are an indispensable source of information in the modern era. However, given advances in technology and the enormous rate of information creation, it is increasingly important to equip digital libraries with intelligent solutions that aid comprehension of the constantly evolving and growing sea of scholarly publications....
Once widely abundant throughout the North Pacific Ocean, the eastern North Pacific right whale (NPRW) population (Eubalaena japonica) is now one of the rarest populations of marine mammals. Classified as critically endangered by the International Union for Conservation of Nature, right whales have been illegally hunted for decades and...
Saving Nature partners with local NGOs and scientists to connect fragmented ecosystems. Through land purchases and the construction of nature corridors, the organization seeks to revitalize populations of endangered species that have lost access to their historical habitats and mating grounds to industrialization. To validate the effectiveness of purchased corridors,...
With the rapid development of financial technology, algorithmic trading has begun to replace human decision-making. One such type of algorithm is reinforcement learning, which takes trading actions by modeling historical data. In this work, we build a modular reinforcement learning system with the support of additional integrated predictive models to...
Our aim with this project was to estimate the heights of power plant stacks from publicly available satellite imagery data. Our client, WattTime, is working to build the first automated system for monitoring emissions from power plants globally, using the height of flue-gas stacks to better understand the dispersal...
Investment banking and asset management teams use regular market summary updates to manage risk and exposure on all equity products across markets. The quality of these Market Risk data varies depending on the user who develops them. This project aims to derive key statistical insights from Market Risk data using...
In today’s world, datasets are much larger and more complex and thus require new and innovative techniques to make reliable inferences. The data also reflect existing human biases and social inequalities, which are compounded by the use of algorithms and machine learning models. This can lead to unfair outcomes and...
Wild animals eaten by humans are known as “wild meat”, or “bushmeat” in sub-Saharan Africa. Hunting for bushmeat is both an ancient and modern practice, but as we step into the Anthropocene, bushmeat hunting has become unsustainable in many areas, threatening biodiversity and the food, financial, and cultural security of...