All Capstone Projects
Our client, Orderly, aims to develop an up-to-date healthcare provider database to help patients, healthcare insurance firms and health tech companies to find the right healthcare provider information efficiently. Currently, they have developed an algorithm capable of verifying healthcare provider information by finding relevant online data. Our role is to...
Public project statement not available.
Our project with DataOceans represents a significant advancement in solving the intricate challenges of data preprocessing for key industries by leveraging artificial intelligence, including machine learning and natural language processing. Through the implementation of an AI-driven system that features error detection and machine learning for header classification, especially employing the...
Our team collaborated with AMECO, a construction service company, to address their challenge of aligning actual profit margins with targeted goals. Based on past years' sales performance, AMECO is experiencing a constant decline in profit. The target profit is achieved by selling their products at their target profit margins. Therefore,...
This project characterizes the relationship between gender and academic research funding through the examination of the allocation of National Institutes of Health (NIH) R01 grants. Despite providing detailed funding and institution-level information on Principal Investigators (PIs), the NIH does not make individual-level demographic data public, making it challenging to directly...
In our Capstone project, we provided a pipeline for our clients to evaluate and reduce bias in their financial classification models using the AI Fairness 360 package developed by IBM. Our client, 2nd Order Solutions (2OS), is a financial service firm that provides banks with machine learning algorithms to decide...
Anesthesiology providers are believed to be at least as likely as the general population to suffer from non-medical opioid use. Currently, there is a lack of data-driven methods to efficiently identify providers in need of support. This project applies machine learning models to electronic health records to detect changes in...
Our client, DataOceans, distributes other companies’ billing statements that contain advertisements for new products. To make the ads more personalized, DataOceans has developed a user interface that allows businesses to define specific criteria for targeting their advertisements. For instance, a business might target married males over the age of 40,...
All animal behavior is based on approach and avoidance motivations. Organisms tend to approach things that are positively valenced or beneficial to them and avoid things that are negatively valenced or harmful to them. The automaticity of these fundamental motivations has been supported extensively with empirical research using various types...
Our project addressed the significant challenge of reducing computational and storage costs associated with the deployment of large language models (LLMs) at Proofpoint. Recognizing the vital role of LLMs in enhancing the detection and mitigation of cyber threats across emails, documents, and other communication channels, our team embarked on a...
It is widely believed that environmental factors such as lack of employment opportunities, inadequate access to food, unstable housing, and household substance use lead to family stressors that negatively impact children. Some of these impacts may come through direct maltreatment from family or community members. This capstone project focused on...
The US Army is responsible for recruiting for its Special Operations Command, a unit of highly specialized personnel responsible for challenging and critical missions, where its members come from various branches in the larger military (Army, Navy, and Air Force). The recruiting and hiring process is very time-consuming, and the...
Our capstone project aims to develop an accurate and versatile topic modeling system. In addition, we develop a machine learning pipeline that can automatically assign labels to new articles. The project has both unsupervised and supervised components. In the first semester we accomplished the unsupervised part, where we leverage natural...
Our client, Spencer Stuart, is a strategic consulting firm that offers leadership and business consulting services to organizational decision makers. Our study aims to assess S&P 500 company performance based on board member team composition. The team composition for each board includes features such as board industry experience, board female...
Digital libraries are an indispensable source of information in the modern era. However, given the advances in technology and the gargantuan rate of information creation, it is becoming increasingly important to enable digital libraries with intelligent solutions that aid comprehension of the constantly evolving and growing sea of scholarly publications....
Once widely abundant throughout the North Pacific ocean, the eastern North Pacific right whale (NPRW) population (Eubalaena japonica) is now one of the rarest populations of marine mammals. Classified as critically endangered under the International Union for the Conservation of Nature, right whales have been illegally hunted for decades and...
Saving Nature partners with local NGOs and scientists to connect fragmented ecosystems. Through land purchases and the construction of nature corridors, the organization seeks to revitalize populations of endangered species that have lost access to their historical habitats and mating grounds to industrialization. To validate the effectiveness of purchased corridors,...
With the rapid development of financial technology, algorithmic trading has begun to replace human decisions. One such type of algorithm is reinforcement learning, which takes trading actions by modeling historical data. In this work, we build a modular reinforcement learning system with the support of additional integrated predictive models to...
Our aim with this project was to estimate the heights of power plant stacks from publicly available satellite imagery data. Our client, WattTime, is working to build the first automated system for monitoring the emissions from power plants globally, utilizing the height of flue-gas stacks to better understand the dispersal...
Investment banking and asset management use regular market summary updates to manage risk and exposure on all equity products across markets. These Market Risk data are variable in quality based on the user who develops them. This project will aim to derive key statistical insights from Market Risk data using...
In today’s world, datasets are much larger and more complex and thus require new and innovative techniques to make reliable inferences. The data also reflect existing human biases and social inequalities, which are compounded by the use of algorithms and machine learning models. This can lead to unfair outcomes and...
Wild animals eaten by humans are known as “wild meat”, or “bushmeat” in sub-Saharan Africa. Hunting for bushmeat is both an ancient and modern practice, but as we step into the Anthropocene, bushmeat hunting has become unsustainable in many areas, threatening biodiversity and the food, financial, and cultural security of...
The goal of our team was to establish an automated ETL (Extract, Transform, Load) Pipeline that extracts raw wearable data (from Fitbit and Garmin APIs), transforms the data, and loads it to the PostgreSQL database (Covidentify Analysis Database) in Microsoft Azure. This repository is a mirror of the code that...
This project examines a number of questions using raw data on course enrollment at Duke University. Questions range from how grades affect the order in which students take courses to whether certain courses are bottlenecks for majors. A tangible product will be a recommendation engine for course suggestion...
To meet the energy needs of those without access, we need to know where existing infrastructure, especially transmission and distribution lines, is in relation to communities in need. Current databases track approximately 85% of global energy infrastructure capacity. The remaining 15% may dramatically impact global emissions, but are particularly hard...
This project will utilize publicly available LiDAR data to estimate carbon stocks across The Conservation Fund’s nationwide portfolio of working forestlands. Students will create computational tools for processing NASA’s newest LiDAR data, from the Global Ecosystem Dynamics Investigation (GEDI), and use field data to validate and interpret the data. In addition, students...
The goal of this project is to develop interventions for the growing opioid crisis. To do this, the team will build a method to probabilistically fuse granular synthetic household data with publicly available data related to opioid use to predict where opioid hotspots are likely to occur, and why. The...
Every election cycle, the use of digital advertising by U.S. political campaigns becomes more commonplace. The past few years have seen extensive public debate about the use of targeted digital advertising by political campaigns. Unlike other forms of political advertising, campaign ads in the digital sphere remain largely unregulated by...
The “small-watershed” ecology approach measures all precipitation chemistry (inputs) and stream solute fluxes (outputs) within a watershed. This has led to key environmental insights such as the discovery of acid rain. Currently, upwards of 150 different federally funded sites have implemented this watershed approach to evaluate site-specific questions about climate...
The goal of this project is to find a better way of presenting data from an exercise tolerance test to enable the clinician to make more precise diagnoses and better understand the results of the test. The students will create a model for this test, showing the interaction between the three...
This project will focus on a power service restoration problem in the design of smart grids. Due to increasingly severe weather events and cyber-physical security threats, a more resilient and reliable power system is needed to ensure the continuous operation and availability of power applications and services. Traditionally, a...
This project aims to explore two topics within a world-leading surgery program. For the first project, the final goal will be to productionize a system within the existing technology stack to automate surgery scheduling. The strategy to implement this system will be to use electronic health records data to build...
Malicious webpages will often display brand logos to feign legitimacy; however, these brand logos are often distorted from official versions. The goal of this project is to build a classifier that recognizes and identifies logos in a screenshot of a legitimate or malicious webpage. The classifier must also be able...
Wild animals eaten by humans are known as “wild meat”, or “bushmeat” in sub-Saharan Africa. Hunting for bushmeat is both an ancient and modern practice, but as we step into the Anthropocene, bushmeat hunting has become unsustainable in many areas, threatening biodiversity and the food, financial, and cultural security of...
The goal of this capstone project is to apply and extend custom analytics solutions to discover how life remains resilient in extreme environments. An explosion of data has resulted from recent discoveries of tiny single-celled life hiding out in the most extreme places on Earth. These single-celled creatures, or microbes, thrive...