Subsalt: The Query-Regulated Engine

Subsalt, 2025

Subsalt incurs high computational costs when generating legally de-identified synthetic data, because it trains multiple generative models across many configurations. Many of these configurations are never used, so the compute spent training them is wasted. The waste is amplified by the costly privacy tests that are run on each model and configuration.

To tackle this, we are focusing on two areas. First, we are developing predictive tools that score model configurations upfront, so that redundant models are never trained. Subsalt has preliminarily identified this approach as promising. Second, we are optimizing the existing synthetic data workflow and cloud infrastructure (e.g., Azure Kubernetes) to improve efficiency. For instance, instead of completing every privacy test regardless of interim results, we aim to terminate parallel testing as soon as any single test fails. Because privacy testing costs twice as much as model training, this simple adjustment could yield substantial savings. Our next step is to analyze where the biggest inefficiencies lie, so that we can prioritize the solution with the greatest immediate cost reduction.
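The early-termination idea can be illustrated with a minimal Python sketch. It assumes each privacy test runs in its own thread and periodically checks a shared stop signal. The names `run_privacy_tests` and `make_test` are hypothetical. The same goes for the True/False/None return convention and the simulated test durations. In a deployment where tests run as separate cluster jobs, the "stop" step would instead cancel or delete the sibling jobs.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Optional

# Hypothetical interface: a privacy test receives a shared stop event and returns
# True (pass), False (fail), or None (it noticed the stop event and gave up early).
PrivacyTest = Callable[[threading.Event], Optional[bool]]


def run_privacy_tests(tests: Dict[str, PrivacyTest], max_workers: int = 4) -> Dict[str, str]:
    """Run tests in parallel and stop the whole batch at the first failure."""
    stop = threading.Event()
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(test, stop): name for name, test in tests.items()}
        for fut in as_completed(futures):
            name = futures[fut]
            if fut.cancelled():
                results[name] = "cancelled"  # was still queued when a test failed
                continue
            outcome = fut.result()
            if outcome is None:
                results[name] = "aborted"  # started, then exited on the stop signal
            elif outcome:
                results[name] = "pass"
            else:
                results[name] = "fail"
                stop.set()  # tell running tests to exit early
                for other in futures:
                    other.cancel()  # only succeeds for tests that have not started
    return results


def make_test(duration: float, passes: bool) -> PrivacyTest:
    """Hypothetical stand-in for a privacy test that takes `duration` seconds."""
    def test(stop: threading.Event) -> Optional[bool]:
        # Event.wait returns True if the event was set before the timeout.
        if stop.wait(timeout=duration):
            return None
        return passes
    return test


if __name__ == "__main__":
    start = time.monotonic()
    report = run_privacy_tests(
        {
            "fast_fail": make_test(0.1, passes=False),
            "slow_pass": make_test(5.0, passes=True),
            "queued": make_test(5.0, passes=True),
        },
        max_workers=2,
    )
    print(report, f"{time.monotonic() - start:.2f}s")
```

`Future.cancel()` can only drop tests that are still queued; a test that is already running has to cooperate by checking the stop event. That is why the event is passed into every test. In the demo, the batch ends shortly after the fast test fails rather than waiting out the 5-second tests.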

Mentor: Vijay Keswani