Address Verification

: Health
: 2024

Our client, Orderly, aims to build an up-to-date healthcare provider database that helps patients, health insurance firms, and health tech companies find the right healthcare provider information efficiently. Orderly has developed an algorithm that verifies healthcare provider information by finding relevant data online, and our role is to evaluate the reliability of that verification algorithm. More specifically, Orderly’s algorithm finds a website associated with the healthcare provider information that needs to be verified, such as an address. Currently, Orderly evaluates the information found by the algorithm through verification by manual calls, which is costly. Our work therefore aims to identify other useful information on the website and use it as features to train a machine learning model that predicts whether Orderly’s algorithm has found a reliable source of information.
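To make the idea of website-derived features concrete, the following is a minimal sketch of how a page could be turned into a small feature dictionary. It is not Orderly’s or our production pipeline: the function name `extract_features`, the three illustrative signals (a copyright mention, contact information, and a count of external links), and the regular expressions are all assumptions chosen for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkAndTextCollector(HTMLParser):
    """Collects <a href> targets and text nodes from an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


# Hypothetical, deliberately simple patterns (US-style phone numbers only).
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
COPYRIGHT_RE = re.compile(r"©|\(c\)|copyright", re.IGNORECASE)


def extract_features(html, page_url):
    """Return illustrative trust-related features for one web page."""
    parser = LinkAndTextCollector()
    parser.feed(html)
    text = " ".join(parser.text_parts)
    page_host = urlparse(page_url).netloc.lower()

    # A link is counted as external when it names a host other than the page's.
    # Relative links and mailto: links have no host and are not counted.
    n_external = 0
    for href in parser.hrefs:
        host = urlparse(href).netloc.lower()
        if host and host != page_host:
            n_external += 1

    return {
        "has_copyright": bool(COPYRIGHT_RE.search(text)),
        "has_contact_info": bool(PHONE_RE.search(text) or EMAIL_RE.search(text)),
        "n_external_links": n_external,
    }
```

A dictionary like this, computed for each website the algorithm returns, could serve as one row of the training data for a classifier.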

Our team identified a set of features, such as copyright information and the presence of contact information, and trained machine learning models on these features to evaluate the verification results of Orderly’s algorithm. We experimented with four types of ML models: Logistic Regression, Random Forest, Support Vector Machine, and Multi-Layer Perceptron. Among these, Random Forest was the top performer, achieving 75% accuracy in predicting the trustworthiness of the algorithm’s output and surpassing the baseline accuracy of 70%. We have pinpointed several generated features that are crucial to our model’s evaluation, such as the number of external links within the website, which could be integrated into Orderly’s algorithm to enhance its verification process. In future use, our model can provide confidence scores for the algorithm’s verification results, enabling Orderly to prioritize verification tasks and save the time and cost previously spent on verification by manual calls.
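As a rough illustration of how confidence scores could drive prioritization, the sketch below sends the least trusted results to manual calling first, up to a fixed budget. The function name `split_for_manual_review`, the `call_budget` parameter, and the example scores are hypothetical; in practice the scores would come from the trained model’s predicted probability that a source is reliable.

```python
def split_for_manual_review(scored_results, call_budget):
    """Split (provider_id, confidence) pairs into two lists of provider IDs.

    The `call_budget` lowest-confidence results are returned first, in
    ascending order of confidence, as candidates for manual calling.
    The remaining IDs are returned second, also in ascending order.
    """
    ranked = sorted(scored_results, key=lambda pair: pair[1])
    to_call = [provider_id for provider_id, _ in ranked[:call_budget]]
    deferred = [provider_id for provider_id, _ in ranked[call_budget:]]
    return to_call, deferred


# Made-up confidence scores for four verification results.
scores = [("p1", 0.91), ("p2", 0.42), ("p3", 0.67), ("p4", 0.15)]
to_call, deferred = split_for_manual_review(scores, call_budget=2)
print(to_call)   # ['p4', 'p2']
print(deferred)  # ['p3', 'p1']
```

Ranking by confidence rather than applying a fixed threshold keeps the number of manual calls predictable, which is one possible design choice when calling capacity is the limiting cost.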