MLCommons launches a new platform to benchmark AI medical models

The healthcare industry is increasingly adopting AI, with 80% of organizations having an AI strategy in place, according to a 2020 survey by Optum. However, medical AI models are difficult to evaluate, especially those trained on data from a limited range of clinical settings, which can leave them biased. To address this, MLCommons has developed MedPerf, a testing platform that evaluates AI models on diverse real-world medical data while protecting patient privacy. The platform was built with input from both industry and academia, and is designed to be used by healthcare organizations rather than vendors.

MedPerf has already been used in a test involving the NIH-funded Federated Tumor Segmentation Challenge, where it supported the testing of 41 different models across 32 healthcare sites on six continents. The results revealed biases in the models, which performed worse at sites whose patient demographics differed from those in their training data. However, the author of the article questions whether MedPerf can truly address the complex issues in AI for healthcare, noting that safely deploying medical models requires ongoing, thorough auditing by vendors and their customers.

Key takeaways:

The healthcare industry is increasingly adopting AI, with 80% of organizations having an AI strategy in place, according to a 2020 survey by Optum.
MLCommons has developed a new testing platform called MedPerf to benchmark and evaluate medical models, aiming to improve effectiveness, reduce bias, and build public trust.
MedPerf is designed to be used by healthcare organizations to assess AI models on demand, supporting popular machine learning libraries as well as private models and models available only through an API.
Despite the potential of MedPerf, the article suggests that it may not fully address the complex issues in AI for healthcare, such as integrating the technology into healthcare practitioners' daily routines and into existing technical systems.
