The study also revealed that, although these AI models are trained only to predict disease, they learn to predict other attributes such as gender and age along the way, which may not be desirable. The researchers attempted to reduce the fairness gaps by optimizing for "subgroup robustness" and by using "group adversarial" approaches that remove demographic information from the images. However, these approaches worked only when the models were tested on data from the same types of patients they were trained on. The researchers suggest that hospitals should evaluate these AI models on their own patient population before using them.
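To give a sense of what a "group adversarial" approach can look like in practice, the sketch below pairs a disease classifier with an adversary that tries to predict the demographic group from the model's internal features; a gradient-reversal layer pushes the image encoder to discard that information. This is a minimal, hypothetical PyTorch example, not the study's actual code: the encoder, head sizes, loss weighting, and training step are all illustrative assumptions.

```python
# Minimal sketch of "group adversarial" debiasing via gradient reversal.
# All module names, shapes, and hyperparameters are illustrative assumptions,
# not the study's actual implementation.
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; flips (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DebiasedClassifier(nn.Module):
    def __init__(self, feature_dim=128, n_diseases=14, n_groups=2, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        # Stand-in image encoder; a real system would use a CNN backbone.
        self.encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feature_dim), nn.ReLU())
        self.disease_head = nn.Linear(feature_dim, n_diseases)
        # Adversary predicts the demographic group from the features; the
        # reversed gradient trains the encoder to make that prediction hard.
        self.group_head = nn.Linear(feature_dim, n_groups)

    def forward(self, x):
        z = self.encoder(x)
        disease_logits = self.disease_head(z)
        group_logits = self.group_head(GradientReversal.apply(z, self.lambd))
        return disease_logits, group_logits

# One illustrative training step on random stand-in data.
model = DebiasedClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
images = torch.randn(8, 1, 64, 64)              # fake X-ray batch
disease = torch.randint(0, 2, (8, 14)).float()  # multi-label disease targets
group = torch.randint(0, 2, (8,))               # demographic group labels

disease_logits, group_logits = model(images)
loss = (nn.functional.binary_cross_entropy_with_logits(disease_logits, disease)
        + nn.functional.cross_entropy(group_logits, group))
opt.zero_grad()
loss.backward()
opt.step()
```

As the study found, even when an adversary like this reduces fairness gaps on the training distribution, the benefit may not carry over to patient populations that differ from the training data.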
Key takeaways:
- AI models used for medical diagnosis have been found to perform worse for women and people of color, often relying on demographic shortcuts that lead to incorrect results for these groups.
- Researchers found that the AI models most accurate at predicting demographics also show the largest fairness gaps in diagnostic accuracy across race and gender groups.
- While it is possible to retrain these models to improve their fairness, the debiasing holds up only when the models are tested on the same types of patients they were trained on.
- Researchers suggest that hospitals using these AI models should evaluate them on their own patient population before use, to ensure they aren't giving inaccurate results for certain groups.