Data used to build algorithms detecting skin disease is too white
Public skin image datasets that are used to train algorithms to detect skin problems don’t include enough information about skin tone, according to a new analysis. And within the datasets where skin tone information is available, only a very small number of images are of darker skin — so algorithms built using these datasets might not be as accurate for people who aren’t white.
The study, published today in The Lancet Digital Health, examined 21 freely accessible datasets of images of skin conditions. Combined, they contained over 100,000 images. Just over 1,400 of those images had information attached about the ethnicity of the patient, and only 2,236 had information about skin color. This lack of data limits researchers’ ability to spot biases in algorithms trained on the images. And such algorithms could very well be biased: Of the images with skin tone information, only 11 were from patients with the darkest two categories on the Fitzpatrick scale, which classifies skin color. Among the images with ethnicity information, none were from patients with an African, Afro-Caribbean, or South Asian background.
The conclusions are similar to those from a study published in September, which also found that most datasets used for training dermatology algorithms don’t have information about ethnicity or skin tone. That study examined the data behind 70 studies that developed or tested algorithms and found that only seven described the skin types in the images used.
“What we see from the small number of papers that do report out skin tone distributions, is that those do show an underrepresentation of darker skin tones,” says Roxana Daneshjou, a clinical scholar in dermatology at Stanford University and an author of the September paper. Her paper analyzed many of the same datasets as the new Lancet research and came to similar conclusions.
When images in a dataset are publicly available, researchers can go through and review what skin tones appear to be present. But that can be difficult, because photos may not exactly match what the skin tone looks like in real life. “The most ideal situation is that skin tone is noted at the time of the clinical visit,” Daneshjou says. Then, the image of that patient’s skin problem could be labeled before it goes into a database.
Without labels on images, researchers can’t check algorithms to see if they’re built using datasets with enough examples of people with different skin types.
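As a rough illustration of what such a check could look like when labels do exist, here is a minimal Python sketch that counts how many image records carry a skin tone label and how those labels are spread across the six Fitzpatrick types. The record layout, the `fitzpatrick` field name, and the sample records are all assumptions made up for this example; they are not taken from the study or from any of the datasets it examined.

```python
from collections import Counter


def audit_skin_tone_labels(records, field="fitzpatrick"):
    """Summarize skin tone label coverage for a list of image records.

    Each record is a dict. The `field` key (a hypothetical name) holds a
    Fitzpatrick type from 1 to 6, or is missing/None when unlabeled.
    """
    labeled = [r[field] for r in records if r.get(field) is not None]
    counts = Counter(labeled)
    total = len(records)
    return {
        "total": total,
        "labeled": len(labeled),
        "labeled_share": len(labeled) / total if total else 0.0,
        # Report every type, so absent categories show up as zero.
        "per_type": {t: counts.get(t, 0) for t in range(1, 7)},
        # Types 5 and 6 are the two darkest categories on the scale.
        "darkest_two": counts.get(5, 0) + counts.get(6, 0),
    }


# Invented records, for illustration only.
records = [
    {"id": "a", "fitzpatrick": 2},
    {"id": "b", "fitzpatrick": None},
    {"id": "c"},
    {"id": "d", "fitzpatrick": 5},
    {"id": "e", "fitzpatrick": 2},
]
summary = audit_skin_tone_labels(records)
print(summary)
```

Reporting zero counts explicitly matters here: a category with no images is exactly the gap this kind of audit is meant to surface.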
It’s important to scrutinize these image sets because they’re often used to build algorithms that help doctors diagnose patients with skin conditions, some of which, like skin cancers, are more dangerous if they’re not caught early. If the algorithms have only been trained or tested on light skin, they may not be as accurate for everyone else. “Research has shown that programs trained on images taken from people with lighter skin types only might not be as accurate for people with darker skin, and vice versa,” says David Wen, a co-author of the new paper and a researcher at the University of Oxford.
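One way researchers could look for that kind of gap, assuming test images are labeled with skin type, is to report accuracy separately for each group instead of a single overall figure. The sketch below is a hypothetical illustration, not the method used in either paper; the function name, the tuple layout, and the sample predictions are invented for the example.

```python
from collections import defaultdict


def accuracy_by_skin_type(examples):
    """Compute accuracy per skin type.

    `examples` is an iterable of (skin_type, true_label, predicted_label)
    tuples. Returns a dict mapping each skin type seen to its accuracy.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for skin_type, truth, predicted in examples:
        seen[skin_type] += 1
        if truth == predicted:
            correct[skin_type] += 1
    return {t: correct[t] / seen[t] for t in seen}


# Invented predictions, for illustration only.
examples = [
    (2, "benign", "benign"),
    (2, "malignant", "malignant"),
    (6, "malignant", "benign"),
    (6, "benign", "benign"),
]
result = accuracy_by_skin_type(examples)
print(result)
```

With only a handful of images in a group, a per-group figure like this is very noisy, which is another reason the small number of darker-skin images in public datasets is a problem.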
New images can always be added to public datasets, and researchers want to see more examples of conditions on darker skin. And improving the transparency and clarity of the datasets will help researchers track progress toward more diverse image sets that could lead to more equitable AI tools. “I would like to see more open data and more well-labeled data,” Daneshjou says.