Stanford University

Introduction

Access to dermatological care remains a significant concern for patients across the world, and artificial intelligence (AI) tools have shown promise in assisting with triage, non-specialist diagnosis, and augmentation of specialist workflow, with several models approved or on the way to approval in the United States and European Union. However, there is concern about the representation of patients with darker skin tones within publicly available datasets, and studies have suggested that some computer vision-based diagnostic AI models may perform worse on photos of skin of color.



Motivation

Our group previously released the Diverse Dermatology Images (DDI) dataset, a real-world retrospective collection of Fitzpatrick skin type (FST) I-VI images, and demonstrated that disparities in performance on darker skin tones were reduced following fine-tuning on DDI images. However, the DDI dataset did not include photos from Asian patients. A 2021 review of 70 studies on AI in dermatology found a gap in the availability of public datasets representing Asian skin: of the eleven studies that used datasets including Asian patients, most relied on private data collections with no FST breakdown (7/11, 64%). The datasets in the remaining four studies also did not include FST, had limited availability, did not provide a more detailed breakdown of ethnicity beyond “Asian”, and were mostly from South Korea. Heterogeneity in health outcomes within the Asian demographic has been recognized, especially within the Asian American community. Thus, we present the DDI-2 dataset, the first publicly available, histopathologically confirmed, real-world dataset of photos from self-identified Asian patients residing in the U.S., along with a detailed breakdown of FST and sub-ethnicity whenever available.



Clinical application: Patients of color disproportionately face access barriers to dermatology and would benefit from proposed AI use cases for triage and clinical decision support. Furthermore, some clinicians may have less experience evaluating conditions presenting in skin of color, a gap that can be exacerbated by AI tools not sufficiently trained on skin of color. Thus, wider availability of well-labeled, realistic photos across diverse skin tones is crucial to mitigate potential inequity in algorithm development and can provide a tool for benchmarking performance across different populations.


Dataset

DDI-2 comprises a retrospective convenience sample of 665 photos (mostly lesions [74.7%] rather than rashes [23.3%]) corresponding to 169 unique diagnoses from 550 patients who were seen at an academic medical center dermatology department from 2012 to 2022. It contains granular annotation of Fitzpatrick skin type, anatomical site, and histopathologically confirmed diagnoses, as well as annotation of artifacts associated with routine clinical workflows, such as the presence of surgical markings or a ruler. Each field was manually confirmed by a senior dermatology fellow with four years of dermatology training and a board-certified dermatologist with seven years of experience following board certification. Further details can be found in our paper.


Further description of the dataset is available in our datasheet.
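
Because each photo carries a Fitzpatrick skin type annotation, one natural use of the dataset is to report model performance separately for each skin type group. The following is a minimal sketch of that kind of stratified evaluation, not the method used in our paper. The field names `fst`, `label`, and `prediction`, the function `accuracy_by_group`, and the toy records are all hypothetical and do not reflect the actual column names or contents of the DDI-2 metadata.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key="fst"):
    """Return {group: (n_correct, n_total, accuracy)} for each group value.

    `records` is an iterable of dicts. The keys "label" and "prediction",
    and the default group key "fst", are hypothetical placeholders.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        total[group] += 1
        if rec["prediction"] == rec["label"]:
            correct[group] += 1
    return {g: (correct[g], total[g], correct[g] / total[g]) for g in sorted(total)}


# Toy, made-up records for illustration only (not taken from DDI-2).
toy_records = [
    {"fst": "III", "label": "diagnosis_a", "prediction": "diagnosis_a"},
    {"fst": "III", "label": "diagnosis_b", "prediction": "diagnosis_a"},
    {"fst": "IV", "label": "diagnosis_a", "prediction": "diagnosis_a"},
    {"fst": "IV", "label": "diagnosis_b", "prediction": "diagnosis_b"},
]

results = accuracy_by_group(toy_records)
for group, (n_correct, n_total, acc) in results.items():
    print(f"FST {group}: {n_correct}/{n_total} correct ({acc:.2f})")
```

Reporting the counts alongside each accuracy makes small groups visible, which matters when some skin types are represented by only a few photos.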


Accessing Dataset


Stanford University School of Medicine Diverse Dermatology Images (DDI)-2 Dataset Research Use Agreement

By registering for downloads from the DDI-2 Dataset, you are agreeing to this Research Use Agreement, as well as to the Terms of Use of the Stanford University School of Medicine website as posted and updated periodically at http://www.stanford.edu/site/terms/.

1. Permission is granted to view and use the DDI-2 Dataset without charge for personal, non-commercial research purposes only. Any commercial use, sale, or other monetization is prohibited.

2. Other than the rights granted herein, the Stanford University School of Medicine (“School of Medicine”) retains all rights, title, and interest in the DDI-2 Dataset.

3. You may make a verbatim copy of the DDI-2 Dataset for personal, non-commercial research use as permitted in this Research Use Agreement. If another user within your organization wishes to use the DDI-2 Dataset, they must register as an individual user and comply with all the terms of this Research Use Agreement.

4. YOU MAY NOT DISTRIBUTE, PUBLISH, OR REPRODUCE A COPY of any portion or all of the DDI-2 Dataset to others without specific prior written permission from the School of Medicine.

5. YOU MAY NOT SHARE THE DOWNLOAD LINK to the DDI-2 dataset to others. If another user within your organization wishes to use the DDI-2 Dataset, they must register as an individual user and comply with all the terms of this Research Use Agreement.

6. You must not modify, reverse engineer, decompile, or create derivative works from the DDI-2 Dataset. You must not remove or alter any copyright or other proprietary notices in the DDI-2 Dataset.

7. The DDI-2 Dataset has not been reviewed or approved by the Food and Drug Administration, and is for non-clinical, Research Use Only. In no event shall data or images generated through the use of the DDI-2 Dataset be used or relied upon in the diagnosis or provision of patient care.

8. THE DDI-2 DATASET IS PROVIDED "AS IS," AND STANFORD UNIVERSITY AND ITS COLLABORATORS DO NOT MAKE ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, NOR DO THEY ASSUME ANY LIABILITY OR RESPONSIBILITY FOR THE USE OF THIS DDI-2 DATASET.

9. You will not make any attempt to re-identify any of the individual data subjects. Re-identification of individuals is strictly prohibited. Any re-identification of any individual data subject shall be immediately reported to the School of Medicine.

10. Any violation of this Research Use Agreement or other impermissible use shall be grounds for immediate termination of use of this DDI-2 Dataset. In the event that the School of Medicine determines that the recipient has violated this Research Use Agreement or other impermissible use has been made, the School of Medicine may direct that the undersigned data recipient immediately return all copies of the DDI-2 Dataset and retain no copies thereof even if you did not cause the violation or impermissible use.

In consideration for your agreement to the terms and conditions contained here, Stanford grants you permission to view and use the DDI-2 Dataset for personal, non-commercial research. You may not otherwise copy, reproduce, retransmit, distribute, publish, commercially exploit or otherwise transfer any material.

Limitation of Use

You may use the DDI-2 Dataset for legal purposes only.

You agree to indemnify and hold Stanford harmless from any claims, losses or damages, including legal fees, arising out of or resulting from your use of the DDI-2 Dataset or your violation or role in violation of these Terms. You agree to fully cooperate in Stanford’s defense against any such claims. These Terms shall be governed by and interpreted in accordance with the laws of California.

Access the dataset via the Stanford Artificial Intelligence in Medicine and Imaging (AIMI) Center Shared Datasets Portal.


Paper

DDI-2: A Diverse Skin Condition Image Dataset Representing Self-Identified Asian Patients.
Crystal T Chang, Pirunthan Pathmarajah, Johan Allerup, Sheharbano Jafry, Kiana Yekrang, Dom Mitchell, Niki Ai See, Lila A Perrone, Bradley Fong, Sanjana Rajasekaran, Miah D Cisneros, Roxana Daneshjou, Justin Ko, Albert S Chiou. J Invest Dermatol (2024)

manuscript supplementary material

datasheet for the dataset

For inquiries, contact us at roxanad@stanford.edu.