Data Sharing

Managing the sharing of data for the AI-READI project

Data Standardization

We aim to share the AI-READI dataset so that anyone can easily reuse it, especially for developing artificial intelligence (AI) and machine learning (ML) models. To achieve this, we are formatting our data according to existing standards, or establishing new ones where there is a gap. Specifically, we are developing the Clinical Dataset Structure (CDS), a standard for consistently organizing multimodal clinical research data and metadata, and we are using it to organize the AI-READI dataset.

Additionally, each data modality is formatted according to an established standard, such as the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) for clinical data and the Digital Imaging and Communications in Medicine (DICOM) format for retinal imaging data. We are also working with the communities that manage these standards to extend them where needed to accommodate our data.

New Data Usage License

We have established a new data usage license for the AI-READI dataset. Its terms were specifically tailored to enable reuse of the AI-READI dataset (and other clinical datasets) for commercial or research purposes, while imposing strong requirements on data usage, security, and secondary sharing to protect study participants, especially when the data is reused for AI/ML-related applications. The license will continue to evolve as we receive feedback from the community following the release of our pilot dataset (see below).

Data Use Agreement

At each release of the AI-READI dataset, two sets will be made available: a public-access set and a controlled-access set. The public set will be stripped of Protected Health Information (PHI) as well as information on the sex and race/ethnicity of the participants. It will be accessible through our data portal (see below) after a few steps, such as logging in, stating the research purpose, and attesting to proper use of the data. The controlled set will also be accessible through our data portal but will require additional steps to access. We will develop specific guidelines for accessing and using the controlled set, drawing on best practices from other large data programs with controlled-access data policies, such as the All of Us Research Program.

Data Access Portal

We are developing a novel platform called FAIRhub for managing and sharing the data. It will contain two components: a study management platform for managing data, and a data portal where the data will be accessible to others once it is shared from the study management platform. The data portal is designed to make the data Findable and Accessible. The public set of the AI-READI pilot dataset is already available through FAIRhub at