Title: Upload data to a Flywheel project
Date: April 28th 2020
Description:
This notebook shows how to upload data to a new project using the Flywheel SDK.

Topics that will be covered:

Project, subjects, sessions, and acquisitions creation.
Upload of file(s) to an acquisition container.
Simple Metadata Editing.

Requirements¶

Access to a Flywheel instance.
Read/Write permission to at least one Flywheel Group.

Install and Import Dependencies¶

In [ ]:

# Install specific packages required for this notebook
!pip install flywheel-sdk pydicom pandas

In [ ]:

# Import packages
from getpass import getpass
import logging
import os
from pathlib import Path
import re
import time

from IPython.display import display, Image
import flywheel
import pandas as pd
from permission import check_user_permission

In [ ]:

# Instantiate a logger
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s')
log = logging.getLogger('root')

Download some test data¶

In this notebook we will be uploading images to a Flywheel instance.
To get started, you first need to download the test dataset that will be used in this notebook.

On mybinder.org or any Mac/Linux system, the following commands will download a zip archive and unzip the data into a folder called data-upload-notebook in your current directory:

In [ ]:

!curl -L -o data.zip "https://drive.google.com/uc?export=download&id=1aDgZhm94-N0x2WKAIxr2QpwD4M20va0W"
!unzip -qo data.zip -d data-upload-notebook

If the previous commands return an error, download the file directly using the link provided to the curl command above and extract the archive in the current working directory to a folder named data-upload-notebook.

The file tree of data-upload-notebook should look like this:

data-upload-notebook
├── anx_s1
│   └── anx_s1_anx_ses1_protA
│       └── T1_high-res_inplane_Ret_knk_0
│           └── 6879_3_1_t1.dcm.zip
├── anx_s2
│   └── anx_s2_anx_ses1_protA
│       └── T1\ high-res\ inplane\ FSPGR\ BRAVO_0
│           └── 4784_3_1_t1.dcm.zip
├── anx_s3
│   └── anx_s3_anx_ses1_protA
│       └── T1_high-res_inplane_Ret_knk_0
│           └── 6879_3_1_t1.dcm.zip
├── anx_s4
│   └── anx_s4_anx_ses2_protB
│       └── T1_high-res_inplane_Ret_knk_1
│           └── 8403_4_1_t1.dcm.zip
├── anx_s5
│   └── anx_s5_anx_ses1_protA
│       └── T1_high-res_inplane_Ret_knk_1
│           └── 8403_4_1_t1.dcm.zip
└── participants.csv
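
Before going further, it can help to confirm that the archive extracted as expected. The sketch below is one possible way to do this with pathlib. The helper name summarize_dataset is hypothetical and not part of the original notebook or the Flywheel SDK. It assumes the subject/session/acquisition layout shown in the tree above.

```python
from pathlib import Path


def summarize_dataset(root):
    """Return (archives, has_participants) for an extracted dataset folder.

    `summarize_dataset` is a hypothetical helper, not part of the Flywheel SDK.
    """
    root = Path(root)
    # DICOM archives sit three folders down: subject/session/acquisition/file
    archives = sorted(root.glob('*/*/*/*.dcm.zip'))
    # The participants table sits at the top level of the dataset
    has_participants = (root / 'participants.csv').is_file()
    return archives, has_participants
```

Calling summarize_dataset('data-upload-notebook') should return one path per DICOM archive listed in the tree, together with a flag telling you whether participants.csv was found.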

Flywheel API Key and Client¶

Get your API_KEY. More on this in the Flywheel SDK documentation.

In [ ]:

API_KEY = getpass('Enter API_KEY here: ')

Instantiate the Flywheel API client

In [ ]:

fw = flywheel.Client(API_KEY if 'API_KEY' in locals() else os.environ.get('FW_KEY'))
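
The cell above falls back to the FW_KEY environment variable when API_KEY is not defined. If neither is set, the client receives None. The sketch below shows one way to make that fallback explicit and fail early with a clear message. The helper resolve_api_key is hypothetical and not part of the Flywheel SDK; only the FW_KEY variable name comes from this notebook.

```python
import os


def resolve_api_key(explicit_key=None, env=None, env_var='FW_KEY'):
    """Return the explicit key if given, else the value of `env_var`.

    Raises ValueError instead of silently returning None.
    `resolve_api_key` is a hypothetical helper, not part of the Flywheel SDK.
    """
    # Allow a custom mapping for testing; default to the process environment
    env = os.environ if env is None else env
    key = explicit_key or env.get(env_var)
    if not key:
        raise ValueError(f'No API key provided and {env_var} is not set.')
    return key
```

The result could then be passed to the client, for example flywheel.Client(resolve_api_key(API_KEY)), assuming API_KEY was set in the previous cell.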

Show Flywheel logging information

In [ ]:

log.info('You are now logged in as %s to %s', fw.get_current_user()['email'], fw.get_config()['site']['api_url'])

Understand the Flywheel Hierarchy¶

The Flywheel data model relies on hierarchical containers. You can read more about Flywheel containers in our documentation.
In Flywheel, projects are structured into the following hierarchy:

Group
└── Project
    └── Subject 
        └── Session
            └── Acquisition

Project, Subject, Session and Acquisition are all containers. Containers share common properties such as the ability to store files, metadata or analyses.

In this notebook we will be:

Creating the Project to host our data.
Creating the hierarchy of Subject/Session/Acquisition matching our data input.
Uploading the DICOM archive to each Acquisition.
Showing how to update metadata of a container.

Initialize a few values¶

In this notebook, we will be uploading data to a Project. The label of the Project will be defined by the PROJECT_LABEL variable defined below. Here we set it up to be AnxietyStudy01 but feel free to change it to something that makes more sense to you.

In [ ]:

PROJECT_LABEL = 'AnxietyStudy01'

In Flywheel each project belongs to a Group. The label of the Group that will be used to create the Project is defined by the GROUP_LABEL variable below.

To be able to create a Project in a Group, you must have at least read/write permission for this Group. If you don't have read/write permission on any Group, please contact your site admin.

Specify the Group you have r/w permission on and where the Project will be created:

In [ ]:

GROUP_LABEL = '<your-group-label>'

We also define a variable that points to the root directory where the data was downloaded. If you have followed the steps above to download your data, you should have all the data in a folder called data-upload-notebook. If that's not the case, edit the variable below accordingly.

In [ ]:

PATH_TO_DATA = Path('data-upload-notebook')

Requirements¶

Before starting, we check that your permissions on the Flywheel instance are sufficient to proceed with this notebook.

In [ ]:

min_reqs = {
    "site": "user",
    "group": "admin"
}

In [ ]:

GROUP_ID = input('Please enter the Group ID that you will use in this notebook: ')

check_user_permission will return True if both the site and the group permissions meet the minimum requirements; otherwise, a compatible list will be printed.

In [ ]:

check_user_permission(fw, min_reqs, group=GROUP_ID)

Add a New Project¶

In this section, we will be creating a new project with label PROJECT_LABEL in the Group GROUP_LABEL.

First, we will be getting the Group using the lookup method.

In [ ]:

my_group = fw.lookup(GROUP_LABEL)

Before creating a new project, it is a good practice to check if the Project you are trying to create exists in the Flywheel instance or not. We can do this by checking if a Project with label=PROJECT_LABEL exists in the Group you have specified:

In [ ]:

project = my_group.projects.find_first(f'label={PROJECT_LABEL}')
if project:
    log.info(f'Project {GROUP_LABEL}/{PROJECT_LABEL} already exists. Please update your PROJECT_LABEL variable.')
else:
    log.info(f'Project {GROUP_LABEL}/{PROJECT_LABEL} does not exist. Looking all good.')

If the Project does not exist, find_first returns None and we can create it. If a Project was found, find_first returns that Project; in that case, either update PROJECT_LABEL to something different to create a new project OR make sure that the data you are about to upload will not interfere with the data already present in the Project.

In [ ]:

if not project:
    project = my_group.add_project(label=PROJECT_LABEL)
else:
    raise ValueError(f'Project {PROJECT_LABEL} already exists in group {GROUP_LABEL}, please pick another project label.')

Modify Project Gear Rules¶

Once the new Project has been created, we will disable its Gear Rules for demo purposes.

First, we use get_project_rules to get a list of all rules for a project.

In [ ]:

gear_rules = fw.get_project_rules(project.id)

If the project has no Gear Rules, gear_rules is empty and nothing happens. If Gear Rules are set up in the project, we loop over them and disable each rule that is currently enabled (disabled == False).

In [ ]:

if gear_rules:
    for rule in gear_rules:
        if rule.disabled == False:
            rule_obj = {'disabled': True}
            fw.modify_project_rule(project.id, rule.id, rule_obj)

Create Subjects, Sessions and Acquisitions and upload files¶

Now that we have a Project, we can create all the containers that are required to host our dataset.

What's the plan?¶

Following the Flywheel Hierarchy, we will loop through each subject folder and either get the Subject if it already exists in the Project or create it if not (we will use the get_or_create_subject function below for this). We will do the same to create the Session and Acquisition containers. Once we get down to the Acquisition container, we will upload the corresponding DICOM archive to it (we will use the upload_file_to_acquistion function below for this).
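
As a dry run of this plan, the sketch below walks the data folder and lists which Subject, Session and Acquisition labels each archive would map to, without contacting Flywheel. The helper plan_upload is hypothetical and assumes the folder layout of the test dataset, where labels are simply the folder names.

```python
from pathlib import Path


def plan_upload(root):
    """Yield (subject, session, acquisition, file path) for each archive.

    Labels are taken from folder names, mirroring the
    subject/session/acquisition layout of the test dataset.
    `plan_upload` is a hypothetical dry-run helper; it does not call Flywheel.
    """
    root = Path(root)
    for path in sorted(root.glob('*/*/*/*.dcm.zip')):
        # The first three path components below the root are the labels
        subject, session, acquisition = path.relative_to(root).parts[:3]
        yield subject, session, acquisition, path
```

Looping over plan_upload('data-upload-notebook') and printing each tuple gives a quick preview of the containers that the upload would create.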

In [ ]:

def get_or_create_subject(project, label, update=True, **kwargs):
    """Get the Subject container if it exists, else create a new Subject container.
    
    Args:
        project (flywheel.Project): A Flywheel Project.
        label (str): The subject label.
        update (bool): If true, update container with key/value passed as kwargs.
        kwargs (dict): Any key/value properties of subject you would like to update.

    Returns:
        (flywheel.Subject): A Flywheel Subject container.
    """
    
    if not label:
        raise ValueError(f'label is required (currently {label})')
        
    subject = project.subjects.find_first(f'label={label}')
    if not subject:
        subject = project.add_subject(label=label)
        
    if update and kwargs:
        subject.update(**kwargs)

    if subject:
        subject = subject.reload()

    return subject

In [ ]:

def get_or_create_session(subject, label, update=True, **kwargs):
    """Get the Session container if it exists, else create a new Session container.
    
    Args:
        subject (flywheel.Subject): A Flywheel Subject.
        label (str): The session label.
        update (bool): If true, update container with key/value passed as kwargs.        
        kwargs (dict): Any key/value properties of Session you would like to update.

    Returns:
        (flywheel.Session): A flywheel Session container.
    """
    
    if not label:
        raise ValueError(f'label is required (currently {label})')
        
    session = subject.sessions.find_first(f'label={label}')
    if not session:
        session = subject.add_session(label=label)
        
    if update and kwargs:
        session.update(**kwargs)

    if session:
        session = session.reload()

    return session

In [ ]:

def get_or_create_acquisition(session, label, update=True, **kwargs):
    """Get the Acquisition container if it exists, else create a new Acquisition container.
    
    Args:
        session (flywheel.Session): A Flywheel Session.
        label (str): The Acquisition label.
        update (bool): If true, update container with key/value passed as kwargs.        
        kwargs (dict): Any key/value properties of Acquisition you would like to update.

    Returns:
        (flywheel.Acquisition): A Flywheel Acquisition container.
    """
    
    if not label:
        raise ValueError(f'label is required (currently {label})')
        
    acquisition = session.acquisitions.find_first(f'label={label}')
    if not acquisition:
        acquisition = session.add_acquisition(label=label)
        
    if update and kwargs:
        acquisition.update(**kwargs)

    if acquisition:
        acquisition = acquisition.reload()

    return acquisition
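
The three helpers above share the same structure: look up a child by label, create it if missing, optionally update it, then reload it. As a design alternative, that pattern could be factored into a single function. The sketch below is an assumption of how this might look and is not an SDK function; it relies only on the find_first, update and reload calls already used in this notebook.

```python
def get_or_create(finder, create, label, update=True, **kwargs):
    """Return the child container labelled `label`, creating it if needed.

    finder: object exposing find_first(filter), e.g. project.subjects
    create: callable accepting label=..., e.g. project.add_subject
    This generic helper is illustrative only, not part of the Flywheel SDK.
    """
    if not label:
        raise ValueError(f'label is required (currently {label})')

    # Look for an existing child with this label before creating a new one
    container = finder.find_first(f'label={label}')
    if not container:
        container = create(label=label)

    if update and kwargs:
        container.update(**kwargs)

    # Reload so the caller gets the full, up-to-date container
    return container.reload()
```

With this helper, get_or_create(project.subjects, project.add_subject, 'anx_s1', sex='female') would play the role of get_or_create_subject, and the Session and Acquisition cases would follow the same shape.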

In [ ]:

def upload_file_to_acquistion(acquisition, fp, update=True, **kwargs):
    """Upload file to Acquisition container and update info if `update=True`
    
    Args:
        acquisition (flywheel.Acquisition): A Flywheel Acquisition
        fp (Path-like): Path to file to upload
        update (bool): If true, update the uploaded file with key/value passed as kwargs.
        kwargs (dict): Any key/value properties of the file you would like to update.
    """
    basename = os.path.basename(fp)
    if not os.path.isfile(fp):
        raise ValueError(f'{fp} is not a file.')
        
    if acquisition.get_file(basename):
        log.info(f'File {basename} already exists in container. Skipping.')
        return
    else:
        log.info(f'Uploading {fp} to acquisition {acquisition.id}')
        acquisition.upload_file(fp)
        while not acquisition.get_file(basename):   # to make sure the file is available before performing an update
            acquisition = acquisition.reload()
            time.sleep(1)
            
    if update and kwargs:
        # Update the properties of the file that was just uploaded
        f = acquisition.get_file(basename)
        f.update(**kwargs)
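
The polling loop in the upload function waits indefinitely if the file never becomes available. A bounded wait is one way to guard against that. The sketch below is a generic polling helper with a timeout; wait_until is a hypothetical name and the default values are arbitrary assumptions.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0):
    """Poll `predicate` until it returns a truthy value or `timeout` elapses.

    Returns the truthy value, or raises TimeoutError.
    `wait_until` is a hypothetical helper, not part of the Flywheel SDK.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'Condition not met within {timeout} seconds')
        time.sleep(interval)
```

In the upload function, the while loop could then be replaced by something like wait_until(lambda: acquisition.reload().get_file(basename)), which raises instead of hanging if the file does not show up in time.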

The files we want to upload are DICOM zip archives. Let's get a list of all of them:

In [ ]:

files_to_upload = list(PATH_TO_DATA.rglob('*.dcm.zip'))
dl = '\n'
print(f'Files to upload: \n{dl.join(map(str, files_to_upload))}')

In this notebook we will parse the Subject, Session and Acquisition labels directly from the folder and subfolder names. If we wanted to do more, we could use a regular expression on the path (e.g. something like r'^data-upload-notebook/(?P<sub_label>[\w\d]+)/.+(?P<ses_label>ses[\d\w\_]+)/(?P<acq_label>.+)').

Tip: Use Regex101, an online regex tester and debugger, to write and test your expression on example inputs before putting it in your code.
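
To make the regular expression idea concrete, here is a minimal sketch using named groups. The pattern is a variant of the one suggested above, not the exact same expression: the acquisition group is restricted to a single folder name ([^/]+) so that it does not swallow the file name. LABEL_PATTERN and parse_labels are hypothetical names.

```python
import re
from pathlib import Path

# Variant of the pattern from the text: one named group per label
LABEL_PATTERN = re.compile(
    r'^data-upload-notebook/(?P<sub_label>\w+)/'
    r'.+(?P<ses_label>ses\w+)/'
    r'(?P<acq_label>[^/]+)/[^/]+$'
)


def parse_labels(path):
    """Return a dict of subject, session and acquisition labels, or None."""
    # as_posix() gives forward slashes regardless of the operating system
    match = LABEL_PATTERN.match(Path(path).as_posix())
    return match.groupdict() if match else None
```

Note that with this pattern the session label is only the part starting at "ses" (for example ses1_protA), whereas the upload loop in this notebook uses the full folder name as the Session label.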

Getting the work done¶

We are now ready to walk our folders, create the containers accordingly and upload the DICOM zip archive to the Acquisition container.

In [ ]:

log.info('Starting upload...')
for subj in PATH_TO_DATA.glob('anx*'):
    log.info('Processing subject %s', str(subj))
    subject = get_or_create_subject(project, subj.name, update=True, type='human', sex='female')  # passing some value for the sake of the example
    for ses in subj.glob('anx*'):
        log.info('Processing session %s', str(ses))
        session = get_or_create_session(subject, ses.name)
        for acq in ses.glob('T1*'):            
            log.info('Processing acquisition %s', str(acq))            
            acquisition = get_or_create_acquisition(session, acq.name)
            for file in acq.glob('*.dcm.zip'):
                upload_file_to_acquistion(acquisition, file)
log.info('DONE')

Once the upload is done, all your data should be available in your Flywheel Project.

Update Subject Metadata¶

For the sake of example, let's demonstrate how we can update the metadata for Subject anx_s1.

Let's first find that specific Subject:

In [ ]:

anx_s1 = project.subjects.find_first('label=anx_s1').reload()

Tip: Using reload() is necessary to load the entire container.

We are going to update the firstname, lastname and the sex of this Subject. Let's check what we have currently:

In [ ]:

print(f'Subject anx_s1 sex is: {anx_s1.sex}, first name is: {anx_s1.firstname}, last name is: {anx_s1.lastname}')

We can update it with the update method of the container:

In [ ]:

anx_s1.update(
            firstname='John',
            lastname='Doe',
            sex='male',
)

Let's reload the subject from the database to make sure the update went through:

In [ ]:

anx_s1 = project.subjects.find_first('label=anx_s1').reload()
print(f'Subject anx_s1 sex is: {anx_s1.sex}, first name is: {anx_s1.firstname}, last name is: {anx_s1.lastname}')

Each container also contains a field called info which can be used to store unstructured information in a dictionary.

In [ ]:

complicated_nested_dict = {'a_complicated_nested_dict': {'key1': [1, 2, 3, 4], 
                                                        'key2': [{'an': 'other', 'list': 'with'}, 
                                                                {'dictionaries': ['in', 'it']}]
                                                        }
                            }

In [ ]:

anx_s1.update_info(complicated_nested_dict)

In [ ]:

anx_s1 = project.subjects.find_first('label=anx_s1').reload()
print(f'Info field: {anx_s1.info}')

You can find the same information in Flywheel under the custom information field of the anx_s1 Subject.

All the metadata shown in the UI is also accessible from the SDK. For instance, if you would like to show all the properties of the anx_s1 Subject, just display its container, e.g. with display(anx_s1).

Bonus: Update Subject Metadata with a CSV file¶

Subject metadata and info can also be updated by parsing a CSV or TSV file. With this method, you can modify the metadata of all Subjects at once.

In this example, you will need the participants.csv file, which can be found in the zip archive you downloaded earlier.

First, read the CSV file with pandas (imported as pd).

In [ ]:

metadata = pd.read_csv(PATH_TO_DATA/'participants.csv')

In [ ]:

# View the data in the csv file 
display(metadata)

We are going to loop through each Subject in the Project and check whether there is any metadata for it in the metadata dataframe.

If the Subject is in the metadata dataframe, we will add the age and treatment information into the Subject container and update the sex metadata for each Subject.

In [ ]:

for subj in project.subjects.iter():
    if (metadata["participant_id"] == subj.label).any():
        # Get data of the subject from the `metadata`
        tmp_info = metadata.loc[(metadata["participant_id"] == subj.label)]
        # Get the age and treatment for the subject
        # Convert the information to dictionary with value stored in a list
        # (recent pandas versions require the full orient name 'list')
        other_metadata = tmp_info[['age', 'treatment']].to_dict('list')
        # Update the metadata contained in the subject container
        sex = tmp_info.iloc[0]['sex']
        subj.update(type='human', sex=sex)
        subj.update_info(other_metadata)
    else:
        print(subj.label + ' does not have metadata stored in the CSV file.')
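
If pandas is not available, the same lookup table can be built with the standard library. The sketch below is an alternative to the pandas approach used in this notebook, not a replacement for it. load_participants is a hypothetical helper; the participant_id column name comes from the loop above.

```python
import csv


def load_participants(csv_path, id_column='participant_id'):
    """Map each participant ID to a dict of its remaining columns.

    `load_participants` is a hypothetical helper. All values are returned
    as strings, since the csv module does not infer types.
    """
    with open(csv_path, newline='') as handle:
        return {
            row[id_column]: {k: v for k, v in row.items() if k != id_column}
            for row in csv.DictReader(handle)
        }
```

The resulting dictionary can be queried with the Subject label, for example participants.get(subj.label), which returns None for Subjects that are absent from the CSV file.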

View the updated metadata in each Subject container:

In [ ]:

for subj in project.subjects.iter():
    subj = subj.reload()
    print(f'Subject Label: {subj.label}, Sex: {subj.sex}, Info: {subj.info}')
