Title: Upload data to a Flywheel project
Date: April 28th 2020
Description:
This notebook shows how to upload data to a new project using the Flywheel SDK.
Topics that will be covered:
# Install specific packages required for this notebook
!pip install flywheel-sdk pydicom pandas
# Import packages
from getpass import getpass
import logging
import os
from pathlib import Path
import re
import time
from IPython.display import display, Image
import flywheel
from permission import check_user_permission
# Instantiate a logger
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s')
log = logging.getLogger('root')
In this notebook we will be uploading images to a Flywheel Instance.
To get started, you first need to download the test dataset used in this notebook.
On mybinder.org or any Mac/Linux system, the following commands download a zip archive and unzip the data into a folder called data-upload-notebook in your current directory:
!curl -L -o data.zip "https://drive.google.com/uc?export=download&id=1aDgZhm94-N0x2WKAIxr2QpwD4M20va0W"
!unzip -q data.zip -d data-upload-notebook
If the previous commands return an error, download the file directly using the link passed to the curl command above and extract the archive in the current working directory to a folder named data-upload-notebook.
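On systems without the unzip command, the extraction step can also be done from Python. The following is a minimal sketch using only the standard library; the helper name extract_archive is our own and is not part of the Flywheel SDK, and it assumes data.zip has already been downloaded to the current directory.

```python
import zipfile
from pathlib import Path


def extract_archive(archive, dest):
    """Extract every member of a zip archive into dest and return the member paths."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return sorted(dest / name for name in zf.namelist())


# Example call (commented out because it needs data.zip on disk):
# extract_archive('data.zip', 'data-upload-notebook')
```

The function creates the destination folder if needed, so it can be run on a fresh working directory.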
The file tree of data-upload-notebook should look like this:
data-upload-notebook
├── anx_s1
│   └── anx_s1_anx_ses1_protA
│       └── T1_high-res_inplane_Ret_knk_0
│           └── 6879_3_1_t1.dcm.zip
├── anx_s2
│   └── anx_s2_anx_ses1_protA
│       └── T1\ high-res\ inplane\ FSPGR\ BRAVO_0
│           └── 4784_3_1_t1.dcm.zip
├── anx_s3
│   └── anx_s3_anx_ses1_protA
│       └── T1_high-res_inplane_Ret_knk_0
│           └── 6879_3_1_t1.dcm.zip
├── anx_s4
│   └── anx_s4_anx_ses2_protB
│       └── T1_high-res_inplane_Ret_knk_1
│           └── 8403_4_1_t1.dcm.zip
├── anx_s5
│   └── anx_s5_anx_ses1_protA
│       └── T1_high-res_inplane_Ret_knk_1
│           └── 8403_4_1_t1.dcm.zip
└── participants.csv
Get your API_KEY. More on this in the Flywheel SDK doc here.
API_KEY = getpass('Enter API_KEY here: ')
Instantiate the Flywheel API client
fw = flywheel.Client(API_KEY if 'API_KEY' in locals() else os.environ.get('FW_KEY'))
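If you prefer not to type the key on every run, the lookup can be wrapped in a small helper. This is only a sketch: resolve_api_key is a hypothetical name of our own, and unlike the cell above it checks the FW_KEY environment variable first and prompts only when that variable is unset.

```python
import os
from getpass import getpass


def resolve_api_key(env_var='FW_KEY', prompt=getpass):
    """Return the API key from the environment, prompting only if it is unset."""
    key = os.environ.get(env_var)
    if not key:
        # Fall back to an interactive prompt; the input is not echoed by getpass.
        key = prompt('Enter API_KEY here: ')
    if not key:
        raise ValueError('No API key provided.')
    return key
```

The returned value could then be passed to flywheel.Client in place of API_KEY.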
Show Flywheel logging information
log.info('You are now logged in as %s to %s', fw.get_current_user()['email'], fw.get_config()['site']['api_url'])
The Flywheel data model relies on hierarchical containers. You can read more about the Flywheel containers in our documentation here.
In Flywheel, projects are structured into the following hierarchy:
Group
└── Project
    └── Subject
        └── Session
            └── Acquisition
Each of Project, Subject, Session and Acquisition is a container. Containers share common properties, such as the ability to store files, metadata or analyses.
In this notebook, we will be uploading data to a Project. The label of the Project is defined by the PROJECT_LABEL variable below.
Here we set it to AnxietyStudy01, but feel free to change it to something that makes more sense to you.
PROJECT_LABEL = 'AnxietyStudy01'
In Flywheel, each Project belongs to a Group. The label of the Group in which the Project will be created is defined by the GROUP_LABEL variable below.
Specify a Group on which you have read/write permission and where the Project will be created:
GROUP_LABEL = '<your-group-label>'
We also define a variable that points to the root directory where the data was downloaded. If you followed the steps above to download the data, it should all be in a folder called data-upload-notebook. If that's not the case, edit the variable below accordingly.
PATH_TO_DATA = Path('data-upload-notebook')
Before starting, we check that you have the permissions on the Flywheel instance that are required to proceed with this notebook.
min_reqs = {
    "site": "user",
    "group": "admin"
}
GROUP_ID = input('Please enter the Group ID that you will use in this notebook: ')
check_user_permission will return True if both the site and the group permissions meet the minimum requirements; otherwise, a compatible list will be printed.
check_user_permission(fw, min_reqs, group=GROUP_ID)
In this section, we will be creating a new Project with label PROJECT_LABEL in the Group GROUP_LABEL.
First, we get the Group using the lookup method.
my_group = fw.lookup(GROUP_LABEL)
Before creating a new project, it is a good practice to check if the Project you are trying to create exists in the Flywheel instance or not. We can do this by checking if a Project with label=PROJECT_LABEL exists in the Group you have specified:
project = my_group.projects.find_first(f'label={PROJECT_LABEL}')
if project:
    log.info(f'Project {GROUP_LABEL}/{PROJECT_LABEL} already exists. Please update your PROJECT_LABEL variable.')
else:
    log.info(f'Project {GROUP_LABEL}/{PROJECT_LABEL} does not exist. Looking all good.')
If the Project does not exist, find_first returns None and we can create it. If a Project was found, find_first returns it; in that case, either update PROJECT_LABEL to something different to create a new Project, or make sure that the data you are about to upload will not interfere with the data already present in the Project. The cell below takes the first approach and raises an error if the Project already exists.
if not project:
    project = my_group.add_project(label=PROJECT_LABEL)
else:
    raise ValueError(f'Project {PROJECT_LABEL} already exists in group {GROUP_LABEL}, please pick another project label.')
Once the new Project has been created, we disable its Gear Rules for demo purposes.
First, we use get_project_rules to get a list of all rules for the Project.
gear_rules = fw.get_project_rules(project.id)
If no Gear Rules are set up in the Project, gear_rules is empty and the block below does nothing. Otherwise, we loop over the rules and disable each one that is not already disabled.
if gear_rules:
    for rule in gear_rules:
        # Only touch rules that are still enabled
        if not rule.disabled:
            rule_obj = {'disabled': True}
            fw.modify_project_rule(project.id, rule.id, rule_obj)
Now that we have a Project, we can create all the containers that are required to host our dataset.
Following the Flywheel hierarchy, we will loop through each subject folder and either get the Subject if it already exists in the Project or create it if not (we will use the get_or_create_subject function below for this). We will do the same to create the Session and Acquisition containers. Once we get down to the Acquisition container, we will upload the corresponding DICOM archive to it (we will use the upload_file_to_acquistion function below for this).
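Before touching the Flywheel instance, it can help to preview which containers the folder layout would produce. The sketch below makes no SDK calls; plan_upload is a hypothetical helper of our own that assumes every file sits exactly at subject/session/acquisition/file relative to the data root.

```python
from pathlib import PurePosixPath


def plan_upload(relative_paths):
    """Group paths shaped like subject/session/acquisition/file into a nested dict."""
    plan = {}
    for rel in relative_paths:
        parts = PurePosixPath(rel).parts
        if len(parts) != 4:
            continue  # skip anything outside the expected layout, e.g. participants.csv
        subject, session, acquisition, filename = parts
        plan.setdefault(subject, {}).setdefault(session, {}).setdefault(acquisition, []).append(filename)
    return plan


example_paths = [
    'anx_s1/anx_s1_anx_ses1_protA/T1_high-res_inplane_Ret_knk_0/6879_3_1_t1.dcm.zip',
    'anx_s4/anx_s4_anx_ses2_protB/T1_high-res_inplane_Ret_knk_1/8403_4_1_t1.dcm.zip',
    'participants.csv',
]
example_plan = plan_upload(example_paths)
for subject, sessions in example_plan.items():
    for session, acquisitions in sessions.items():
        for acquisition, files in acquisitions.items():
            print(subject, session, acquisition, files)
```

Each top-level key corresponds to a Subject, each second-level key to a Session, and each third-level key to an Acquisition holding a list of file names.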
def get_or_create_subject(project, label, update=True, **kwargs):
    """Get the Subject container if it exists, else create a new Subject container.

    Args:
        project (flywheel.Project): A Flywheel Project.
        label (str): The subject label.
        update (bool): If true, update container with key/value passed as kwargs.
        kwargs (dict): Any key/value properties of subject you would like to update.

    Returns:
        (flywheel.Subject): A Flywheel Subject container.
    """
    if not label:
        raise ValueError(f'label is required (currently {label})')
    subject = project.subjects.find_first(f'label={label}')
    if not subject:
        subject = project.add_subject(label=label)
    if update and kwargs:
        subject.update(**kwargs)
    if subject:
        subject = subject.reload()
    return subject
def get_or_create_session(subject, label, update=True, **kwargs):
    """Get the Session container if it exists, else create a new Session container.

    Args:
        subject (flywheel.Subject): A Flywheel Subject.
        label (str): The session label.
        update (bool): If true, update container with key/value passed as kwargs.
        kwargs (dict): Any key/value properties of Session you would like to update.

    Returns:
        (flywheel.Session): A Flywheel Session container.
    """
    if not label:
        raise ValueError(f'label is required (currently {label})')
    session = subject.sessions.find_first(f'label={label}')
    if not session:
        session = subject.add_session(label=label)
    if update and kwargs:
        session.update(**kwargs)
    if session:
        session = session.reload()
    return session
def get_or_create_acquisition(session, label, update=True, **kwargs):
    """Get the Acquisition container if it exists, else create a new Acquisition container.

    Args:
        session (flywheel.Session): A Flywheel Session.
        label (str): The Acquisition label.
        update (bool): If true, update container with key/value passed as kwargs.
        kwargs (dict): Any key/value properties of Acquisition you would like to update.

    Returns:
        (flywheel.Acquisition): A Flywheel Acquisition container.
    """
    if not label:
        raise ValueError(f'label is required (currently {label})')
    acquisition = session.acquisitions.find_first(f'label={label}')
    if not acquisition:
        acquisition = session.add_acquisition(label=label)
    if update and kwargs:
        acquisition.update(**kwargs)
    if acquisition:
        acquisition = acquisition.reload()
    return acquisition
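The three get_or_create helpers share one pattern: look the container up by label, create it if missing, optionally update it, then reload it. As a sketch only, that pattern could be written once as a generic function. The classes below are hypothetical in-memory stand-ins written for this illustration so the example runs without a Flywheel instance; they are not part of the SDK and imitate only the calls used above.

```python
def get_or_create(parent, finder_name, add_name, label, update=True, **kwargs):
    """Generic form of the get_or_create_* helpers.

    finder_name is the attribute holding the finder (e.g. 'subjects') and
    add_name is the method that creates a child (e.g. 'add_subject').
    """
    if not label:
        raise ValueError(f'label is required (currently {label})')
    container = getattr(parent, finder_name).find_first(f'label={label}')
    if not container:
        container = getattr(parent, add_name)(label=label)
    if update and kwargs:
        container.update(**kwargs)
    return container.reload()


class FakeContainer:
    """Hypothetical stand-in for a Flywheel container (illustration only)."""

    def __init__(self, label):
        self.label = label
        self.fields = {}

    def update(self, **kwargs):
        self.fields.update(kwargs)

    def reload(self):
        return self


class FakeFinder:
    """Hypothetical stand-in for a finder that supports 'label=<value>' queries."""

    def __init__(self, items):
        self._items = items

    def find_first(self, query):
        label = query.split('=', 1)[1]
        return next((c for c in self._items if c.label == label), None)


class FakeProject:
    """Hypothetical stand-in for a Project that stores Subjects in a list."""

    def __init__(self):
        self.created = []
        self.subjects = FakeFinder(self.created)

    def add_subject(self, label):
        subject = FakeContainer(label)
        self.created.append(subject)
        return subject


demo_project = FakeProject()
first = get_or_create(demo_project, 'subjects', 'add_subject', 'anx_s1', sex='female')
second = get_or_create(demo_project, 'subjects', 'add_subject', 'anx_s1')
print(first is second, len(demo_project.created))
```

The second call finds the Subject created by the first one, so only a single container is ever added. We keep the three explicit functions in this notebook because they are easier to read.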
def upload_file_to_acquistion(acquisition, fp, update=True, **kwargs):
    """Upload file to Acquisition container and update info if `update=True`

    Args:
        acquisition (flywheel.Acquisition): A Flywheel Acquisition
        fp (Path-like): Path to file to upload
        update (bool): If true, update container with key/value passed as kwargs.
        kwargs (dict): Any key/value properties of Acquisition you would like to update.
    """
    basename = os.path.basename(fp)
    if not os.path.isfile(fp):
        raise ValueError(f'{fp} is not file.')
    if acquisition.get_file(basename):
        log.info(f'File {basename} already exists in container. Skipping.')
        return
    else:
        log.info(f'Uploading {fp} to acquisition {acquisition.id}')
        acquisition.upload_file(fp)
        # Poll until the file is available before performing an update
        while not acquisition.get_file(basename):
            acquisition = acquisition.reload()
            time.sleep(1)
    if update and kwargs:
        f = acquisition.get_file(basename)
        f.update(**kwargs)
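The polling loop in the upload function waits indefinitely if the file never becomes available. A bounded version could look like the sketch below; wait_until is a hypothetical helper of our own, and the default timeout and interval values are arbitrary.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0):
    """Call predicate() until it returns a truthy value or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'Condition not met within {timeout} seconds.')
        time.sleep(interval)
```

In the upload function, the predicate could be something like lambda: acquisition.reload().get_file(basename), so that the loop gives up with a TimeoutError instead of hanging.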
The files we want to upload are DICOM zip archives. Let's get a list of all of them:
files_to_upload = list(PATH_TO_DATA.rglob('*.dcm.zip'))
dl = '\n'
print(f'Files to upload: \n{dl.join(map(str, files_to_upload))}')
In this notebook we will parse the Subject, Session and Acquisition labels directly from the folder and subfolder names. If we wanted to do more, we could use a regular expression on the path (e.g. something like r'^data-upload-notebook/(?P<sub_label>[\w\d]+)/.+(?P<ses_label>ses[\d\w\_]+)/(?P<acq_label>.+)').
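To make the regular expression option concrete, here is a small sketch that applies the pattern quoted above to the path of one acquisition folder. The helper name parse_labels is our own. Note that this pattern captures only the part of the session folder name starting at ses, whereas the loop used in this notebook takes the full folder name as the Session label.

```python
import re

LABEL_PATTERN = re.compile(
    r'^data-upload-notebook/(?P<sub_label>[\w\d]+)/.+(?P<ses_label>ses[\d\w\_]+)/(?P<acq_label>.+)'
)


def parse_labels(acquisition_dir):
    """Return the subject, session and acquisition labels found in a folder path."""
    match = LABEL_PATTERN.match(acquisition_dir)
    if match is None:
        raise ValueError(f'{acquisition_dir} does not match the expected layout.')
    return match.groupdict()


labels = parse_labels(
    'data-upload-notebook/anx_s1/anx_s1_anx_ses1_protA/T1_high-res_inplane_Ret_knk_0'
)
print(labels)
# {'sub_label': 'anx_s1', 'ses_label': 'ses1_protA', 'acq_label': 'T1_high-res_inplane_Ret_knk_0'}
```

The pattern expects forward slashes, so on Windows the path would first need converting, for example with Path.as_posix().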
We are now ready to walk our folders, create the containers accordingly and upload the DICOM zip archive to the Acquisition container.
log.info('Starting upload...')
for subj in PATH_TO_DATA.glob('anx*'):
    log.info('Processing subject %s', str(subj))
    subject = get_or_create_subject(project, subj.name, update=True, type='human', sex='female')  # passing some value for the sake of the example
    # print(f'{subject.sex}')
    for ses in subj.glob('anx*'):
        log.info('Processing session %s', str(ses))
        session = get_or_create_session(subject, ses.name)
        for acq in ses.glob('T1*'):
            log.info('Processing acquisition %s', str(acq))
            acquisition = get_or_create_acquisition(session, acq.name)
            for file in acq.glob('*.dcm.zip'):
                upload_file_to_acquistion(acquisition, file)
log.info('DONE')
Once the upload is done, you should have all your data available in your Flywheel Project, which should look like this:
For the sake of example, let's demonstrate how we can update the metadata for Subject anx_s1.
Let's first find that specific Subject:
anx_s1 = project.subjects.find_first('label=anx_s1').reload()
reload() is necessary to load the entire container.
We are going to update the firstname, lastname and the sex of this Subject. Let's check what we have currently:
print(f'Subject anx_s1 sex is: {anx_s1.sex}, first name is: {anx_s1.firstname}, last name is: {anx_s1.lastname}')
We can update it with the update method of the container:
anx_s1.update(
    firstname='John',
    lastname='Doe',
    sex='male',
)
Let's reload the subject from the database to make sure the update went through:
anx_s1 = project.subjects.find_first('label=anx_s1').reload()
print(f'Subject anx_s1 sex is: {anx_s1.sex}, first name is: {anx_s1.firstname}, last name is: {anx_s1.lastname}')
Each container also contains a field called info, which can be used to store unstructured information in a dictionary.
import pprint

complicated_nested_dict = {
    'a_complicated_nested_dict': {
        'key1': [1, 2, 3, 4],
        'key2': [{'an': 'other', 'list': 'with'},
                 {'dictionaries': ['in', 'it']}],
    }
}
anx_s1.update_info(complicated_nested_dict)
anx_s1 = project.subjects.find_first('label=anx_s1').reload()
print('Info field:')
pprint.pprint(anx_s1.info)
You can find the same information in Flywheel under the custom information field of the anx_s1 Subject:
All the metadata shown in the UI are also accessible from the SDK. For instance, if you would like to show all the properties of the anx_s1 Subject, just display its container:
anx_s1
Subject metadata/info can also be updated by parsing a CSV or TSV file. With this method, you can modify the metadata of every Subject at once.
In this example, you will need the participants.csv file, which can be found in the zip archive you downloaded earlier.
First, read the CSV file with pandas (imported as pd).
import pandas as pd
metadata = pd.read_csv(PATH_TO_DATA / 'participants.csv')
# View the data in the csv file
display(metadata)
We are going to loop through each Subject in the Project and check whether any metadata is stored for it in the metadata dataframe.
If the Subject is in the metadata dataframe, we will add the age and treatment information to the Subject container's info and update the sex metadata of the Subject.
for subj in project.subjects.iter():
    if (metadata["participant_id"] == subj.label).any():
        # Get the rows of the `metadata` dataframe for this subject
        tmp_info = metadata.loc[(metadata["participant_id"] == subj.label)]
        # Get the age and treatment for the subject
        # Convert the information to a dictionary with values stored in lists
        other_metadata = tmp_info[['age', 'treatment']].to_dict('list')
        # Update the metadata contained in the subject container
        sex = tmp_info.iloc[0]['sex']
        subj.update(type='human', sex=sex)
        subj.update_info(other_metadata)
    else:
        print(subj.label + ' does not have metadata stored in the CSV file.')
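The lookup logic in the loop above does not depend on pandas. As a sketch, the same per-subject lookup can be written with the standard csv module. The sample rows below are hypothetical values made up for this illustration and are not the real contents of participants.csv; only the column names (participant_id, age, sex, treatment) come from the code above. Unlike to_dict('list'), this version stores plain values rather than single-element lists.

```python
import csv
import io

# Hypothetical sample rows for illustration only (not the real participants.csv).
SAMPLE_CSV = """participant_id,age,sex,treatment
anx_s1,31,female,A
anx_s2,45,male,B
"""


def load_participants(handle):
    """Index the CSV rows by participant_id."""
    return {row['participant_id']: row for row in csv.DictReader(handle)}


def metadata_for(participants, label):
    """Return (sex, info) for a subject label, or None if it is not in the CSV."""
    row = participants.get(label)
    if row is None:
        return None
    info = {'age': int(row['age']), 'treatment': row['treatment']}
    return row['sex'], info


participants = load_participants(io.StringIO(SAMPLE_CSV))
print(metadata_for(participants, 'anx_s1'))
print(metadata_for(participants, 'anx_s9'))
```

For a Subject found in the CSV, the returned sex could be passed to subj.update and the info dictionary to subj.update_info, as in the loop above.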
View the updated metadata in the Subject container:
for subj in project.subjects.iter():
    subj = subj.reload()
    print(f'Subject Label: {subj.label}, Sex: {subj.sex}, Info: {subj.info}')
You can also check the updated information in Flywheel under the Subject container.