Title: Adding REDCap fields to Flywheel metadata
Date: 17-04-2020
Description:
A real example demonstrating how to access REDCap through PyCap, how to search and view the data, and how to add the data to the appropriate Flywheel container.
REDCap is an online tool for acquiring questionnaire style data, typically on subjects, though the range of uses is virtually limitless. For example, one REDCap site could provide subjects with a series of questions to answer over the course of a study. Another REDCap site could simply contain one form for each subject to fill, possibly some kind of enrollment data. Another REDCap site could be made with a series of forms designed to track the progress of a project. Because of the diverse possibilities of REDCap implementations, there is no single script that can be designed to run for all cases. The purpose of this notebook is to introduce you to the tools required to integrate REDCap and Flywheel. Using these tools, a custom script will have to be created based on your specific use case.
Flywheel and REDCap both organize their data slightly differently. Flywheel has a straightforward hierarchy:
This hierarchy is strictly enforced: you will never find an acquisition containing a subject, and a subject cannot have an acquisition without also having a session.
Each Project can have multiple subjects.
Each Subject can have multiple sessions.
Each Session can have multiple acquisitions.
REDCap follows a similar pattern:
Although REDCap has a similar structure, it does not rigidly enforce how this hierarchy is utilized. Here we provide a common use case, but different studies may utilize different strategies:
Each Project can have multiple arms.
Each arm can have multiple events.
Each event can have multiple forms.
Each form can have multiple fields.
Because of the differences between the hierarchies, and the flexibility that REDCap users have in how they structure their data, the exact method for extracting REDCap data and mapping it to a Flywheel project requires detailed knowledge of the structure of both your Flywheel project and your REDCap project.
The first and most obvious problem to address for REDCap integration is "Where should the data go?".
For example, if a subject comes to the testing center, fills out REDCap forms, then gets a scan, how do we match that REDCap data to our flywheel data?
First, REDCap assigns a unique ID (record ID) to each subject, but this might not match what the researcher is using for the subject ID in Flywheel. One way to match the two datasets would be to include a field like "flywheel subject ID" in the enrollment data. That way, this field could be used to determine which Flywheel subject to attach the REDCap data to. For legacy data, a lookup table can be made, with REDCap ID's in one column, and the corresponding Flywheel ID in the other.
Second, the REDCap form will be part of a specific REDCap event, and that scan will be part of a specific Flywheel session. In this case, it would make sense that we would like to add that event data to the session's metadata. One possible way to do this would be to have the researcher add a field to the form called "flywheel session name", where the session name (Set in the scan terminal) would be entered. If this field matches the flywheel session name, then we can use that field when we query the REDCap data to determine which session to add the REDCap data to. For legacy data, a lookup table can be made, with REDCap event names in one column, and Flywheel session names in the other.
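For legacy data, the two lookup tables described above could be kept as small CSV files and loaded into dictionaries. The following is a minimal sketch, not part of the original notebook: the column names (`redcap_id`, `flywheel_id`, `redcap_event`, `flywheel_session`), the second subject row, and the helper names `load_lookup` and `resolve` are all hypothetical. The tables are embedded as strings here so the example is self-contained; in practice they would be read from files.

```python
import csv
import io

# Hypothetical lookup tables. In practice these would live in CSV files.
SUBJECT_TABLE = """redcap_id,flywheel_id
1,098
2,099
"""
EVENT_TABLE = """redcap_event,flywheel_session
timepoint_1_arm_1,flywheel_session_01
"""


def load_lookup(text, key_column, value_column):
    """Read a two-column CSV into a dict mapping key_column -> value_column."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key_column]: row[value_column] for row in reader}


record_to_subject = load_lookup(SUBJECT_TABLE, "redcap_id", "flywheel_id")
event_to_session = load_lookup(EVENT_TABLE, "redcap_event", "flywheel_session")


def resolve(record_id, event_name):
    """Return (subject label, session label), or None if either lookup fails."""
    subject = record_to_subject.get(str(record_id))
    session = event_to_session.get(event_name)
    if subject is None or session is None:
        return None
    return subject, session


print(resolve(1, "timepoint_1_arm_1"))
```

Reading the IDs as strings keeps leading zeros intact, which matters for subject labels such as "098".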
This section will download the necessary packages and set up our Python environment.
# Install specific packages required for this notebook
!pip install flywheel-sdk pycap
# Import packages
# Required Libraries:
import flywheel
from redcap import Project
# Optional, but used in this demo:
from getpass import getpass
import os
import sys
import pprint
import pandas as pd
from permission import check_user_permission
We will first initialize our flywheel SDK, which entails the following steps:
# We will use getpass to securely enter our API key in this notebook.
# If you download this code to run on your own machine, you may
# Replace this with a string of your API key
API_KEY = getpass('Enter API_KEY here: ')
# Initialize the flywheel client and print our login info
fw = flywheel.Client(API_KEY if 'API_KEY' in locals() else os.environ.get('FW_KEY'))
print('You are now logged in as %s to %s' % (
    fw.get_current_user()['email'],
    fw.get_config()['site']['api_url']))
Before starting off, we want to check your permission on the Flywheel Instance in order to proceed in this notebook.
min_reqs = {
"site": "user",
"group": "ro",
"project": ['containers_view_metadata',
'containers_create_hierarchy',
'containers_modify_metadata']
}
GROUP_ID = input('Please enter the Group ID for the REDCap project: ')
PROJECT_LABEL = input('Please enter the Project Label that corresponds to the REDCap project: ')
check_user_permission will return True if both the group and the project meet the minimum requirements; otherwise, a compatible list will be printed.
check_user_permission(fw, min_reqs, group=GROUP_ID, project=PROJECT_LABEL)
# Access the flywheel project we're interested in working with:
# We can copy the ID directly from flywheel. You'll have to
# Replace this value with the project ID from your flywheel
# Instance.
project_id = '5e98a4362971c80073f877d1'
# Access the project with the SDK
fw_project = fw.get(project_id)
# Examine the subjects/sessions within the project
# Access the subjects
subjects = fw_project.subjects()
# Generate a map of the subject/session layout of this project
print('PROJECT: '+fw_project.label)
for sub in subjects:
    print('|--------> SUBJECT: '+sub.label)
    for ses in sub.sessions():
        print('\t|--------> SESSION: '+ses.label)
        for acq in ses.acquisitions():
            print('\t\t|--------> ACQUISITION: '+acq.label)
# Generate a list of all subject ID's (labels) in the project
fw_project_subjects = [s.label for s in fw_project.subjects()]
# From this, we see that our project is called "RedcapIntegration"
# It has one subject with the ID "098"
# That subject has one session named "flywheel_session_01"
# That session has one acquisition named "flywheel_acquisition_01"
We will now initialize PyCap, which entails the following steps:
# Enter your REDCap API URL
# Enter the URL associated with your redcap API.
# This URL is NOT identical to the usual website
# you enter to visit your REDCap data. This URL
# always ends in "/API/"
# RedCap Login (replace this with your RedCap API URL)
URL = 'https://redcap.test.edu/redcap_v0.0.01/API/'
# Enter your REDcap API key
RC_API_KEY = ''
# Access the REDCap information we're interested
# in working with. This command creates a python
# object "rc_project", which allows us to access
# all the REDCap data associated with that project.
# WARNING: For large projects, using this interface
# May be slow, as many of the commands fetch ALL
# The data. Read more about filtering the results
# to reduce this time
# (https://pycap.readthedocs.io/en/latest/deep.html#exporting-data)
rc_project = Project(URL, RC_API_KEY)
# Examine the arms/events/forms within the project
# This command maps all forms and events filled out.
# (https://pycap.readthedocs.io/en/latest/api.html#redcap.project.Project.export_fem)
all_forms_and_events = rc_project.export_fem()
pprint.pprint(all_forms_and_events)
# Here we see that there are two arms to this project,
# each with two unique events. The first event has
# two forms associated with it, the second has one.
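The flat list returned by the form-event mapping can be easier to read when regrouped into the arm, event, form hierarchy described earlier. The sketch below is an illustration rather than part of the original notebook. It assumes each row is a dictionary with the keys `arm_num`, `unique_event_name`, and `form`; check the output of `pprint` above to confirm the keys in your own project. The sample rows and the helper name `group_forms_by_event` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows in the assumed shape of a form-event mapping export.
sample_fem = [
    {"arm_num": 1, "unique_event_name": "timepoint_0_arm_1", "form": "enrollment_log"},
    {"arm_num": 1, "unique_event_name": "timepoint_0_arm_1", "form": "medical_history"},
    {"arm_num": 1, "unique_event_name": "timepoint_1_arm_1", "form": "d1_baseline_questionnaires"},
]


def group_forms_by_event(fem_rows):
    """Regroup flat mapping rows into {arm: {event: [forms]}}."""
    nested = defaultdict(lambda: defaultdict(list))
    for row in fem_rows:
        # Arm numbers are normalized to strings so int and str inputs group together.
        nested[str(row["arm_num"])][row["unique_event_name"]].append(row["form"])
    return {arm: dict(events) for arm, events in nested.items()}


layout = group_forms_by_event(sample_fem)
for arm, events in layout.items():
    print("ARM: " + arm)
    for event, forms in events.items():
        print("|--------> EVENT: " + event)
        for form in forms:
            print("\t|--------> FORM: " + form)
```

With a real project, `sample_fem` would be replaced by `all_forms_and_events`.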
We will now examine the REDCap data to best determine how to interface with Flywheel. This will involve:
# Determine which field REDCap is using to store unique REDCap subject ID's
# Each project has a field that stores the unique redcap ID for each subject
# This field in REDCap may be labeled whatever the investigator wishes,
# However this label is always stored in the location project.def_field:
rcid_field_name = rc_project.def_field
print(rcid_field_name)
# From this we see that the rcid_field_name is "participant_id".
# This stores the record ID (like a subject ID)
# within redcap.
# Now we can print the enrollment data to find our flywheel ID field:
# For this query, we're only interested in the "enrollment_log" form
form_responses = rc_project.export_records(forms=['enrollment_log'])
pprint.pprint(form_responses)
# We can see here that this lists all the enrollment data from both arms
# and for all subjects present. Browsing through this, we
# see the field 'subject_fw_id'. This field was deliberately
# added to this REDCap study by the PI to help link
# REDCap subjects to Flywheel subjects.
# Let's generate a list of REDCap ID's matching them to Flywheel ID's for future use.
# Who knows, may come in handy.
# It looks like the enrollment data is only present in the first enrollment form.
# So we will limit our events to the first event (timepoint_0_arm_1)
form_responses = rc_project.export_records(forms=['enrollment_log'],events=['timepoint_0_arm_1'])
record_2_fwid = {}
for response in form_responses:
    record_2_fwid[response['participant_id']] = response['subject_fw_id']
pprint.pprint(record_2_fwid)
# We can also see that
# The first subject has Flywheel ID "098", which is our subject ID
# In the flywheel project of interest.
# This subject also has "participant_id" = 1, so their REDCap
# Record ID is "1". Now we have the REDCap id and the Flywheel ID.
# The enrollment data contains information about the subject that
# doesn't change over the course of the study. Because of this,
# We would like to upload this information to the flywheel
# "Subject" container. With this knowledge, we can filter out all
# other subject records.
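Before filtering, it can help to check how well the REDCap-to-Flywheel mapping lines up with the subjects that actually exist in the Flywheel project. The following is a minimal sketch with hypothetical values standing in for `record_2_fwid` and `fw_project_subjects`; the helper name `compare_ids` is not part of either library.

```python
def compare_ids(record_to_fwid, flywheel_labels):
    """Split REDCap records into matched / unmatched against Flywheel subject labels.

    Returns (matched, unmatched, missing_in_redcap), where the first two are
    dicts of record ID -> Flywheel ID and the last is a sorted list of Flywheel
    labels that no REDCap record points to.
    """
    labels = set(flywheel_labels)
    matched = {rid: fwid for rid, fwid in record_to_fwid.items() if fwid in labels}
    unmatched = {rid: fwid for rid, fwid in record_to_fwid.items() if fwid not in labels}
    missing_in_redcap = sorted(labels - set(record_to_fwid.values()))
    return matched, unmatched, missing_in_redcap


# Hypothetical values: record "2" has an empty Flywheel ID field,
# and Flywheel subject "101" has no REDCap record.
matched, unmatched, missing = compare_ids({"1": "098", "2": ""}, ["098", "101"])
print("Matched:", matched)
print("Unmatched REDCap records:", unmatched)
print("Flywheel subjects without a REDCap record:", missing)
```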
# Let's just examine the records from the one subject now.
# We know we would like the enrollment data and the medical
# History, since these data points won't change session to session.
# We will upload these two forms to the "subject" container in flywheel
form_responses = rc_project.export_records(records=['1'],forms=['enrollment_log','medical_history'],events=['timepoint_0_arm_1'])
pprint.pprint(list(form_responses))
# Upload the data to flywheel
# In this case there is only one response, but if we
# expanded our results to include more "records" (subjects)
# the list would be longer, and this code would loop
# through each form response and upload it to the
# appropriate flywheel subject.
for response in form_responses:
    # Check to see if this flywheel subject exists
    if response['subject_fw_id'] in fw_project_subjects:
        # Get that subject from flywheel
        query = f'label="{response["subject_fw_id"]}"'
        subject = fw_project.subjects.find_first(query)
        # Upload the data under a "REDCap" object in the metadata
        subject.update(info={"REDCap":response})
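Exported records often include fields that were left blank, and you may not want those stored in the Flywheel metadata. The sketch below shows one optional way to drop them before the upload. It assumes blank answers appear as empty strings, which is worth confirming against your own export; the field `middle_name` and the helper name `drop_empty_fields` are hypothetical.

```python
def drop_empty_fields(response):
    """Return a copy of a REDCap record without empty-string or None values."""
    return {field: value for field, value in response.items() if value not in ("", None)}


# Hypothetical record with one unanswered field.
sample_response = {"participant_id": "1", "subject_fw_id": "098", "middle_name": ""}
cleaned = drop_empty_fields(sample_response)
print(cleaned)
```

The cleaned dictionary could then be passed in place of `response` when updating the container info.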
We will now upload questionnaire data to a specific session in Flywheel. For the enrollment data, we were able to use the field "subject_fw_id" to determine which subject to attach the data to. Now, we need to upload the data to a specific session. Since this data has no "session_fw_id" tag, we will use a lookup table to match events to sessions. This will involve:
# Match events to sessions using a lookup table.
# We can refresh our memory of the structure of this REDCap data:
all_forms_and_events = rc_project.export_fem()
pprint.pprint(all_forms_and_events)
# We could do this two ways. We could add forms by arm_num, and just exclude
# the enrollment and medical form, or we could directly match unique event
# names to sessions. We will do the latter.
event_2_session = {'timepoint_1_arm_1':'flywheel_session_01'}
# This list would obviously be longer if we had other subjects/sessions
# We will now loop through our desired events and upload them to flywheel:
rc_form_name = 'd1_baseline_questionnaires'
for event, session_label in event_2_session.items():
    # Export this form for every record in the event. To limit the export
    # to a single subject, pass records=[...] as in the enrollment example.
    form_responses = rc_project.export_records(events=[event], forms=[rc_form_name])
    # form_responses is a list with every subject's responses to that event,
    # so we must loop through and find the ones we want
    for response in form_responses:
        pprint.pprint(response)
        # Records without a known Flywheel ID are skipped
        fw_id = record_2_fwid.get(response['participant_id'])
        if fw_id in fw_project_subjects:
            query = f'label="{fw_id}"'
            subject = fw_project.subjects.find_first(query)
            # Find this subject's session whose label matches the lookup table
            query = f'label="{session_label}"'
            session = subject.sessions.find_first(query)
            if session is not None:
                pprint.pprint(session.label)
                session.update(info={"REDCap": response})
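Because the upload loop writes to Flywheel as it goes, it can be useful to first build and review a plan of what would be uploaded where. The following is a minimal sketch of such a dry run, not part of the original notebook. It assumes each exported record carries a `redcap_event_name` key (typical of longitudinal projects, but confirm this in your own export); the sample records and the helper name `plan_uploads` are hypothetical.

```python
def plan_uploads(responses, record_to_fwid, event_to_session, id_field="participant_id"):
    """Pair each REDCap record with a (subject label, session label) target.

    Returns (plan, skipped): plan is a list of (subject, session, record)
    tuples, and skipped holds records that could not be matched.
    """
    plan, skipped = [], []
    for response in responses:
        fw_id = record_to_fwid.get(response.get(id_field))
        session_label = event_to_session.get(response.get("redcap_event_name"))
        if fw_id and session_label:
            plan.append((fw_id, session_label, response))
        else:
            skipped.append(response)
    return plan, skipped


# Hypothetical records: the second has no entry in the subject lookup.
sample_responses = [
    {"participant_id": "1", "redcap_event_name": "timepoint_1_arm_1", "q1": "3"},
    {"participant_id": "7", "redcap_event_name": "timepoint_1_arm_1", "q1": "5"},
]
plan, skipped = plan_uploads(
    sample_responses,
    record_to_fwid={"1": "098"},
    event_to_session={"timepoint_1_arm_1": "flywheel_session_01"},
)
for subject_label, session_label, _record in plan:
    print("Would upload to subject %s, session %s" % (subject_label, session_label))
print("Skipped %d record(s)" % len(skipped))
```

Once the plan looks right, each entry can be applied with the same `find_first` and `update` calls used above.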