Title: Delete Empty Containers

Date: June 24th 2020

Description:
This notebook demonstrates how to remove empty containers with a top-down method.

Install and Import Dependencies¶

In [ ]:
# Install specific packages required for this notebook
!pip install flywheel-sdk tqdm pandas
In [ ]:
# Import packages
import os
from getpass import getpass
import logging
import time
from pathlib import Path

import flywheel
import pandas as pd
from tqdm.notebook import tqdm
from permission import check_user_permission
In [ ]:
# Instantiate a logger
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s')
log = logging.getLogger('root')

Flywheel API Key and Client¶

Get your API_KEY. More on this in the Flywheel SDK documentation.

In [ ]:
API_KEY = getpass('Enter API_KEY here: ')

Instantiate the Flywheel API client

In [ ]:
fw = flywheel.Client(API_KEY if ('API_KEY' in locals() and API_KEY) else os.environ.get('FW_KEY'))
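
The fallback used in the cell above (prefer the key typed at the prompt, otherwise read the `FW_KEY` environment variable) can be sketched as a small helper. This is a minimal illustration only: `resolve_api_key` is a hypothetical name and is not part of the Flywheel SDK.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(entered: Optional[str],
                    env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the entered key if it is non-empty, else the FW_KEY variable.

    Hypothetical helper for illustration; not a Flywheel SDK function.
    """
    return entered if entered else env.get("FW_KEY")


# A key typed at the prompt wins over the environment.
typed = resolve_api_key("typed-key", {"FW_KEY": "env-key"})
# An empty prompt falls back to the environment variable.
fallback = resolve_api_key("", {"FW_KEY": "env-key"})
# With neither source available, the result is None.
missing = resolve_api_key(None, {})
```

The result could then be passed to `flywheel.Client(...)` in place of the inline conditional.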

Show Flywheel logging information

In [ ]:
log.info('You are now logged in as %s to %s', fw.get_current_user()['email'], fw.get_config()['site']['api_url'])

Overview¶

The Flywheel data model relies on hierarchical containers. You can read more about Flywheel containers in our documentation.

Flywheel Projects are structured into the following hierarchy:

Group
└── Project
    └── Subject 
        └── Session
            └── Acquisition

Project, Subject, Session and Acquisition are each containers. Containers share common properties, such as the ability to store files, metadata and analyses.

How does the top-down approach work?¶

Based on the Flywheel hierarchy above, the top-down approach starts from the Subject containers and traverses down through the Session and Acquisition containers. This method removes Subject, Session and Acquisition containers that have no child containers and no attached files. Note that the helper functions in this notebook check only files and child containers; they do not check for analyses.
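
The traversal can be sketched without a Flywheel connection. The example below is a simplified illustration under stated assumptions: `Node` is a hypothetical stand-in for a container (it is not an SDK class), and the labels and file name are made up. Children are visited first, so a parent is judged only after its empty children have been removed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a Flywheel container (not an SDK class)."""
    label: str
    files: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def prune_empty(node: Node, deleted: List[str]) -> bool:
    """Remove empty descendants of node and report whether node is empty."""
    kept = []
    for child in node.children:
        # Walk down first; a child is dropped only if it ends up empty.
        if prune_empty(child, deleted):
            deleted.append(child.label)
        else:
            kept.append(child)
    node.children = kept
    # Empty means: no files attached and no remaining children.
    return not node.files and not node.children


project_tree = Node("Project", children=[
    Node("Subject 01", children=[
        Node("Session 01", children=[Node("Localizer", files=["scan.dcm"])]),
        Node("Session 02", children=[Node("Empty acquisition")]),
    ]),
    Node("Subject 02"),
])

deleted_labels: List[str] = []
prune_empty(project_tree, deleted_labels)
print(deleted_labels)
```

In this sketch the Project node itself is never deleted; only its descendants are pruned, which mirrors the scope described above.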

Requirements¶

To run this notebook, you will need the right permissions at the Group level to create a new Project for testing.

In [ ]:
# Minimum requirements that you will need to create a new test Project in the Group.
min_reqs = {
    "site": "user",
    "group": "admin",
}
In [ ]:
GROUP_ID = input('Please enter the Group ID that you will be using to create the new project: ')
In [ ]:
check_user_permission(fw, min_reqs, group = GROUP_ID)

Initialize a few values¶

Now, we will define a few values that will be used in this notebook. The GROUP_ID is the ID of the Group that you will be using throughout this notebook.

In [ ]:
GROUP_ID = input('Please enter the ID of the Group that you have admin permission for: ')
In [ ]:
PROJECT_LABEL = 'test-delete-containers'

Please define below the path to the file that you would like to use for testing. This file will be uploaded to your Flywheel instance.

In [ ]:
PATH_TO_TEST_FILE = Path("/path/to/a/test/file")
TEST_FILE_BASENAME = PATH_TO_TEST_FILE.name

INFO: For tutorial purposes, we create a test project and upload a test file (for example, a DICOM file) to an Acquisition container. Feel free to use one of your own test projects and skip the 'Create A New Test Project' section.

Create A New Test Project¶

In [ ]:
my_group = fw.lookup(GROUP_ID)
In [ ]:
project = my_group.add_project(label=PROJECT_LABEL)

Create Subject, Session and Acquisition container and upload File¶

Here, we will create one Subject container. In that Subject we will add one Session, and in that Session we will add one Acquisition. We will also upload the test file to the Acquisition that was created.

In [ ]:
# Create Subject
subject = project.add_subject(label='Subject 01')
# Create Session
session = subject.add_session(label='Session 01')
# Create Acquisition
acquisition = session.add_acquisition(label='Localizer')
# Upload File
# Pass the path as a string, since the SDK may not accept a pathlib.Path
acquisition.upload_file(str(PATH_TO_TEST_FILE))

Helpful Functions¶

In [ ]:
def delete_empty_acquisition(acquisition, dry_run=True):
    """Returns True if acquisition was empty and got deleted.
    
    Args:
        acquisition (object): A Flywheel Acquisition.
        dry_run (bool): If true, container is not deleted.    
        
    Returns:
        bool: True if container got deleted, False otherwise.
    """
    log.debug(f'Checking if acquisition "{acquisition.label}" is empty')
    num_files = len(acquisition.files)
    log.debug(f'  Found {num_files} files')
    delete_acquisition = num_files == 0
    if delete_acquisition:
        log.info(f'Deleting acquisition "{acquisition.label}"')
        if not dry_run:
            fw.delete_acquisition(acquisition.id)
    return delete_acquisition
In [ ]:
def delete_empty_session(session, dry_run=True):
    """Returns True if session was empty and got deleted.
    
    Args:
        session (object): A Flywheel Session.
        dry_run (bool): If true, container is not deleted.    
        
    Returns:
        bool: True if container got deleted, False otherwise.
    """        
    log.debug(f'Checking if session "{session.label}" is empty')
    num_files = len(session.files)
    num_acqs = len(session.acquisitions())
    log.debug(f'  Found {num_files} files')
    log.debug(f'  Found {num_acqs} acquisitions')
    delete_session = (num_acqs == 0) and (num_files == 0)
    if (num_acqs == 0) and num_files > 0:
        log.warning(f'Empty session but file attachment - Not deleting! ({session.id} / {session.label})')
    if delete_session:
        log.info(f'Deleting session "{session.label}"')
        if not dry_run:
            fw.delete_session(session.id)
    return delete_session
In [ ]:
def delete_empty_subject(subject, dry_run=True):
    """Returns True if subject was empty and got deleted.
    
    Args:
        subject (object): A Flywheel Subject.
        dry_run (bool): If true, container is not deleted.    
        
    Returns:
        bool: True if container got deleted, False otherwise.
    """        
    log.debug(f'Checking if subject "{subject.label}" is empty')
    num_files = len(subject.files)
    num_sessions = len(subject.sessions())
    log.debug(f'  Found {num_files} files')
    log.debug(f'  Found {num_sessions} sessions')    
    delete_subject = (num_files == 0) and (num_sessions == 0)
    if (num_sessions == 0) and num_files > 0:
        log.warning(f'Empty subject but file attachments! - Not deleting!  ({subject.id} / {subject.label})')
    if delete_subject:
        log.info(f'Deleting subject "{subject.label}"')
        if not dry_run:        
            fw.delete_subject(subject.id)
    return delete_subject
In [ ]:
def delete_empty_containers_in_project(project, dry_run=True):
    """Delete empty containers in the project hierarchy and return a dataframe of deleted containers.
    
    Args:
        project (object): A Flywheel project.
        dry_run (bool): If true, container is not deleted.    
        
    Returns:
        pandas.DataFrame: A dataframe listing deleted containers
    """
    columns = ['type', 'label', 'id', 'parents.subject', 'parents.session']
    # Collect rows in a list: DataFrame.append was removed in pandas 2.0.
    rows = []
    subjects = project.subjects()
    for subject in tqdm(subjects):
        for session in subject.sessions.iter():
            for acquisition in session.acquisitions.iter():
                deleted = delete_empty_acquisition(acquisition, dry_run=dry_run)
                if deleted:
                    rows.append(['acq', acquisition.label, acquisition.id, acquisition.parents.subject, acquisition.parents.session])
            # Reload so the session reflects any acquisitions deleted above
            session = session.reload()
            deleted = delete_empty_session(session, dry_run=dry_run)
            if deleted:
                rows.append(['ses', session.label, session.id, session.parents.subject, None])
        # Reload so the subject reflects any sessions deleted above
        subject = subject.reload()
        deleted = delete_empty_subject(subject, dry_run=dry_run)
        if deleted:
            rows.append(['sub', subject.label, subject.id, None, None])
    return pd.DataFrame(rows, columns=columns)

Getting Started¶

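First, we are going to do a dry run by setting dry_run to True, which logs what would be deleted without removing anything. Because nothing is removed in a dry run, a Session or Subject that would only become empty once its children are deleted is not reported.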

In [ ]:
df = delete_empty_containers_in_project(project, dry_run=True)
In [ ]:
len(df)

Now we can try to actually delete the empty containers

In [ ]:
df = delete_empty_containers_in_project(project, dry_run=False)
In [ ]:
len(df)

As expected, the Subject 01 container was not deleted, because its Acquisition contains a file.

So let's delete the file that we uploaded earlier to the Acquisition. If you recall, the name of the uploaded file is stored in TEST_FILE_BASENAME. We will use the delete_file method to delete the file from the Acquisition container.

INFO: You can also use the delete_file method to delete a file from a Session or Subject container.
In [ ]:
acquisition.delete_file(TEST_FILE_BASENAME)

After deleting the file, we can try to delete the container again.

In [ ]:
df = delete_empty_containers_in_project(project, dry_run=False)
In [ ]:
df
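
To summarize a result like the one above, the rows can be counted per container type. The sketch below is a minimal illustration: `count_by_type` is a hypothetical helper, and the example records (labels and IDs) are made up. They follow the column layout of the returned dataframe, as produced by `df.to_dict('records')`.

```python
from collections import Counter


def count_by_type(records):
    """Count deleted containers per type ('acq', 'ses' or 'sub')."""
    return Counter(record["type"] for record in records)


# Made-up records in the shape of df.to_dict('records')
example_records = [
    {"type": "acq", "label": "Localizer", "id": "a1",
     "parents.subject": "s1", "parents.session": "e1"},
    {"type": "ses", "label": "Session 01", "id": "e1",
     "parents.subject": "s1", "parents.session": None},
    {"type": "sub", "label": "Subject 01", "id": "s1",
     "parents.subject": None, "parents.session": None},
]

counts = count_by_type(example_records)
print(dict(counts))
```

With the real dataframe, `df['type'].value_counts()` gives the same kind of summary directly in pandas.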

Do I have the proper permissions to delete a container in my project?¶

If you have a project where you would like to remove/delete empty containers, you will need to have the right permissions to delete/modify the containers on the Project level. Below are the minimum requirements.

In [ ]:
# Minimum requirements that you will need to delete/modify containers within the Project.
min_reqs = {
    "site": "user",
    "group": "rw",
    "project": [
        'containers_modify_metadata',
        'containers_delete_hierarchy',
        'files_create_upload',
        'files_modify_metadata',
        'files_delete_non_device_data',
        'files_delete_device_data',
    ],
}
In [ ]:
GROUP_ID = input('Please enter the ID of the Group that contains your project: ')
In [ ]:
PROJECT_LABEL = input('Please enter the Project Label that you want to work with in this notebook: ')
In [ ]:
check_user_permission(fw, min_reqs, group = GROUP_ID, project = PROJECT_LABEL)
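
Conceptually, a check like this compares the actions you need against the actions you have been granted. The sketch below only illustrates that idea: `missing_permissions` is a hypothetical helper, the granted list is made up, and this is not how `check_user_permission` is implemented.

```python
def missing_permissions(required, granted):
    """Return the required actions that are absent from the granted ones."""
    return sorted(set(required) - set(granted))


required_actions = [
    "containers_modify_metadata",
    "containers_delete_hierarchy",
    "files_delete_non_device_data",
]
# Made-up example of the actions granted to a user on a project
granted_actions = ["containers_modify_metadata", "files_create_upload"]

missing = missing_permissions(required_actions, granted_actions)
print(missing)
```

An empty result would mean that every required action is covered.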

After you have verified that you have the right permissions to delete containers in the desired project, you can get the project container and call the delete_empty_containers_in_project function again.

In [ ]:
# The f prefix is required so that PROJECT_LABEL is substituted into the filter
project = fw.projects.find_first(f'label={PROJECT_LABEL}')
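
If the same project label exists in more than one group, a label filter alone may be ambiguous. One alternative, sketched here under the assumption that both the group ID and the project label are known, is to resolve the project by its `group/project` path with `fw.lookup`. The helper name `project_lookup_path` and the example values are hypothetical.

```python
def project_lookup_path(group_id: str, project_label: str) -> str:
    """Build a '<group_id>/<project_label>' path for use with fw.lookup."""
    return f"{group_id}/{project_label}"


# Made-up values for illustration
example_path = project_lookup_path("my_group", "test-delete-containers")
print(example_path)

# With a live client this would be used as:
# project = fw.lookup(project_lookup_path(GROUP_ID, PROJECT_LABEL))
```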
In [ ]:
delete_empty_containers_in_project(project, dry_run=True)