Getting Started

Introduction

flywheel-migration is a Python package maintained by Flywheel. It provides a standardized set of tools to de-identify file metadata.

It supports a number of configurable options that control how de-identification happens. Most of these options are set in a de-identification profile file, written in either YAML or JSON. A profile defines the transformations to apply to the file metadata fields.
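As a sketch of the JSON form, the profile keys used in the Quick start example (name, description, dicom, date-increment, fields) can be built as a plain Python mapping and written out as JSON. This assumes a JSON profile mirrors the YAML structure key for key; the file name example_profile.json is hypothetical.

```python
import json

# A minimal de-id profile as a plain mapping. The keys mirror the YAML
# example in the Quick start (assumption: JSON profiles use the same keys).
profile = {
    "name": "An example",
    "description": "An example of de-id profile using Dicom file profile",
    "dicom": {
        "date-increment": -17,
        "fields": [
            {"name": "PatientID", "replace-with": "REDACTED"},
            {"name": "StationName", "remove": True},
            {"name": "StudyDate", "increment-date": True},
        ],
    },
}

# Serialize to JSON text: this is what the JSON profile file would contain.
profile_json = json.dumps(profile, indent=2)

# Parsing the JSON back yields the same structure as the mapping above.
assert json.loads(profile_json) == profile

# Save under a hypothetical file name.
with open("example_profile.json", "w") as f:
    f.write(profile_json)

print(profile_json)
```

The resulting file should then be usable wherever a YAML profile is, since both formats describe the same nested mapping.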

The following file types are currently supported:

  • DICOM

  • JPG

  • PNG

  • TIFF

  • XML

  • JSON

  • Text files defining key/value pairs

  • CSV

  • TSV

More on file profiles here.

The following transformations are currently supported:

  • remove: Removes the field from the metadata

  • replace-with: Replaces the contents of the field with the value provided

  • increment-date: Offsets the date by the number of days configured with date-increment

  • increment-datetime: Offsets the datetime by the number of days configured with date-increment

  • hash: Replaces the contents of the field with a one-way cryptographic hash

  • hashuid: Replaces a UID field with a hashed version of that field

  • encrypt (non-DICOM): Encrypts the field in place with AES-EAX encryption

  • encrypt (DICOM): Removes the field from the DICOM and stores the original value in EncryptedAttributesSequence with CMS encryption

  • decrypt (non-DICOM): Decrypts the field in place with AES-EAX decryption

  • decrypt (DICOM): Replaces the contents of the field with the value stored in EncryptedAttributesSequence with CMS decryption

  • regex-sub: Replaces the contents of the field with a value built from other fields and/or groups extracted from the field value

  • keep: Do nothing
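To make the two date transformations concrete, the sketch below applies a -17 day offset (the date-increment used in the Quick start profile) to a DICOM DA value (YYYYMMDD) and a DT value (YYYYMMDDHHMMSS with optional fractional seconds). The helpers increment_date and increment_datetime are hypothetical and only illustrate the expected effect; they are not flywheel-migration's implementation, and they assume a full-precision value without a timezone suffix.

```python
from datetime import datetime, timedelta

# Hypothetical helpers illustrating the effect of increment-date and
# increment-datetime; not flywheel-migration's own implementation.
DATE_INCREMENT = -17  # same value as "date-increment: -17" in the Quick start


def increment_date(value: str, days: int = DATE_INCREMENT) -> str:
    """Shift a DICOM DA value (YYYYMMDD) by a whole number of days."""
    shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=days)
    return shifted.strftime("%Y%m%d")


def increment_datetime(value: str, days: int = DATE_INCREMENT) -> str:
    """Shift a DICOM DT value (YYYYMMDDHHMMSS[.FFFFFF]) by whole days.

    Assumes full precision and no timezone suffix. The fractional-second
    part is copied through untouched, since a whole-day shift never changes it.
    """
    main, dot, frac = value.partition(".")
    shifted = datetime.strptime(main, "%Y%m%d%H%M%S") + timedelta(days=days)
    return shifted.strftime("%Y%m%d%H%M%S") + dot + frac


print(increment_date("20200301"))                # → 20200213
print(increment_datetime("20200301123045.123"))  # → 20200213123045.123
```

Using timedelta handles month boundaries and leap years (here, February 29, 2020) without any manual calendar arithmetic.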

More on field transformations here.

License

flywheel-migration is developed under an MIT-based license.

Installation

The package can be installed using pip:

pip install "flywheel-migration[pixel]"

Note that the [pixel] option is required to install the dependencies for de-identifying pixel data. If you do not need to de-identify pixel data, you can omit this option.

For development, please refer to the README.md.

Quick start

An example config.yaml for a de-id profile using the DICOM file profile looks like this:

# The name of the de-id profile
name: An example
# A description of the de-id profile
description: An example of de-id profile using Dicom file profile
# Configuration for DICOM de-identification
dicom:
  # What date offset to use, in number of days
  date-increment: -17

  # Set patient age from date of birth
  patient-age-from-birthdate: true
  # Set patient age units as Years
  patient-age-units: Y
  # Remove private tags
  remove-private-tags: true

  fields:
    # Replace a DICOM field value (e.g. replace PatientID with "REDACTED")
    - name: PatientID
      replace-with: REDACTED

    # Remove a DICOM field (e.g. remove StationName)
    - name: StationName
      remove: true

    # Increment a date field by -17 days
    - name: StudyDate
      increment-date: true

    # Increment a datetime field by -17 days
    - name: AcquisitionDateTime
      increment-datetime: true

    # One-way hash a DICOM field to a unique string
    - name: AccessionNumber
      hash: true

    # One-way hash the ConcatenationUID,
    # keeping the prefix (4 nodes) and suffix (2 nodes)
    - name: ConcatenationUID
      hashuid: true

    # Replace SeriesDescription with a value built from other fields
    # in the same record (PulseSequenceName, TE and TR)
    - name: SeriesDescription
      regex-sub:
        # The regex-sub value is a list of dicts, each defining
        # input-regex, output and groups.
        # input-regex: Regular expression matching SeriesDescription value
        - input-regex: '(?P<current_sd>.*\/.*)'
          # output: Template string to format, using Python format-string ({field}) notation
          output: '{current_sd}_{PulseSequenceName}_TE{TE}_TR{TR}'
          # groups: de-id actions to apply to each field referenced in output
          groups:
            - name: current_sd
              keep: true
            - name: PulseSequenceName
              keep: true
            - name: TE
              keep: true
            - name: TR
              keep: true
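The regex-sub entry above can be read as: match the field value against input-regex, combine its named groups with other fields from the same record, then fill in the output template. The sketch below emulates that flow with the standard re module, only to show how the pieces fit. It is not flywheel-migration's code, and it makes three labeled assumptions: the whole value must match, a non-matching value is left as-is, and every field under groups is kept unchanged (keep: true). The record values are made up for illustration.

```python
import re

# Values copied from the regex-sub example above (raw string for the regex).
INPUT_REGEX = r"(?P<current_sd>.*\/.*)"
OUTPUT = "{current_sd}_{PulseSequenceName}_TE{TE}_TR{TR}"


def regex_sub(value, record, input_regex, output):
    """Hypothetical emulation of a single regex-sub entry."""
    # Assumption: the whole field value must match input-regex.
    match = re.fullmatch(input_regex, value)
    if match is None:
        # Assumption: a value that does not match is left unchanged.
        return value
    # Other fields in the record plus the named groups extracted from the
    # value; all are used as-is, as with "keep: true" in the profile.
    values = dict(record)
    values.update(match.groupdict())
    return output.format(**values)


# Made-up record standing in for the other fields of the same DICOM file.
record = {"PulseSequenceName": "tfl3d1", "TE": "2.98", "TR": "2300"}
print(regex_sub("T1/MPRAGE", record, INPUT_REGEX, OUTPUT))
# → T1/MPRAGE_tfl3d1_TE2.98_TR2300
```

A value without a "/" such as "MPRAGE" does not match input-regex, so under the assumptions above it comes back unchanged.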

Assuming a folder of DICOM files at ~/my_dicoms and the above YAML configuration saved in the current working directory as config.yaml, the following lines of code de-identify the DICOM files and save them to ~/my_deid_dicoms:

from fs.osfs import OSFS
from flywheel_migration import deidentify

# Load the profile
profile = deidentify.load_profile('config.yaml')

# Define the source and destination file systems and the list of DICOM files to process
src_fs = OSFS("~/my_dicoms")
# create=True makes OSFS create the destination folder if it does not exist yet
dst_fs = OSFS("~/my_deid_dicoms", create=True)
paths = src_fs.listdir('.')

# De-identify the DICOM files and save the results to dst_fs
profile.process_packfile("dicom", src_fs, dst_fs, paths)