Getting Started
Introduction
flywheel-migration is a Python package maintained by Flywheel. It provides a standardized set of tools to de-identify file metadata.
It supports a number of configurable options that control how de-identification happens. Most of these options are set in a de-identification profile file, which can be either YAML or JSON. A profile defines the transformations to apply to the file's metadata fields.
The following file types are currently supported:
Dicom
JPG
PNG
TIFF
XML
JSON
Text file defining key/value pairs
CSV
TSV
More on file profiles here.
The following transformations are currently supported:
remove: Removes the field from the metadata
replace-with: Replaces the contents of the field with the value provided
increment-date: Offsets the date by the configured number of days
increment-datetime: Offsets the datetime by the configured number of days
hash: Replaces the contents of the field with a one-way cryptographic hash
hashuid: Replaces a UID field with a hashed version of that field
encrypt (non-DICOM): Encrypts the field in place with AES-EAX encryption
encrypt (DICOM): Removes the field from the DICOM and stores the original value in EncryptedAttributesSequence with CMS encryption
decrypt (non-DICOM): Decrypts the field in place with AES-EAX decryption
decrypt (DICOM): Replaces the contents of the field with the value stored in EncryptedAttributesSequence with CMS decryption
regex-sub: Replaces the contents of the field with a value built from other fields and/or groups extracted from the field value
keep: Does nothing
More on field transformations here.
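To make the date and hash transformations listed above more concrete, here is a minimal sketch of their effect on a single value. It is an illustration only, not the package's implementation: the helper names increment_date and one_way_hash are hypothetical, SHA-256 is an assumed choice of hash function, and the YYYYMMDD format is the one used by DICOM date values.

```python
import hashlib
from datetime import datetime, timedelta


def increment_date(value, days, fmt="%Y%m%d"):
    """Shift a date string by `days` days and return it in the same format."""
    shifted = datetime.strptime(value, fmt) + timedelta(days=days)
    return shifted.strftime(fmt)


def one_way_hash(value):
    """Return a hex digest of the value (SHA-256 is an assumption here)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# A negative offset moves the date backwards in time.
print(increment_date("20200118", -17))  # 20200101

# The same input always maps to the same digest, but the digest
# cannot be reversed to recover the original value.
print(one_way_hash("ACC-0001") == one_way_hash("ACC-0001"))  # True
```

Because the same offset is applied to every date field, the intervals between dates within a record are preserved.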
License
flywheel-migration is developed under an MIT-based license.
Installation
The package can be installed using pip:
pip install flywheel-migration[pixel]
Note that the [pixel] option is required to install the dependencies for de-identifying pixel data. If you do not need to de-identify pixel data, you can omit this option.
For development, please refer to the README.md.
Quick start
An example config.yaml for a de-id profile that uses the DICOM file profile looks like this:
# The name of the de-id profile
name: An example
# A description of the de-id profile
description: An example of de-id profile using Dicom file profile
# Configuration for DICOM de-identification
dicom:
  # What date offset to use, in number of days
  date-increment: -17
  # Set patient age from date of birth
  patient-age-from-birthdate: true
  # Set patient age units as Years
  patient-age-units: Y
  # Remove private tags
  remove-private-tags: true
  fields:
    # Replace a DICOM field value (e.g. replace PatientID with "REDACTED")
    - name: PatientID
      replace-with: REDACTED
    # Remove a DICOM field (e.g. remove StationName)
    - name: StationName
      remove: true
    # Increment a date field by -17 days
    - name: StudyDate
      increment-date: true
    # Increment a datetime field by -17 days
    - name: AcquisitionDateTime
      increment-datetime: true
    # One-way hash a DICOM field to a unique string
    - name: AccessionNumber
      hash: true
    # One-way hash the ConcatenationUID,
    # keeping the prefix (4 nodes) and suffix (2 nodes)
    - name: ConcatenationUID
      hashuid: true
    # Replace SeriesDescription with reference to other fields within
    # the same record using PulseSequenceName, TE and TR
    - name: SeriesDescription
      regex-sub:
        # The regex-sub value is a list of dicts, each defining input-regex,
        # output and groups.
        # input-regex: Regular expression matching the SeriesDescription value
        - input-regex: '(?P<current_sd>.*\/.*)'
          # output: String to be formatted, which follows Python f-string notation
          output: '{current_sd}_{PulseSequenceName}_TE{TE}_TR{TR}'
          # de-id actions to be applied to each field defined in output
          groups:
            - name: current_sd
              keep: true
            - name: PulseSequenceName
              keep: true
            - name: TE
              keep: true
            - name: TR
              keep: true
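The regex-sub entry above combines a named regex group with other fields of the same record. The following sketch shows how such a template could be resolved in plain Python. It is an illustration under assumptions, not the package's implementation: the regex_sub helper and the sample record values are hypothetical, and returning the value unchanged when the pattern does not match is an assumed behavior.

```python
import re

# Pattern and template copied from the profile above.
INPUT_REGEX = r"(?P<current_sd>.*\/.*)"
OUTPUT = "{current_sd}_{PulseSequenceName}_TE{TE}_TR{TR}"

# Hypothetical record; real values would come from the DICOM header.
record = {
    "SeriesDescription": "T1/MPRAGE",
    "PulseSequenceName": "tfl3d1",
    "TE": "2.98",
    "TR": "2300",
}


def regex_sub(record, field, pattern, template):
    """Build a new field value from named regex groups and other record fields."""
    match = re.match(pattern, record[field])
    if match is None:
        # Assumed behavior: leave the value untouched when nothing matches.
        return record[field]
    # Named groups (e.g. current_sd) take precedence over record fields.
    values = {**record, **match.groupdict()}
    return template.format(**values)


print(regex_sub(record, "SeriesDescription", INPUT_REGEX, OUTPUT))
# T1/MPRAGE_tfl3d1_TE2.98_TR2300
```

In the profile, each name listed under groups also carries its own de-id action (keep in this example), so a group or referenced field could be hashed or replaced before it is placed into the output string.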
Assuming a folder of DICOM files at ~/my_dicoms and the above YAML configuration saved in the current working directory as config.yaml, the following few lines of code will de-identify the DICOM files and save them at ~/my_deid_dicoms:
from fs.osfs import OSFS
from flywheel_migration import deidentify
# Load the profile
profile = deidentify.load_profile('config.yaml')
# Define source, destination file system and list of dicom to process
src_fs = OSFS("~/my_dicoms")
# create=True makes the destination folder if it does not exist yet
dst_fs = OSFS("~/my_deid_dicoms", create=True)
paths = src_fs.listdir('.')
# Process the Dicom files and save de-id files at dst_fs
profile.process_packfile("dicom", src_fs, dst_fs, paths)
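If the source folder may contain files other than DICOM, the list of paths can be filtered before processing. The sketch below uses only the standard library and relies on the DICOM file format convention of a 128-byte preamble followed by the characters "DICM". The helpers is_dicom_part10 and list_dicom_names are hypothetical and not part of flywheel-migration; DICOM files written without a preamble would be skipped by this check.

```python
from pathlib import Path


def is_dicom_part10(path):
    """Return True if the file has the 'DICM' marker after the 128-byte preamble."""
    with open(path, "rb") as handle:
        handle.seek(128)
        return handle.read(4) == b"DICM"


def list_dicom_names(folder):
    """Return the sorted names of regular files in `folder` that look like DICOM."""
    root = Path(folder).expanduser()
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and is_dicom_part10(entry)
    )
```

The returned names are relative to the folder, so they could be used in place of the listdir result above, for example paths = list_dicom_names("~/my_dicoms").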