Field Transformations

Warning

identity will be deprecated in version 14.0. Use the keep action instead.

A field transformation defines the action to be taken on a field.

An example of a field transformation definition is:

- name: PatientID
  remove: true

name is a valid key for all types of fields and de-identification profiles. The value of name, however, is profile dependent. Please refer to the de-identification File Profile page.

In addition, certain profiles support the use of a regex key in place of name. Please refer to this section to learn more about its use.

The following field transformations are supported:

remove

Removes the field from the file entirely. If removal is not supported then this will blank the field.

- name: StationName
  remove: true

replace-with

Replaces the contents of the field with the value provided. Please be aware of the length of the field being replaced (e.g. some DICOM fields only support a limited number of characters).

- name: PatientID
  replace-with: REDACTED

By default, the field will be created in the record if it does not exist. This behavior can be disabled by setting the boolean option replace-with-insert to False on the profile or on the field. By default, replace-with-insert is defined at the file profile level (more on this here). Optionally, a field can define its own replace-with-insert, which takes precedence, as in the example below:

- name: PatientID
  replace-with: REDACTED
  replace-with-insert: False

Important

If replace-with-insert is True and the field is not present in the record metadata, the toolkit will attempt to create the field. If a non-flat field is used instead (e.g. regex), replace-with will not attempt to create any field.
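
The interaction between replace-with and replace-with-insert can be pictured with a minimal Python sketch. This is not the toolkit's implementation: the helper apply_replace_with and the plain-dict record are hypothetical and only illustrate the rule described above for flat fields.

```python
def apply_replace_with(record, name, value, insert=True):
    """Hypothetical helper: set record[name] to value.

    When insert is False, a field that is absent from the record stays absent.
    """
    if name in record or insert:
        record[name] = value
    return record


# Field present: always replaced.
print(apply_replace_with({"PatientID": "12345"}, "PatientID", "REDACTED"))
# Field absent, insert enabled: the field is created.
print(apply_replace_with({}, "PatientID", "REDACTED"))
# Field absent, insert disabled: the record is left unchanged.
print(apply_replace_with({}, "PatientID", "REDACTED", insert=False))
```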

increment-date

Offsets the date by the number of days defined in the date-increment setting of the file profile.

- name: StudyDate
  increment-date: true

By default, the date format used is the date-format defined at the file profile level (more on this here). Optionally, this field can also use an ad-hoc date format, as in the example below:

- name: StudyDate
  increment-date: true
  date-format: "%Y-%m-%d"

Important

The user is responsible for setting a date-format which is valid for the file type being processed. The date-format is used to parse the date from the input file.
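
Why the date-format must match the input can be seen in a small Python sketch. It assumes, without confirmation from the toolkit's source, that the value is parsed and written back with strptime/strftime-style format codes; the function increment_date and the "%Y%m%d" default are illustrative only.

```python
from datetime import datetime, timedelta


def increment_date(value, days, date_format="%Y%m%d"):
    """Parse value with date_format, shift it by days, and format it back."""
    parsed = datetime.strptime(value, date_format)
    return (parsed + timedelta(days=days)).strftime(date_format)


print(increment_date("20200131", 5))                  # 20200205
print(increment_date("2020-01-31", -31, "%Y-%m-%d"))  # 2019-12-31

# A format that does not match the stored value cannot be parsed.
try:
    increment_date("2020-01-31", 5, "%Y%m%d")
except ValueError as error:
    print("unparseable:", error)
```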

Note

Set date-format to timestamp to read and write the value as a Unix timestamp (float).

Additional configuration options are as follows:

  • jitter-date: (bool) Will perform a jitter based on the jitter-range and jitter-unit in addition to the normal increment

  • jitter-range: (int) Range to select random jitter from

  • jitter-unit: (str) Unit to jitter, select from [seconds, minutes, hours, days, weeks]

  • date-increment-override: (int) Override parent profile date increment

  • datetime-min: (str) Enforces a minimum date/datetime, in the format yyyymmdd for a specific date or [+-]<amount><unit> for a calculated date (e.g. -0years for the current date, -80years for 80 years before today). Select a valid unit type from [years, weeks, days].

  • datetime-max: (str) Enforces a maximum date/datetime, in the format yyyymmdd for a specific date or [+-]<amount><unit> for a calculated date.
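
The jitter options can be read as a second, random offset applied on top of the fixed increment. The Python sketch below is one possible reading, not the toolkit's code: it assumes the jitter is an integer drawn uniformly from [-jitter-range, +jitter-range] in the chosen jitter-unit, and the function name and the rng parameter are hypothetical.

```python
import random
from datetime import datetime, timedelta

UNITS = ("seconds", "minutes", "hours", "days", "weeks")


def increment_with_jitter(value, days, jitter_range, jitter_unit, fmt, rng=random):
    """Apply the fixed day increment, then a random jitter in jitter_unit."""
    if jitter_unit not in UNITS:
        raise ValueError(f"jitter-unit must be one of {UNITS}")
    # Assumption: a symmetric integer draw around zero.
    jitter = rng.randint(-jitter_range, jitter_range)
    shifted = datetime.strptime(value, fmt) + timedelta(days=days)
    shifted += timedelta(**{jitter_unit: jitter})
    return shifted.strftime(fmt)


rng = random.Random(0)  # seeded only to make the example repeatable
print(increment_with_jitter("20200115", 10, 2, "days", "%Y%m%d", rng))
```

With a 10-day increment and a jitter-range of 2 days, the result always falls between 20200123 and 20200127.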

increment-datetime

Offsets the date by the number of days defined in the date-increment setting of the file profile, preserving the time and timezone.

- name: AcquisitionDateTime
  increment-datetime: true

By default, the datetime format used is the datetime-format defined at the file profile level (more on this here). Optionally, this field can also use an ad-hoc datetime format, as in the example below:

- name: AcquisitionDateTime
  increment-datetime: true
  datetime-format: "%Y-%m-%d %H:%M:%S"

Important

The user is responsible for setting a datetime-format which is valid for the file type being processed. The datetime-format is used to parse the datetime from the input file.
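
The claim that the time and timezone are preserved can be illustrated with Python's standard datetime module. This is a sketch under the assumption that datetime-format uses strptime/strftime codes; increment_datetime is a hypothetical name, not a toolkit function.

```python
from datetime import datetime, timedelta


def increment_datetime(value, days, datetime_format):
    """Shift only the calendar date; time of day and UTC offset are untouched."""
    parsed = datetime.strptime(value, datetime_format)
    return (parsed + timedelta(days=days)).strftime(datetime_format)


print(increment_datetime("2021-03-01 08:15:30", 7, "%Y-%m-%d %H:%M:%S"))
# 2021-03-08 08:15:30
print(increment_datetime("20210301081530-0500", 7, "%Y%m%d%H%M%S%z"))
# 20210308081530-0500
```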

Note

Set datetime-format to timestamp to read and write the value as a Unix timestamp (float).

Additional configuration options are as follows:

  • jitter-date: (bool) Will perform a jitter based on the jitter-range and jitter-unit in addition to the normal increment

  • jitter-range: (int) Range to select random jitter from

  • jitter-unit: (str) Unit to jitter, select from [seconds, minutes, hours, days, weeks]

  • date-increment-override: (int) Override parent profile date increment

  • datetime-min: (str) Enforces a minimum date/datetime, in the format yyyymmdd for a specific date or [+-]<amount><unit> for a calculated date (e.g. -0years for the current date, -80years for 80 years before today). Select a valid unit type from [years, weeks, days].

  • datetime-max: (str) Enforces a maximum date/datetime, in the format yyyymmdd for a specific date or [+-]<amount><unit> for a calculated date.
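
The two accepted forms of datetime-min and datetime-max can be made concrete with a short Python sketch. Everything here is an assumption for illustration: the helpers resolve_bound and clamp are hypothetical, and pulling an out-of-range date back to the nearest bound is only one plausible reading of "enforces".

```python
import re
from datetime import date, datetime, timedelta

# [+-]<amount><unit>, with the units listed above.
RELATIVE = re.compile(r"^([+-])(\d+)(years|weeks|days)$")


def resolve_bound(spec, today):
    """Hypothetical helper: turn a datetime-min/max string into a date."""
    match = RELATIVE.match(spec)
    if match is None:
        # Not a relative expression: treat it as an absolute yyyymmdd date.
        return datetime.strptime(spec, "%Y%m%d").date()
    sign, amount, unit = match.groups()
    amount = int(amount) if sign == "+" else -int(amount)
    if unit == "years":
        try:
            return today.replace(year=today.year + amount)
        except ValueError:
            # 29 February mapped onto a non-leap year.
            return today.replace(year=today.year + amount, day=28)
    return today + timedelta(**{unit: amount})


def clamp(value, minimum, maximum):
    """Assumed enforcement: pull out-of-range dates back to the nearest bound."""
    return max(minimum, min(value, maximum))


today = date(2024, 6, 15)  # fixed so the example is reproducible
low = resolve_bound("-80years", today)
high = resolve_bound("-0years", today)
print(low, high)                            # 1944-06-15 2024-06-15
print(clamp(date(1930, 1, 1), low, high))   # 1944-06-15
```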

hash

Replaces the contents of the field with a one-way cryptographic hash, in hexadecimal form. Only the first 16 characters of the hash are used, in order to support short strings.

- name: AccessionNumber
  hash: true
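
The truncation to 16 hexadecimal characters can be sketched in Python. The choice of SHA-256 and the absence of a salt are assumptions of this example; the text above only states that a one-way cryptographic hash is used, and hash_field is a hypothetical name.

```python
import hashlib


def hash_field(value, length=16):
    """Hex digest of value, cut to the first `length` characters."""
    # SHA-256 is an assumption; the actual algorithm is not named above.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


hashed = hash_field("ACC-0042")
print(hashed, len(hashed))
```

The same input always yields the same 16-character string, which keeps hashed identifiers consistent across files.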

hashuid

Replaces a UID field with a hashed version of that field. By default, the first four nodes (prefix) and last node (suffix) will be preserved, with the middle being replaced by the hashed value. For example: “1.2.840.113619.6.283.4.983142589.7316.1300473420.841” becomes “1.2.840.113619.551726.420312.177022.222461.230571.501817.841”

These field properties can be configured on the profile. More on this here.

- name: ConcatenationUID
  hashuid: true

Note

If the hashuid configuration leads to a string that has more than 64 characters, the value is truncated, with the prefix and suffix preserved.
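
The prefix/suffix rule and the 64-character limit can be illustrated with a rough Python sketch. It does not reproduce the toolkit's algorithm or its output: deriving the middle nodes from the decimal digits of a SHA-256 digest, using six nodes of up to six digits, and truncating by dropping middle nodes are all assumptions, and hash_uid is a hypothetical name.

```python
import hashlib


def hash_uid(uid, prefix_nodes=4, suffix_nodes=1, max_length=64):
    """Keep the leading and trailing nodes, replace the middle with hashed digits."""
    nodes = uid.split(".")
    prefix = nodes[:prefix_nodes]
    suffix = nodes[len(nodes) - suffix_nodes:]
    # Assumption: decimal digits taken from a SHA-256 digest of the whole UID.
    digits = str(int(hashlib.sha256(uid.encode("ascii")).hexdigest(), 16)).zfill(36)
    # int() strips leading zeros, which are not allowed in UID nodes.
    middle = [str(int(digits[i:i + 6])) for i in range(0, 36, 6)]
    # Drop middle nodes until the result fits, never touching prefix or suffix.
    while middle and len(".".join(prefix + middle + suffix)) > max_length:
        middle.pop()
    return ".".join(prefix + middle + suffix)


original = "1.2.840.113619.6.283.4.983142589.7316.1300473420.841"
print(hash_uid(original))
```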

jitter

Offsets the value by a random amount. By default, the random amount is drawn from a uniform distribution centered on 0, in the range [-1, 1]. The range is controlled by the jitter-range property. The action can be applied to integers or floats.

These field properties can be configured on the profile. More on this here.

- name: PatientWeight
  jitter: true

The profile-level jitter-range and jitter-type configuration can be overridden by specifying them at the field level as well. The configuration options jitter-min and jitter-max can be specified to enforce minimum and maximum values, respectively.

- name: PatientWeight
  jitter: true
  jitter-range: 10
  jitter-type: int
  jitter-min: 25
  jitter-max: 250
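
How these four options combine can be sketched in Python. This is an illustration under assumptions, not the toolkit's code: the function jitter, its "float" default for jitter-type, rounding for the int type, and clamping as the way jitter-min and jitter-max are enforced are all choices made for this example.

```python
import random


def jitter(value, jitter_range=1, jitter_type="float",
           jitter_min=None, jitter_max=None, rng=random):
    """Offset value by a uniform draw from [-jitter_range, +jitter_range]."""
    result = value + rng.uniform(-jitter_range, jitter_range)
    if jitter_type == "int":
        result = int(round(result))
    # Assumed enforcement of the optional bounds: clamp to the nearest bound.
    if jitter_min is not None:
        result = max(result, jitter_min)
    if jitter_max is not None:
        result = min(result, jitter_max)
    return result


rng = random.Random(42)  # seeded only to make the example repeatable
print(jitter(70, jitter_range=10, jitter_type="int",
             jitter_min=25, jitter_max=250, rng=rng))
```

For a PatientWeight of 70 with the field configuration above, the result is an integer between 60 and 80.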

keep

Does nothing. Used, for instance, in the groups section of a regex-sub action.

- name: SeriesDescription
  keep: true

Note

If name is the only key defined in the field configuration, the field defaults to the keep action.

identity

Does nothing. Used, for instance, in the groups section of a regex-sub action.

- name: SeriesDescription
  identity: true

Warning

identity will be deprecated in version 14.0. Use the keep action instead.

encrypt

Encrypts the contents of the field as specified by the profile and profile type.

Non-DICOM

Encryption for non-DICOM profiles uses standard AES-EAX, which provides both confidentiality and authenticity, to encrypt the contents of the field in place. (See EAX mode)

Non-DICOM encryption uses an 8-byte nonce, which is randomly generated at the time of encryption. With AES-EAX and an 8-byte nonce, source fields must be 38 characters or fewer to stay within Flywheel’s 64-character limit. Ensuring this is up to the user. If the character limit is exceeded, later decryption may fail and/or file manipulations may cause Flywheel errors.

Non-DICOM encryption requires a secret-key to be provided at the global or profile level. More on this here.

Non-DICOM encryption can also have its random nonce pre-seeded. This is useful for making the output deterministic, but is cryptographically unsound. More on this here.

DICOM

Encryption for DICOM profiles utilizes Cryptographic Message Syntax (CMS) encryption (See PS3.15 E.1.1) to store original tag values within the EncryptedAttributesSequence (See PS3.3 C.12.1.1.4.1) and then removes the tag from the non-encrypted portion of the DICOM dataset.

DICOM encryption can be symmetric or asymmetric. Symmetric encryption requires a secret-key to be provided at the global or profile level. Asymmetric encryption requires one or more public-key file paths and asymmetric-encryption set as true at the DICOM file profile level. More on asymmetric-encryption, public-key, and private-key.

- name: PatientName
  encrypt: true

decrypt

Decrypts the contents of the field as encrypted by the method described above.

Non-DICOM

Non-DICOM decryption requires a secret-key to be provided at the global or profile level. More on this here.

DICOM

Decryption for DICOM profiles decrypts the EncryptedAttributesSequence and restores values found within. If decrypt is specified for a tag that was not encrypted, the tag value will be maintained as-is.

DICOM decryption can be symmetric or asymmetric, according to how the DICOM was encrypted. Symmetric decryption requires a secret-key to be provided at the global or profile level. Asymmetric decryption requires a private-key file path and asymmetric-encryption set as true at the DICOM file profile level. More on asymmetric-encryption, public-key, and private-key.

- name: PatientName
  decrypt: true

regex-sub

Replaces the contents of the field with a value built from other attributes and/or groups extracted from the field value. Below is an example of such a field:

- name: SeriesDescription
  regex-sub:
    - input-regex: '(?P<current_sd>.*\/.*)'
      output: '{PatientID}_{current_sd}_{PulseSequenceName}'
      groups:
        - name: PatientID
          hash: True
        - name: current_sd
          keep: True
        - name: PulseSequenceName
          keep: True
    - ...

  • regex-sub takes a list of dicts as its value, each defining input-regex, output and groups.

  • input-regex defines the regular expression that the field value must match. Optionally, it can extract named group(s) to be used in the output.

  • output defines the Python “f-string”-style template that is filled in from the captured groups and/or the record attributes.

  • groups defines the list of field transformations to apply to the extracted groups or to the record attributes. Every variable in output must have a corresponding entry in groups. If no transformation is desired, the keep transformation must be used.

Note

The transformations defined under groups do NOT impact the metadata of the de-id record. The transformations are only made available to output.
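
The mechanics of a single regex-sub entry can be approximated in a few lines of Python. This is a sketch, not the toolkit's implementation: regex_sub, keep and short_hash are hypothetical stand-ins, str.format is assumed as the way output is filled in, and leaving the value unchanged when the pattern does not match is also an assumption.

```python
import hashlib
import re


def keep(value):
    """Stand-in for the keep action."""
    return value


def short_hash(value):
    """Stand-in for the hash action (SHA-256 is an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def regex_sub(record, field, input_regex, output, groups):
    """Build a new value for `field` from captured groups and record attributes."""
    match = re.match(input_regex, str(record[field]))
    if match is None:
        return record[field]
    # Captured groups and record attributes are both available to `output`.
    available = {**record, **match.groupdict()}
    # Transformations apply to copies only; the record itself is not modified.
    values = {name: action(available[name]) for name, action in groups.items()}
    return output.format(**values)


record = {"PatientID": "12345", "SeriesDescription": "T1/axial",
          "PulseSequenceName": "SPGR"}
result = regex_sub(
    record, "SeriesDescription",
    r"(?P<current_sd>.*\/.*)",
    "{PatientID}_{current_sd}_{PulseSequenceName}",
    {"PatientID": short_hash, "current_sd": keep, "PulseSequenceName": keep},
)
print(result)
print(record["PatientID"])  # still 12345
```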

Important

For all file profiles, the regex-sub fields are applied first (to avoid inadvertent and inconsistent transformation of group fields that have already been de-identified). Among all regex-sub fields, the last one defined is applied first.

regex-sub leverages regular expressions, which come in handy for many use cases. Here are a few other examples:

  • Masking day and month of PatientBirthDate DICOM tag

    - name: PatientBirthDate
      regex-sub:
        - input-regex: '(?P<year>\d{4}).*'
          output: '{year}0101'
          groups:
            - name: year
              keep: True
    
  • Capping PatientAge at 90Y

    - name: PatientAge
      regex-sub:
        - input-regex: '^(?P<age>(0*[9][0-9]Y)|([1-9]\d{2,}Y))$'
          output: '{age}'
          groups:
            - name: age
              replace-with: 090Y
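
The two patterns above can be checked against sample values with Python's re module. The helper functions are hypothetical and hard-code the outcome of each configuration; leaving non-matching values unchanged is an assumption of this sketch.

```python
import re

YEAR_ONLY = re.compile(r"(?P<year>\d{4}).*")
AGE_90_PLUS = re.compile(r"^(?P<age>(0*[9][0-9]Y)|([1-9]\d{2,}Y))$")


def mask_birth_date(value):
    """Keep the year and reset month and day to 0101."""
    match = YEAR_ONLY.match(value)
    return "{year}0101".format(**match.groupdict()) if match else value


def cap_age(value):
    """Replace any age of 90 years or more with 090Y."""
    return "090Y" if AGE_90_PLUS.match(value) else value


print(mask_birth_date("19850314"))             # 19850101
print(cap_age("095Y"), cap_age("102Y"), cap_age("045Y"))  # 090Y 090Y 045Y
```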