Field Transformations

Warning

identity will be deprecated in version 14.0. Use keep action instead.

A field transformation defines the action to be taken on a field.

An example of a field transformation definition is:

- name: PatientID
  remove: true

name is a valid key for all types of fields and de-identification profiles. The value of name, however, is profile dependent. Please refer to the de-identification File Profile page.

In addition, certain profiles support the use of regex key in place of name. Please refer to this section to learn more about its use.

The following field transformations are supported:

remove

Removes the field from the file entirely. If removal is not supported then this will blank the field.

- name: StationName
  remove: true

replace-with

Replaces the contents of the field with the value provided. Be aware of the length of the field being replaced (e.g. some DICOM fields only support a limited number of characters).

- name: PatientID
  replace-with: REDACTED

By default, the field will be created in the record if it does not exist. This behavior can be reversed by setting the boolean option replace-with-insert to False on the profile or the field. By default, replace-with-insert is defined at the file profile level (more on this here). Optionally, the field can define replace-with-insert as in the below example and take precedence:

- name: PatientID
  replace-with: REDACTED
  replace-with-insert: False

Important

If replace-with-insert is True and the field is not present in the record metadata, the field will be tentatively created. If a non-flat field is used instead (e.g. regex), then replace-with will not attempt to create any field.
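The interaction between replace-with and replace-with-insert can be sketched in a few lines of Python. This is only an illustration of the rule described above, assuming a flat record represented as a dict; the function name replace_with and its parameters are hypothetical and are not part of the tool.

```python
def replace_with(record: dict, name: str, value: str, insert: bool = True) -> dict:
    """Return a copy of `record` with `name` set to `value`.

    Hypothetical helper: when `insert` is False, a field that is missing
    from the record is left missing instead of being created.
    """
    updated = dict(record)
    if name in updated or insert:
        updated[name] = value
    return updated


record = {"StudyDate": "20210330"}
created = replace_with(record, "PatientID", "REDACTED", insert=True)
skipped = replace_with(record, "PatientID", "REDACTED", insert=False)
print(created)
print(skipped)
```

With insert=True the missing PatientID is created; with insert=False the record is returned unchanged.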

increment-date

Offsets the date by the number of days defined in the date-increment setting of the file profile.

- name: StudyDate
  increment-date: true

By default, the date format used is the date-format defined at the file profile level (more on this here). Optionally, this field can also use an ad-hoc date format as in the below example:

- name: StudyDate
  increment-date: true
  date-format: "%Y-%m-%d"

Important

The user is responsible for setting a date-format which is valid for the file type being processed. The date-format is used to parse the date from the input file.
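To show why date-format must match the input, here is a minimal sketch of a date shift using Python's standard strptime/strftime directives. It assumes the same format is used for parsing and for writing the value back; the function name increment_date is hypothetical and the tool's own implementation may differ.

```python
from datetime import datetime, timedelta


def increment_date(value: str, days: int, date_format: str = "%Y%m%d") -> str:
    """Shift a date string by `days`, parsing and writing it with `date_format`."""
    # strptime raises ValueError when `value` does not match `date_format`.
    parsed = datetime.strptime(value, date_format)
    return (parsed + timedelta(days=days)).strftime(date_format)


shifted = increment_date("2021-03-30", days=-17, date_format="%Y-%m-%d")
print(shifted)  # 2021-03-13
```

A value that does not match the format (for example "20210330" parsed with "%Y-%m-%d") raises an error in this sketch, which is why the format has to be valid for the file type being processed.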

Note

Set date-format to timestamp to read and write the value as a Unix timestamp (float).

Additional configuration options are as follows:

  • jitter-date: (bool) Will perform a jitter based on the jitter-range and jitter-unit in addition to the normal increment

  • jitter-range: (int) Range to select random jitter from

  • jitter-unit: (str) Unit to jitter, select from [seconds, minutes, hours, days, weeks]

  • date-increment-override: (int) Override parent profile date increment

increment-datetime

Offsets the datetime by the number of days defined in the date-increment setting of the file profile, preserving the time and timezone.

- name: AcquisitionDateTime
  increment-datetime: true

By default, the datetime format used is the datetime-format defined at the file profile level (more on this here). Optionally, this field can also use an ad-hoc datetime format as in the below example:

- name: AcquisitionDateTime
  increment-datetime: true
  datetime-format: "%Y-%m-%d %H:%M:%S"

Important

The user is responsible for setting a datetime-format which is valid for the file type being processed. The datetime-format is used to parse the datetime from the input file.

Note

Set datetime-format to timestamp to read and write the value as a Unix timestamp (float).

Additional configuration options are as follows:

  • jitter-date: (bool) Will perform a jitter based on the jitter-range and jitter-unit in addition to the normal increment

  • jitter-range: (int) Range to select random jitter from

  • jitter-unit: (str) Unit to jitter, select from [seconds, minutes, hours, days, weeks]

  • date-increment-override: (int) Override parent profile date increment

hash

Replaces the contents of the field with a one-way cryptographic hash, in hexadecimal form. Only the first 16 characters of the hash are used, in order to support short strings.

- name: AccessionNumber
  hash: true
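The truncation to 16 hexadecimal characters can be illustrated as follows. This is a sketch only: the page does not say which hash function or salt the tool uses, so SHA-256 and the optional salt parameter are assumptions, and hash_value is a hypothetical name.

```python
import hashlib


def hash_value(value: str, salt: str = "") -> str:
    """Return the first 16 hex characters of a one-way hash of `value`."""
    # Assumption: SHA-256 stands in for the tool's actual hash function.
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]


hashed = hash_value("ACC-0001")
print(hashed, len(hashed))
```

The same input always yields the same 16-character string, so hashed values stay consistent across files while the original value cannot be read back.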

hashuid

Replaces a UID field with a hashed version of that field. By default, the first four nodes (prefix) and last node (suffix) will be preserved, with the middle being replaced by the hashed value. For example: “1.2.840.113619.6.283.4.983142589.7316.1300473420.841” becomes “1.2.840.113619.551726.420312.177022.222461.230571.501817.841”

These field properties can be configured on the profile. More on this here.

- name: ConcatenationUID
  hashuid: true

Note

If the hashuid configuration leads to a string that has more than 64 characters, the value is truncated, with the prefix and suffix preserved.

jitter

Offsets the value by a random amount. By default, the random amount is drawn from a uniform distribution centered on 0, in the range [-1, 1]. The range is controlled by the jitter-range property. The action can be applied to integers or floats.

These field properties can be configured on the profile. More on this here.

- name: PatientWeight
  jitter: true

The profile-level jitter-range and jitter-type settings can be overridden by specifying them at the field level as well.

- name: PatientWeight
  jitter: true
  jitter-range: 10
  jitter-type: int
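As an illustration of the jitter action, the sketch below draws a uniform offset and applies it to a numeric value. It assumes that jitter-range is the half-width of the interval (so a range of 10 means an offset in [-10, 10]) and that jitter-type int rounds the result; both are assumptions, as is the function name jitter.

```python
import random


def jitter(value, jitter_range=1, jitter_type="float", rng=None):
    """Offset `value` by a uniform random amount in [-jitter_range, jitter_range]."""
    rng = rng or random.Random()
    # Assumption: jitter_range is the half-width of the sampling interval.
    result = value + rng.uniform(-jitter_range, jitter_range)
    # Assumption: the "int" type rounds to the nearest integer.
    return int(round(result)) if jitter_type == "int" else float(result)


rng = random.Random(0)  # seeded only to make the example repeatable
weight = jitter(70, jitter_range=10, jitter_type="int", rng=rng)
print(weight)
```

Whatever the seed, the result of this sketch stays within 10 units of the original value.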

keep

Do nothing. Used for instance in the regex-sub action groups sections.

- name: SeriesDescription
  keep: true

Note

If only name is defined as key in the field configuration, it will default to the keep action.

identity

Do nothing. Used for instance in the regex-sub action groups sections.

- name: SeriesDescription
  identity: true

Warning

identity will be deprecated in version 14.0. Use keep action instead.

regex-sub

Replaces the contents of the field with a value built from other attributes and/or groups extracted from the field value. Below is an example of such a field:

- name: SeriesDescription
  regex-sub:
    - input-regex: '(?P<current_sd>.*\/.*)'
      output: '{PatientID}_{current_sd}_{PulseSequenceName}'
      groups:
        - name: PatientID
          hash: True
        - name: current_sd
          keep: True
        - name: PulseSequenceName
          keep: True
    - ...

  • regex-sub takes a list of dicts as its value, each defining input-regex, output and groups.

  • input-regex defines the regular expression matched against the field value. Optionally, it can capture group(s) to be used in the output.

  • output defines the Python “f-string” style template to be formatted from the captured groups and/or the record attributes.

  • groups defines the list of field transformations to apply to the captured groups or to record attributes. Every variable in output must have a corresponding element in groups. If no transformation is desired, the keep transformation must be used.

Note

The transformations defined under groups do NOT impact the metadata of the de-id record. The transformations are only made available to output.

Important

For all file profiles, the regex-sub fields are applied first (to avoid inadvertent and inconsistent transformation of group fields that have already been de-identified). And among all regex-sub fields, the last defined regex-sub field is applied first.
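The way input-regex, output and groups fit together can be sketched with Python's re module and str.format. This is an illustration under assumptions, not the tool's implementation: regex_sub and short_hash are hypothetical names, SHA-256 stands in for the real hash, only the hash and keep actions are handled, and a value that does not match is assumed to be left unchanged.

```python
import hashlib
import re


def short_hash(value: str) -> str:
    # Assumption: SHA-256 stands in for the tool's actual hash function.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def regex_sub(value, record, input_regex, output, groups):
    """Build a new field value from captured groups and record attributes."""
    match = re.match(input_regex, value)
    if match is None:
        return value  # assumption: a non-matching value is left unchanged
    # Captured groups and record attributes are both available to `output`.
    available = {**record, **match.groupdict()}
    rendered = {}
    for group in groups:
        raw = str(available[group["name"]])
        # Only `hash` and `keep` are handled in this sketch.
        rendered[group["name"]] = short_hash(raw) if group.get("hash") else raw
    # The record itself is never modified; only the returned string changes.
    return output.format(**rendered)


record = {"PatientID": "12345", "PulseSequenceName": "EPI"}
new_value = regex_sub(
    "fMRI/rest",
    record,
    input_regex=r"(?P<current_sd>.*\/.*)",
    output="{PatientID}_{current_sd}_{PulseSequenceName}",
    groups=[
        {"name": "PatientID", "hash": True},
        {"name": "current_sd", "keep": True},
        {"name": "PulseSequenceName", "keep": True},
    ],
)
print(new_value)
```

The hashed PatientID appears only in the returned string; record["PatientID"] keeps its original value, mirroring the note that transformations under groups do not affect the record metadata.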

regex-sub leverages regular expressions, which come in handy for many use cases. Here are a few other examples:

  • Masking day and month of PatientBirthDate DICOM tag

    - name: PatientBirthDate
      regex-sub:
        - input-regex: '(?P<year>\d{4}).*'
          output: '{year}0101'
          groups:
            - name: year
              keep: True
    
  • Capping PatientAge at 90Y

    - name: PatientAge
      regex-sub:
        - input-regex: '^(?P<age>(0*[9][0-9]Y)|([1-9]\d{2,}Y))$'
          output: '{age}'
          groups:
            - name: age
              replace-with: 090Y
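To check which values the PatientAge expression above captures, the pattern can be exercised directly in Python. The sketch assumes that values which do not match are left unchanged; cap_age is a hypothetical helper, not part of the tool.

```python
import re

# Same pattern as in the PatientAge example: ages of 90 years or more.
AGE_90_OR_MORE = re.compile(r"^(?P<age>(0*[9][0-9]Y)|([1-9]\d{2,}Y))$")


def cap_age(value: str) -> str:
    """Return "090Y" for ages of 90 years or more, else the value unchanged."""
    return "090Y" if AGE_90_OR_MORE.match(value) else value


for age in ["045Y", "089Y", "090Y", "095Y", "102Y", "012M"]:
    print(age, "->", cap_age(age))
```

Ages below 90 years, and ages expressed in other units such as months, do not match the pattern and pass through untouched in this sketch.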