Field Transformations
Warning
identity will be deprecated in version 14.0. Use the keep action instead.
A field transformation defines the action to apply to a field.
An example of a field transformation definition is:
- name: PatientID
  remove: true
name is a valid key for all types of fields and de-identification profiles. The value of name is profile dependent, however. Please refer to the de-identification File Profile page.
In addition, certain profiles support the use of a regex key in place of name. Please refer to this section to learn more about its use.
The following field transformations are supported:
remove
Removes the field from the file entirely. If removal is not supported then this will blank the field.
- name: StationName
  remove: true
replace-with
Replaces the contents of the field with the value provided. Be aware of the length of the field being replaced (e.g. some DICOM fields only support a limited number of characters).
- name: PatientID
  replace-with: REDACTED
By default, the field is created in the record if it does not exist. This behavior can be disabled by setting the boolean option replace-with-insert to False on the profile or on the field. By default, replace-with-insert is defined at the file profile level (more on this here). Optionally, a field can define its own replace-with-insert, which takes precedence, as in the example below:
- name: PatientID
  replace-with: REDACTED
  replace-with-insert: False
Important
If replace-with-insert is True and the field is not present in the record metadata, replace-with attempts to create the field. If a non-flat field is used instead (e.g. regex), replace-with does not attempt to create any field.
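The insert behavior described above can be pictured with a small sketch on a plain dictionary. The helper name `replace_with` and the dict-based record are illustrative assumptions, not part of the de-identification library.

```python
def replace_with(record: dict, name: str, value: str, insert: bool = True) -> dict:
    """Return a copy of `record` with `name` set to `value`.

    Hypothetical helper mimicking the documented replace-with /
    replace-with-insert semantics; the real tool works on file records.
    """
    updated = dict(record)
    # Existing fields are always replaced; missing fields are only
    # created when insert (replace-with-insert) is True.
    if name in updated or insert:
        updated[name] = value
    return updated


existing = replace_with({"PatientID": "123"}, "PatientID", "REDACTED")
created = replace_with({}, "PatientID", "REDACTED")
skipped = replace_with({}, "PatientID", "REDACTED", insert=False)
```

Here `skipped` stays empty because the field was absent and insertion was turned off.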
increment-date
Offsets the date by the number of days defined in the date-increment setting of the file profile.
- name: StudyDate
  increment-date: true
By default, the date format used is the date-format defined at the file profile level (more on this here). Optionally, this field can also use an ad-hoc date format, as in the example below:
- name: StudyDate
  increment-date: true
  date-format: "%Y-%m-%d"
Important
The user is responsible for setting a date-format which is valid for the file type being processed. The date-format is used to parse the date from the input file.
Note
Set date-format to timestamp to read and write the value in Unix timestamp format (float).
Additional configuration options are as follows:
jitter-date: (bool) Performs a jitter based on jitter-range and jitter-unit, in addition to the normal increment.
jitter-range: (int) Range to select the random jitter from.
jitter-unit: (str) Unit of the jitter, selected from [seconds, minutes, hours, days, weeks].
date-increment-override: (int) Overrides the parent profile date increment.
datetime-min: (str) Enforces a minimum date/datetime, in the format yyyymmdd for a specific date or [+-]<amount><unit> for a calculated date (e.g. -0years for the current date, -80years for 80 years before today). Valid unit types are [years, weeks, days].
datetime-max: (str) Enforces a maximum date/datetime, in the format yyyymmdd for a specific date or [+-]<amount><unit> for a calculated date.
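As a rough illustration of how a date increment with an ad-hoc format and optional jitter could work, here is a minimal sketch. The function name `increment_date`, the `rng` parameter, the `%Y%m%d` default, and the choice of a symmetric integer jitter are all assumptions made for the example; the actual implementation may differ.

```python
import random
from datetime import datetime, timedelta


def increment_date(value, days, date_format="%Y%m%d",
                   jitter_range=0, jitter_unit="days", rng=None):
    """Shift a date string by `days`, plus an optional random jitter.

    Hypothetical sketch: the jitter is assumed to be an integer drawn
    from [-jitter_range, jitter_range], expressed in `jitter_unit`.
    """
    rng = rng or random.Random()
    # date_format is used both to parse the input and to write the output.
    parsed = datetime.strptime(value, date_format)
    jitter = rng.randint(-jitter_range, jitter_range)
    # seconds, minutes, hours, days and weeks are all valid timedelta keywords.
    shifted = parsed + timedelta(days=days) + timedelta(**{jitter_unit: jitter})
    return shifted.strftime(date_format)


shifted = increment_date("2020-01-30", 5, date_format="%Y-%m-%d")
```

Passing a seeded `random.Random` as `rng` makes the jitter reproducible, which is convenient when testing a profile.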
increment-datetime
Offsets the date by the number of days defined in the date-increment setting of the file profile, preserving the time and timezone.
- name: AcquisitionDateTime
  increment-datetime: true
By default, the datetime format used is the datetime-format defined at the file profile level (more on this here). Optionally, this field can also use an ad-hoc datetime format, as in the example below:
- name: AcquisitionDateTime
  increment-datetime: true
  datetime-format: "%Y-%m-%d %H:%M:%S"
Important
The user is responsible for setting a datetime-format which is valid for the file type being processed. The datetime-format is used to parse the datetime from the input file.
Note
Set datetime-format to timestamp to read and write the value in Unix timestamp format (float).
Additional configuration options are as follows:
jitter-date: (bool) Performs a jitter based on jitter-range and jitter-unit, in addition to the normal increment.
jitter-range: (int) Range to select the random jitter from.
jitter-unit: (str) Unit of the jitter, selected from [seconds, minutes, hours, days, weeks].
date-increment-override: (int) Overrides the parent profile date increment.
datetime-min: (str) Enforces a minimum date/datetime, in the format yyyymmdd for a specific date or [+-]<amount><unit> for a calculated date (e.g. -0years for the current date, -80years for 80 years before today). Valid unit types are [years, weeks, days].
datetime-max: (str) Enforces a maximum date/datetime, in the format yyyymmdd for a specific date or [+-]<amount><unit> for a calculated date.
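The two accepted forms for datetime-min and datetime-max (an absolute yyyymmdd date, or a relative [+-]<amount><unit> offset from today) can be resolved along these lines. The function `resolve_bound`, its `today` parameter, and the handling of February 29 are assumptions for illustration only.

```python
import re
from datetime import date, datetime, timedelta

# Relative form: sign, amount, then one of the documented units.
_RELATIVE = re.compile(r"^([+-])(\d+)(years|weeks|days)$")


def resolve_bound(spec, today=None):
    """Turn a datetime-min/datetime-max style string into a date (hypothetical)."""
    today = today or date.today()
    match = _RELATIVE.match(spec)
    if match is None:
        # Absolute form: yyyymmdd.
        return datetime.strptime(spec, "%Y%m%d").date()
    sign, amount, unit = match.groups()
    offset = int(amount) if sign == "+" else -int(amount)
    if unit == "years":
        try:
            return today.replace(year=today.year + offset)
        except ValueError:
            # Assumption: Feb 29 falls back to Feb 28 in non-leap years.
            return today.replace(year=today.year + offset, day=28)
    return today + timedelta(**{unit: offset})


oldest_allowed = resolve_bound("-80years", today=date(2024, 5, 17))
```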
hash
Replaces the contents of the field with a one-way cryptographic hash, in hexadecimal form. Only the first 16 characters of the hash are used, in order to support short strings.
- name: AccessionNumber
  hash: true
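The truncated hexadecimal hash can be sketched as follows. The source does not name the hash algorithm or say whether a salt is applied, so SHA-256 without a salt and the helper name `hash_field` are assumptions.

```python
import hashlib


def hash_field(value: str, length: int = 16) -> str:
    """Return the first `length` hex characters of a one-way hash.

    Assumption: SHA-256 over the UTF-8 encoded value; the real tool
    may use a different algorithm or add a salt.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return digest[:length]


hashed = hash_field("ACC-0001")
```

The same input always yields the same 16-character output, so hashed identifiers stay consistent across files.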
hashuid
Replaces a UID field with a hashed version of that field. By default, the first four nodes (prefix) and last node (suffix) will be preserved, with the middle being replaced by the hashed value. For example: “1.2.840.113619.6.283.4.983142589.7316.1300473420.841” becomes “1.2.840.113619.551726.420312.177022.222461.230571.501817.841”
The properties of this field can be configured on the profile. More on this here.
- name: ConcatenationUID
  hashuid: true
Note
If the hashuid configuration leads to a string that has more than 64 characters, the value is truncated with the prefix and suffix preserved.
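To make the prefix/suffix behavior concrete, here is a loose sketch of a UID hashing scheme. Only the preservation of the first four nodes and the last node, and the 64-character cap, come from the text above. The function name `hash_uid`, the use of SHA-256, the six numeric middle nodes, and the truncation strategy (dropping whole middle nodes) are assumptions.

```python
import hashlib


def hash_uid(uid, prefix_nodes=4, suffix_nodes=1, max_length=64):
    """Replace the middle of a dotted UID with hash-derived numeric nodes."""
    nodes = uid.split(".")
    prefix = nodes[:prefix_nodes]
    suffix = nodes[-suffix_nodes:] if suffix_nodes else []
    digest = hashlib.sha256(uid.encode("utf-8")).hexdigest()
    # Assumption: six decimal nodes, each derived from a 6-hex-char slice.
    middle = [str(int(digest[i:i + 6], 16) % 1_000_000) for i in range(0, 36, 6)]
    hashed = ".".join(prefix + middle + suffix)
    # Assumption: shorten by dropping middle nodes, keeping prefix and suffix.
    while len(hashed) > max_length and middle:
        middle.pop()
        hashed = ".".join(prefix + middle + suffix)
    return hashed


original = "1.2.840.113619.6.283.4.983142589.7316.1300473420.841"
hashed_uid = hash_uid(original)
```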
jitter
Offsets the value by a random amount. By default, the random amount is drawn from a uniform distribution centered on 0, in the range [-1, 1]. The range is controlled by the jitter-range property. The action can be applied to integers or floats.
The properties of this field can be configured on the profile. More on this here.
- name: PatientWeight
  jitter: true
The profile jitter-range and jitter-type configuration can be overridden by specifying them at the field level as well. The configuration options jitter-min and jitter-max can be specified to enforce minimum and maximum values, respectively.
- name: PatientWeight
  jitter: true
  jitter-range: 10
  jitter-type: int
  jitter-min: 25
  jitter-max: 250
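The combination of range, type, and bounds can be sketched as below. The function name `jitter`, the `rng` parameter, the `"float"` type value, rounding for the `int` type, and clamping as the way bounds are enforced are assumptions for illustration.

```python
import random


def jitter(value, jitter_range=1, jitter_type="float",
           jitter_min=None, jitter_max=None, rng=None):
    """Offset `value` by a uniform random amount in [-jitter_range, jitter_range]."""
    rng = rng or random.Random()
    result = value + rng.uniform(-jitter_range, jitter_range)
    if jitter_type == "int":
        # Assumption: integer fields are rounded to the nearest integer.
        result = int(round(result))
    # Assumption: the min/max bounds are enforced by clamping.
    if jitter_min is not None:
        result = max(result, jitter_min)
    if jitter_max is not None:
        result = min(result, jitter_max)
    return result


# Mirrors the PatientWeight example: range 10, int type, bounds 25..250.
weight = jitter(30, jitter_range=10, jitter_type="int",
                jitter_min=25, jitter_max=250, rng=random.Random(0))
```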
keep
Does nothing. Used, for instance, in the groups section of the regex-sub action.
- name: SeriesDescription
  keep: true
Note
If name is the only key defined in the field configuration, the field defaults to the keep action.
identity
Does nothing. Used, for instance, in the groups section of the regex-sub action.
- name: SeriesDescription
  identity: true
Warning
identity will be deprecated in version 14. Use the keep action instead.
encrypt
Encrypts the contents of the field as specified by the profile and profile type.
Non-DICOM
Encryption for non-DICOM profiles uses standard AES-EAX which provides confidentiality and authenticity to encrypt the contents of the field in place. (See EAX mode)
Non-DICOM encryption uses an 8-byte nonce, which is randomly generated at the time of encryption. With AES-EAX and an 8-byte nonce, source fields must be 38 characters or fewer to stay within Flywheel’s 64-character limit. Ensuring this is the user's responsibility. If the character limit is exceeded, later decryption may fail and file manipulations may cause Flywheel errors.
Non-DICOM encryption requires a secret-key to be provided at the global or profile level. More on this here.
Non-DICOM encryption can also have its random nonce pre-seeded. This is useful for making the output deterministic, but is cryptographically unsound. More on this here.
DICOM
Encryption for DICOM profiles utilizes Cryptographic Message Syntax (CMS) encryption (See PS3.15 E.1.1) to store original tag values within the EncryptedAttributesSequence (See PS3.3 C.12.1.1.4.1) and then removes the tag from the non-encrypted portion of the DICOM dataset.
DICOM encryption can be symmetric or asymmetric. Symmetric encryption requires a secret-key to be provided at the global or profile level. Asymmetric encryption requires one or more public-key file paths and asymmetric-encryption set to true at the DICOM file profile level.
More on asymmetric-encryption, public-key, and private-key.
- name: PatientName
  encrypt: true
decrypt
Decrypts the contents of the field as encrypted by the method described above.
Non-DICOM
Non-DICOM decryption requires a secret-key to be provided at the global or profile level. More on this here.
DICOM
Decryption for DICOM profiles decrypts the EncryptedAttributesSequence and restores the values found within. If decrypt is specified for a tag that was not encrypted, the tag value is kept as-is.
DICOM decryption can be symmetric or asymmetric, according to how the DICOM file was encrypted. Symmetric decryption requires a secret-key to be provided at the global or profile level. Asymmetric decryption requires a private-key file path and asymmetric-encryption set to true at the DICOM file profile level.
More on asymmetric-encryption, public-key, and private-key.
- name: PatientName
  decrypt: true
regex-sub
Replaces the contents of the field with a value built from other attributes and/or groups extracted from the field value. Below is an example of such a field:
- name: SeriesDescription
  regex-sub:
    - input-regex: '(?P<current_sd>.*\/.*)'
      output: '{PatientID}_{current_sd}_{PulseSequenceName}'
      groups:
        - name: PatientID
          hash: True
        - name: current_sd
          keep: True
        - name: PulseSequenceName
          keep: True
    - ...
regex-sub takes a list of dicts as its value, each defining input-regex, output and groups.
input-regex: defines the regular expression that matches the field value. Optionally, it can extract group(s) to be used in the output.
output: defines the Python “f-string” to be formatted from the captured groups and/or the record attributes.
groups: defines the list of field transformations to apply to the extracted groups or to the record attributes. Every variable in output must have a matching entry in groups. If no transformation is desired, the keep transformation must be used.
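The interplay of input-regex, output and groups can be sketched in a few lines. This is only an approximation under stated assumptions: `regex_sub`, `keep` and `short_hash` are hypothetical helpers, groups is modeled as a dict of callables rather than a YAML list, the first matching rule wins, and `str.format` stands in for the f-string formatting.

```python
import hashlib
import re


def keep(value):
    return value


def short_hash(value):
    # Assumption: SHA-256 truncated to 16 hex characters.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def regex_sub(record, field, rules):
    """Build a new value for `field` without modifying `record`."""
    value = record[field]
    for rule in rules:
        match = re.match(rule["input-regex"], value)
        if match is None:
            continue
        # Output variables come from record attributes and captured groups.
        variables = {**record, **match.groupdict()}
        transformed = {name: action(variables[name])
                       for name, action in rule["groups"].items()}
        return rule["output"].format(**transformed)
    return value  # no rule matched: leave the value untouched


record = {"PatientID": "12345", "SeriesDescription": "T1/axial",
          "PulseSequenceName": "MPRAGE"}
rules = [{
    "input-regex": r"(?P<current_sd>.*\/.*)",
    "output": "{PatientID}_{current_sd}_{PulseSequenceName}",
    "groups": {"PatientID": short_hash, "current_sd": keep,
               "PulseSequenceName": keep},
}]
new_description = regex_sub(record, "SeriesDescription", rules)
```

Note that `record["PatientID"]` is left unchanged: the hashed value only feeds the output string.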
Note
The transformations defined under groups do NOT impact the metadata of the de-id record. The transformed values are only made available to output.
Important
For all file profiles, the regex-sub fields are applied first (to avoid inadvertent and inconsistent transformation of group fields that have already been de-identified). Among all regex-sub fields, the last defined regex-sub field is applied first.
regex-sub leverages regular expressions, which comes in handy for many use cases. Here are a few other examples:
Masking day and month of PatientBirthDate DICOM tag
- name: PatientBirthDate
  regex-sub:
    - input-regex: '(?P<year>\d{4}).*'
      output: '{year}0101'
      groups:
        - name: year
          keep: True
Capping PatientAge at 90Y
- name: PatientAge
  regex-sub:
    - input-regex: '^(?P<age>(0*[9][0-9]Y)|([1-9]\d{2,}Y))$'
      output: '{age}'
      groups:
        - name: age
          replace-with: 090Y
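The two regular expressions above can be tried out directly with Python's `re` module before being placed in a profile. The helper names `mask_birth_date` and `cap_age` are hypothetical and only reproduce the effect of each example on a single string.

```python
import re

BIRTH_DATE = re.compile(r"(?P<year>\d{4}).*")
AGE_CAP = re.compile(r"^(?P<age>(0*[9][0-9]Y)|([1-9]\d{2,}Y))$")


def mask_birth_date(value):
    """Keep the year and set month and day to January 1st."""
    match = BIRTH_DATE.match(value)
    return f"{match['year']}0101" if match else value


def cap_age(value):
    """Replace any age of 90 years or more with 090Y; leave others as-is."""
    return "090Y" if AGE_CAP.match(value) else value


masked = mask_birth_date("19800523")
capped = cap_age("105Y")
untouched = cap_age("045Y")
```

Ages below 90 do not match the pattern, so the regex-sub rule leaves them unchanged.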