API
Below is a description of the flywheel-migration-toolkit API.
DeID Profile
Provides profile loading/saving for file de-identification.
- class flywheel_migration.deidentify.deid_profile.DeIdProfile
Bases:
object
Represents steps to take to de-identify a file or set of files.
- finalize()
Perform any necessary cleanup with profile.
- get_file_profile(name)
Get file profile for name, or None if not present.
- initialize()
Initialize the profile, prior to importing.
- load_config(config)
Initialize this profile from a config dictionary.
- matches_file(filename)
Determine from filename whether any of the file_profiles match on the filename.
- Parameters:
filename – name of the file to match
- Returns:
True if a profile matches the filename, False if none match
- Return type:
bool
- process_file(src_fs, src_file, dst_fs)
Process the given file, if it’s handled by a file profile.
- Parameters:
src_fs – The source filesystem
src_file – The source file path
dst_fs – The destination filesystem
- Returns:
True if the file was processed, False otherwise
- Return type:
bool
- process_packfile(packfile_type, src_fs, dst_fs, paths, callback=None)
Process the given packfile, if it’s handled by a file profile.
- Parameters:
packfile_type (str) – The packfile type
src_fs – The source filesystem
dst_fs – The destination filesystem
paths – The list of paths to process
callback – Optional function to call after processing each file
- Returns:
True if the packfile was processed, False otherwise
- Return type:
bool
- to_config()
Create configuration dictionary from this profile.
- validate(enhanced=False)
Validate the profile, returning any errors.
- Returns:
A list of error messages, or an empty list
- Return type:
list(str)
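To illustrate the kind of check matches_file(filename) describes, the sketch below tests a filename against glob-style filters such as the default_file_filter lists documented for the file profiles in this reference. It uses only the standard library. The helper name matches_any_filter is hypothetical, and treating the filters as fnmatch patterns is an assumption rather than a statement about how the toolkit matches files.

```python
from fnmatch import fnmatchcase

# Filter list copied from DicomFileProfile.default_file_filter in this reference.
DICOM_FILTERS = ["*.dcm", "*.DCM", "*.ima", "*.IMA"]


def matches_any_filter(filename, filters):
    """Return True if filename matches at least one glob pattern (hypothetical helper)."""
    # fnmatchcase is case-sensitive on every platform, which is why the
    # filter list carries both lower-case and upper-case variants.
    return any(fnmatchcase(filename, pattern) for pattern in filters)


print(matches_any_filter("scan001.dcm", DICOM_FILTERS))
print(matches_any_filter("notes.txt", DICOM_FILTERS))
```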
DeID Field
Represents action to take in order to de-id a single field.
- class flywheel_migration.deidentify.deid_field.DeIdDecryptField(fieldname, is_regex=False, dry=False)
Bases:
DeIdField
Action to replace a field with its decrypted value (Undoes deid encryption).
- get_value(profile, state, record)
Get the transformed value, given profile state and record.
- key = 'decrypt'
- class flywheel_migration.deidentify.deid_field.DeIdEncryptField(fieldname, is_regex=False, dry=False)
Bases:
DeIdField
Action to replace a field with its symmetric-key encrypted value.
- force_nonce = ''
- get_value(profile, state, record)
Get the transformed value, given profile state and record.
- key = 'encrypt'
- load_config(config)
Load rule specific settings from configuration dictionary.
- class flywheel_migration.deidentify.deid_field.DeIdField(fieldname, is_regex=False, dry=False)
Bases:
object
Abstract class that represents action to take to de-identify a single field.
- deidentify(profile, state, record)
Perform the update - default implementation is to do a replace.
- classmethod factory(config, dry=False, mixin=None)
Create a new DeIdField instance for the given config.
- Parameters:
config (dict) – The field configuration
dry (bool) – If set to True, mark the field as dry, i.e. a field that does not modify the record.
mixin (DeIdFieldMixin) – Optional subclass of DeIdFieldMixin to be inherited by the DeIdField subclass to make the field profile specific.
- classmethod get_deidfield_class(config)
Returns DeIdField subclass matching config.
If only “name” is defined in config, returns DeIdKeepField, otherwise returns DeIdField subclass based on key action found in config.
- Parameters:
config (dict) – Dictionary e.g. {"name": "PatientID", "replace-with": "TOTO"}
- Returns:
A DeIdField subclass or None if none is matching config
- Return type:
DeIdField or None
- abstract get_value(profile, state, record)
Get the transformed value, given profile state and record.
- property is_dry
- property is_regex
- key = None
- list_fieldname(record)
Return a list of fieldnames for record.
By default returns [self.fieldname]. Can be overridden by certain subclasses of FieldEnhancerBaseMixin to return a range of record attributes (e.g. when the field uses a regex or a range definition).
- load_config(config)
Load rule specific settings from configuration dictionary.
- local_to_config(config)
Convert rule specific settings to configuration dictionary.
- to_config()
Convert to configuration dictionary.
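The lookup described for get_deidfield_class(config) can be pictured with the standalone sketch below. It returns the action key as a string instead of a DeIdField subclass, so it runs without the toolkit installed. The set of keys is taken from the key attributes listed in this section; the function name pick_action_key and the exact matching logic are assumptions for illustration only.

```python
# Action keys taken from the `key` attributes of the DeIdField subclasses
# listed in this reference.
ACTION_KEYS = {
    "decrypt", "encrypt", "hash", "hashuid", "identity", "increment-date",
    "increment-datetime", "jitter", "keep", "regex-sub", "remove", "replace-with",
}


def pick_action_key(config):
    """Return the action key found in config (hypothetical helper).

    Mirrors the documented behaviour loosely: a config holding only "name"
    maps to "keep"; otherwise the first recognised action key wins, and
    None is returned when nothing matches.
    """
    if set(config) == {"name"}:
        return "keep"
    for key in config:
        if key in ACTION_KEYS:
            return key
    return None


print(pick_action_key({"name": "PatientID", "replace-with": "TOTO"}))
print(pick_action_key({"name": "PatientID"}))
```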
- class flywheel_migration.deidentify.deid_field.DeIdFieldMixin
Bases:
object
Mixin base class to add functionalities to DeIdField based on profile used.
- flavor = None
- class flywheel_migration.deidentify.deid_field.DeIdHashField(fieldname, is_regex=False, dry=False)
Bases:
DeIdField
Action to replace a field with its hashed value.
- get_value(profile, state, record)
Get the transformed value, given profile state and record.
- key = 'hash'
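A minimal sketch of what a hash action could look like, assuming the value is hashed with the algorithm named by hash_algorithm and the hex digest is truncated to hash_digits characters. Both attribute names come from FileProfile in this reference, but the truncation rule, the meaning of 0 as "no truncation", and the salt parameter are assumptions, and hash_value is a hypothetical helper.

```python
import hashlib


def hash_value(value, algorithm="sha256", digits=16, salt=""):
    """Hash a string and optionally truncate the hex digest (hypothetical helper)."""
    digest = hashlib.new(algorithm)
    digest.update((salt + value).encode("utf-8"))
    hexdigest = digest.hexdigest()
    # Assumption: a digits value of 0 means "keep the full digest".
    return hexdigest[:digits] if digits > 0 else hexdigest


short = hash_value("PatientName")
full = hash_value("PatientName", digits=0)
print(len(short), len(full))
```

The same input always yields the same output, so hashed identifiers stay consistent across files.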
- class flywheel_migration.deidentify.deid_field.DeIdHashUIDField(fieldname, is_regex=False, dry=False)
Bases:
DeIdField
Action to replace a uid field with its hashed value.
- get_value(profile, state, record)
Get the transformed value, given profile state and record.
- key = 'hashuid'
- class flywheel_migration.deidentify.deid_field.DeIdIdentityField(*args, **kwargs)
Bases:
DeIdField
Action to do nothing on a field. Same as keep action. To be deprecated.
- deidentify(profile, state, record)
Do nothing.
Use in fieldname section, regex-sub and with remove-undefined action.
- get_value(profile, state, record)
Get the transformed value, given profile state and record.
- key = 'identity'
- class flywheel_migration.deidentify.deid_field.DeIdIncrementDateField(fieldname, **kwargs)
Bases:
DeIdField
Action to replace a field with its incremented date.
- get_value(profile, state, record)
Get the transformed value, given profile state and record.
- key = 'increment-date'
- load_config(config)
Load rule specific settings from configuration dictionary.
- local_to_config(config)
Convert rule specific settings to configuration dictionary.
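As a rough illustration of incrementing a date, the sketch below parses a value using the default FileProfile.date_format shown in this reference ('%Y%m%d'), shifts it by a number of days, and formats it back. The helper shift_date and its days parameter are hypothetical; how the toolkit configures the offset is not shown here.

```python
from datetime import datetime, timedelta

# Default FileProfile.date_format from this reference.
DATE_FORMAT = "%Y%m%d"


def shift_date(value, days, date_format=DATE_FORMAT):
    """Shift a formatted date string by a number of days (hypothetical helper)."""
    parsed = datetime.strptime(value, date_format)
    return (parsed + timedelta(days=days)).strftime(date_format)


print(shift_date("20200130", 10))
```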
- class flywheel_migration.deidentify.deid_field.DeIdIncrementDateTimeField(fieldname, **kwargs)
Bases:
DeIdField
Action to replace a field with its incremented datetime.
- get_value(profile, state, record)
Get the transformed value, given profile state and record.
- key = 'increment-datetime'
- load_config(config)
Load rule specific settings from configuration dictionary.
- local_to_config(config)
Convert rule specific settings to configuration dictionary.
- class flywheel_migration.deidentify.deid_field.DeIdJitterField(fieldname, **kwargs)
Bases:
DeIdField
Action to jitter a field with some random number from a uniform distribution on a range.
- get_value(profile, state, record)
Get the transformed value, given profile state and record.
- key = 'jitter'
- load_config(config)
Load rule specific settings from configuration dictionary.
- local_to_config(config)
Convert rule specific settings to configuration dictionary.
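The jitter action can be sketched as adding a uniformly distributed offset to a numeric value. The defaults below reuse jitter_range = 2 and jitter_type = 'float' from FileProfile in this reference. Drawing the offset symmetrically from [-jitter_range, +jitter_range] and rounding for the 'int' type are assumptions, and jitter_value is a hypothetical helper.

```python
import random


def jitter_value(value, jitter_range=2, jitter_type="float", rng=None):
    """Add a uniform random offset to a numeric value (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    # Assumption: the offset is drawn symmetrically around zero.
    offset = rng.uniform(-jitter_range, jitter_range)
    result = float(value) + offset
    return int(round(result)) if jitter_type == "int" else result


rng = random.Random(0)  # seeded only to make the example repeatable
print(jitter_value(70.5, rng=rng))
print(jitter_value(70, jitter_type="int", rng=rng))
```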
- class flywheel_migration.deidentify.deid_field.DeIdKeepField(fieldname, is_regex=False, dry=False)
Bases:
DeIdField
Action to do nothing on a field.
- deidentify(profile, state, record)
Do nothing.
Use in fieldname section, regex-sub and with remove-undefined action.
- get_value(profile, state, record)
Get the transformed value, given profile state and record.
- key = 'keep'
- class flywheel_migration.deidentify.deid_field.DeIdRegexSubField(fieldname, **kwargs)
Bases:
DeIdField
Action to edit a string matching a regex with capture groups.
- get_value(profile, state, record)
Get the transformed value, given profile state and record.
- key = 'regex-sub'
- load_config(config)
Load rule specific settings from configuration dictionary.
- local_to_config(config)
Convert rule specific settings to configuration dictionary.
- class flywheel_migration.deidentify.deid_field.DeIdRegexSubListItem(config)
Bases:
object
Class for representing a list item within DeIdRegexSubField.
- format_output(val_dict)
Format output according to output_map.
- get_invalid_output_vars()
Return a list of invalid output_vars.
- is_capture_group(var_name)
Return True if the varname matches a named capture group in self.input_regex.
- output_dot_replace_char = '___'
- regex_matches_field_value(value)
Return True if the value matches the regex, else False.
- to_config()
Convert to configuration dictionary.
- var_name_is_valid(var_name)
Return True if the varname is a capture group or is defined in self.group_dict, False otherwise.
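The idea behind a regex-sub list item (match a value against a regex with named capture groups, then format an output template from those groups) can be sketched with the standard library alone. The helper regex_sub, the sample pattern, and the sample template are all hypothetical; the toolkit's own configuration keys and validation steps are not reproduced here.

```python
import re


def regex_sub(value, input_regex, output):
    """Rebuild value from named capture groups, or return None if there is no match.

    Hypothetical helper: `output` is a str.format template whose variables
    are the named groups of `input_regex`.
    """
    match = re.match(input_regex, value)
    if match is None:
        return None
    return output.format(**match.groupdict())


# Keep the site code, drop the subject number.
pattern = r"^(?P<site>[A-Z]+)_(?P<subject>\d+)$"
print(regex_sub("NYC_00123", pattern, "{site}-anon"))
```

An output template that names a variable missing from the regex raises KeyError in this sketch, which is the kind of mismatch get_invalid_output_vars() and var_name_is_valid() appear designed to report up front.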
- class flywheel_migration.deidentify.deid_field.DeIdRemoveField(fieldname, is_regex=False, dry=False)
Bases:
DeIdField
Action to remove a field from the record.
- deidentify(profile, state, record)
Perform the update - default implementation is to do a replace.
- get_value(profile, state, record)
Get the transformed value, given profile state and record.
- key = 'remove'
- class flywheel_migration.deidentify.deid_field.DeIdReplaceField(fieldname, **kwargs)
Bases:
DeIdField
Action to replace a field from the record.
- get_value(profile, state, record)
Get the transformed value, given profile state and record.
- key = 'replace-with'
- load_config(config)
Load rule specific settings from configuration dictionary.
- local_to_config(config)
Convert rule specific settings to configuration dictionary.
File Profile
Individual file/packfile profile for de-identification.
- class flywheel_migration.deidentify.file_profile.FileProfile(packfile_type=None, file_filter=None)
Bases:
object
Abstract class that represents a single file/packfile profile.
- add_field(field)
Add a field to de-identify.
- add_log(log)
Set the log instance.
- alter_pixels(state, src_fs, path)
Alter pixels for given file.
Return None to do no preloading, return new tempfs to perform subsequent actions on tempfile.
- cleanup(state)
Perform any final cleaning up actions.
- create_file_state()
Create state object for processing files.
- date_format = '%Y%m%d'
- datetime_format = '%Y%m%d%H%M%S.%f'
- datetime_has_timezone = True
- default_filenames = []
- deidfield_mixin = None
- classmethod factory(name, config=None, log=None)
Create a new file profile instance for the given name.
- Parameters:
name (str) – The name of the profile type
config (dict) – The optional configuration dictionary
log – The optional de-id log instance
- file_signatures = [(None, None)]
- filename_field_prefix = '_fwmtk'
- get_dest_path(state, record, path)
Get destination path.
- get_log_entry(path, entry_type, state, record)
Returns a dictionary with key/value corresponding to log entry and the logged fields.
- get_log_fields()
Return the full set of fieldnames that should be logged.
- classmethod get_subclasses()
Returns all subclasses (not only the immediate ones).
- get_value(state, record, fieldname)
Get the transformed value for fieldname.
- has_field(var_fieldname)
Returns True if var_fieldname is defined in field_map or a regex field matches var_fieldname, else returns False.
- hash_algorithm = 'sha256'
- hash_digits = 0
- jitter_range = 2
- jitter_type = 'float'
- load_config(config)
Read configuration from a dictionary.
- abstract load_record(state, src_fs, path)
Load the record(file) at path, return None to ignore this file.
- log_fields = []
- matches_byte_sig(inp_fs, path)
Returns a boolean based on whether the file at path on inp_fs matches the file byte signature for the FileProfile subclass.
- Parameters:
inp_fs (fs.Base) – the filesystem containing path
path – the path to the file to read on inp_fs
- Returns:
whether the file at path on inp_fs matches the file byte signature for the FileProfile
- Return type:
bool
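A byte-signature check of this kind can be sketched against any seekable binary stream. The (offset, bytes) pairs below are copied from the file_signatures attributes in this reference. The helper matches_signature is hypothetical, and treating the base-class default (None, None) as "no check, always match" is an assumption.

```python
import io


def matches_signature(stream, signatures):
    """Return True if the stream matches any (offset, magic) pair (hypothetical helper)."""
    for offset, magic in signatures:
        if offset is None or magic is None:
            # Assumption: (None, None) means the profile defines no signature.
            return True
        stream.seek(offset)
        if stream.read(len(magic)) == magic:
            return True
    return False


DICOM_SIGNATURES = [(128, b"DICM")]
TIFF_SIGNATURES = [(0, b"II*\x00"), (0, b"MM\x00*")]

# A DICOM file starts with a 128-byte preamble followed by b"DICM".
fake_dicom = io.BytesIO(b"\x00" * 128 + b"DICM" + b"payload")
print(matches_signature(fake_dicom, DICOM_SIGNATURES))
print(matches_signature(io.BytesIO(b"plain text"), TIFF_SIGNATURES))
```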
- matches_file(filename)
Check if this profile can process the given file.
- matches_packfile(packfile_type)
Check if this profile can process the given packfile.
- name = None
- process_files(src_fs, dst_fs, files, callback=None)
Process all files in the file list, performing de-identification steps.
- Parameters:
src_fs – The source filesystem (Provides open function)
dst_fs – The destination filesystem
files – The set of files in src_fs to process
callback – Function to call after writing each file
- classmethod profile_names()
Get the list of profile names.
- abstract read_field(state, record, fieldname)
Read the named field as a string. Return None if field cannot be read.
- record = None
- regex_compatible = False
- abstract remove_field(state, record, fieldname)
Remove the named field from the record.
- abstract replace_field(state, record, fieldname, value)
Replace the named field with value in the record.
- replace_with_insert = True
- sanitize_filename = True
- abstract save_record(state, record, dst_fs, path)
Save the record to the destination path.
- set_filenames_attributes(record, path)
Update record object with private attributes based on filenames properties.
Record attributes are extended based on the <groups> extracted from the <input-regex>. For instance, the following filenames schema defined in a profile:
    filenames:
      - output: {group1}.ext
        input-regex=r'^(?P<group1>[\w]+).ext$'
      - output: {group1}-{group2}.ext
        input-regex=r'^(?P<group1>[\w]+)-(?P<date1>[\d]+).ext$'
will create attributes, depending on which input-regex matches, as:
    # for `path` = test.ext
    record.<self.filename_field_prefix>_filename0_group1 = 'test'
    # for `path` = test-20200130.ext
    record.<self.filename_field_prefix>_filename1_group1 = 'test'
    # for `path` = test-20200130.ext
    record.<self.filename_field_prefix>_filename1_date1 = '20200130'
- Parameters:
record (object) – A record
path (str) – basename of input file
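The attribute naming described for set_filenames_attributes can be reproduced with a short standalone sketch. It uses the two input regexes from the example and the documented filename_field_prefix value '_fwmtk', and collects the results in a dictionary instead of setting them on a record. The helper filename_attributes is hypothetical, and trying every regex in order is an assumption about the matching strategy.

```python
import re

# FileProfile.filename_field_prefix from this reference.
FILENAME_FIELD_PREFIX = "_fwmtk"

# The two input-regex values from the example schema.
INPUT_REGEXES = [
    r"^(?P<group1>[\w]+).ext$",
    r"^(?P<group1>[\w]+)-(?P<date1>[\d]+).ext$",
]


def filename_attributes(path, input_regexes=INPUT_REGEXES, prefix=FILENAME_FIELD_PREFIX):
    """Map '<prefix>_filename<index>_<group>' to the captured text (hypothetical helper)."""
    attributes = {}
    for index, pattern in enumerate(input_regexes):
        match = re.match(pattern, path)
        if match is None:
            continue
        for group, value in match.groupdict().items():
            attributes[f"{prefix}_filename{index}_{group}"] = value
    return attributes


print(filename_attributes("test.ext"))
print(filename_attributes("test-20200130.ext"))
```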
- set_log(log)
Set the log instance.
- static sort_fields(field_list)
Sort field_list such that regex-sub fields are first.
- to_config()
Get configuration as a dictionary.
- uid_default_prefix_fields = 4
- uid_default_suffix_fields = 1
- uid_hash_fields = (6, 6, 6, 6, 6, 6)
- uid_max_suffix_digits = 6
- validate(enhanced=False)
Validate the profile, returning any errors.
- Parameters:
enhanced (bool) – Perform a deeper validation if supported
- Returns:
A list of error messages, or an empty list
- Return type:
list(str)
- write_log_entry(path, entry_type, state, record)
Write a single log entry of type entry_type for path.
DICOM File Profile
File profile for de-identifying dicom files.
- class flywheel_migration.deidentify.dicom_file_profile.DicomDeIdFieldMixin
Bases:
DeIdFieldMixin
Mixin to add functionality to DeIdField for Dicom profile.
- deidentify(profile, state, record)
Deidentifies depending on field type.
- flavor = 'Dicom'
- list_fieldname(record)
Returns a list of fieldnames for record depending on field type.
- recurse_sequence = False
- class flywheel_migration.deidentify.dicom_file_profile.DicomFileProfile(file_filter=None)
Bases:
FileProfile
Dicom implementation of load/save and remove/replace fields.
- add_encrypted_field(dataelem)
Adds original value of modified field to ModifiedAttributesSequence for encryption.
- add_encrypted_modified_attributes(record)
Checks encryption type specified, adds EncryptedAttributesSequence to record.
- add_field(field)
Add a field to de-identify.
- alter_pixels(state, src_fs, path)
Alter pixels for given file.
Return None to do no preloading, return new tempfs to perform subsequent actions on tempfile.
- cleanup(state)
Remove deid profile for dicom cleaner.
- create_file_state()
Create state object for processing files.
- create_modified_attributes_sequence()
Create ModifiedAttributesSequence to store original values for encryption.
- decode = True
- default_file_filter = ['*.dcm', '*.DCM', '*.ima', '*.IMA']
- deidfield_mixin
alias of
DicomDeIdFieldMixin
- file_signatures = [(128, b'DICM')]
- get_data_element(record, fieldname)
Returns data element in record at fieldname.
- get_data_element_VR(record, fieldname)
Returns data element VR in record at fieldname.
- get_dest_path(state, record, path)
Returns a default name based on SOPInstanceUID, or one based on the profile if defined.
- hash_digits = 16
- load_config(config)
Read configuration from a dictionary.
- load_record(state, src_fs, path)
Load the record(file) at path, return None to ignore this file.
- log_fields = ['StudyInstanceUID', 'SeriesInstanceUID', 'SOPInstanceUID']
- name = 'dicom'
- parse_pixel_actions()
- process_files(*args, **kwargs)
Process all files in the file list, performing de-identification steps.
- Parameters:
src_fs – The source filesystem (Provides open function)
dst_fs – The destination filesystem
files – The set of files in src_fs to process
callback – Function to call after writing each file
- read_field(state, record, fieldname)
Read the named field as a string. Return None if field cannot be read.
- recurse_sequence = False
- regex_compatible = True
- remove_field(state, record, fieldname)
Remove the named field from the record.
- remove_undefined = False
- remove_undefined_fields(state, record)
Remove data elements not defined in fields.
- replace_field(state, record, fieldname, value)
Replace the named field with value in the record.
- save_record(state, record, dst_fs, path)
Save the record to the destination path.
- to_config()
Get configuration as a dictionary.
- validate(enhanced=False)
Validate the profile, returning any errors.
- Parameters:
enhanced (bool) – If True, test profile execution on a set of test files
- Returns:
A list of error messages, or an empty list
- Return type:
list(str)
- validate_filenames(errors)
Validates the filename section of the profile.
- Parameters:
errors (list) – Current list of error messages
- Returns:
Extended list of error messages
- Return type:
(list)
- class flywheel_migration.deidentify.dicom_file_profile.DicomTagStr(value, *_args, **_kwargs)
Bases:
str
Subclass of string that hosts attributes/methods to handle the different ways a field can reference Dicom data element(s).
- property dicom_tag
- property is_flat
Return True for ‘flat’ fieldname (map to a single tag), False otherwise.
- property is_private
- property is_repeater
- property is_sequence
- property is_wild_sequence
- parse_field_name(name)
Parse the field name and return its parsed form.
- Parameters:
name (str) – The field name.
- Returns:
Depending on name.
- Return type:
(list or Tag)
- Raises:
ValueError – if name matches multiple fieldname definition types.
- parsers_method_prefix = '_parse'
PNG File Profile
File profile for de-identifying files storing Exif metadata such as JPEG. More on Exif at https://en.wikipedia.org/wiki/Exif.
- class flywheel_migration.deidentify.png_file_profile.ChunkStr(value, *_args, **_kwargs)
Bases:
str
Subclass of string with a few extra attributes related to PNG chunks.
- class flywheel_migration.deidentify.png_file_profile.PNGFileProfile(file_filter=None)
Bases:
FileProfile
PNG implementation of load/save and remove/replace fields.
- add_field(field)
Add field to profile.
- create_file_state()
Create state object for processing files.
- default_file_filter = ['*.png', '*.PNG']
- default_output_format = 'PNG'
- file_signatures = [(0, b'\x89PNG\r\n\x1a\n')]
- hash_digits = 16
- load_config(config)
Read configuration from a dictionary.
- load_record(state, src_fs, path)
Load the record(file) at path, return None to ignore this file.
- log_fields = []
- name = 'png'
- read_field(state, record, fieldname)
Read field from record.
- remove_field(state, record, fieldname)
Remove the named field from the record.
- replace_field(state, record, fieldname, value)
Replace the named field with value in the record.
- save_record(state, record, dst_fs, path)
Save the record to the destination path.
- to_config()
Get configuration as a dictionary.
- validate(enhanced=False)
Validate the profile, returning any errors.
- Parameters:
enhanced (bool) – If True, test profile execution on a set of test files
- Returns:
A list of error messages, or an empty list
- Return type:
list(str)
- class flywheel_migration.deidentify.png_file_profile.PNGRecord(fp, mode='r')
Bases:
object
A record for dealing with png file.
- property metadata
Load Exif metadata.
- mime_type = 'image/png'
- save_as(fp, file_type='PNG')
Save deid image.
- Parameters:
fp – A file object
file_type – Image format to save as
- validate()
Validate image against expected type.
JPG File Profile
File profile for de-identifying files storing Exif metadata such as JPEG. More on Exif at https://en.wikipedia.org/wiki/Exif.
- class flywheel_migration.deidentify.jpg_file_profile.ExifTagStr(value, *_args, **_kwargs)
Bases:
str
Subclass of string with a few extra attributes related to exif.
- class flywheel_migration.deidentify.jpg_file_profile.JPGFileProfile(file_filter=None)
Bases:
FileProfile
Exif implementation of load/save and remove/replace fields.
Human readable tags are leveraged from piexif.TAGS
- add_field(field)
Add field to profile.
Fields matching a keyword found in multiple datablocks (i.e. Exif, IFD0 and IFD1) get duplicated.
- create_file_state()
Create state object for processing files.
- datetime_format = '%Y:%m:%d %H:%M:%S'
- default_file_filter = ['*.jpg', '*.jpeg', '*.JPG', '*.JPEG']
- default_output_format = 'JPEG'
- file_signatures = [(0, b'\xff\xd8\xff')]
- hash_digits = 16
- load_config(config)
Read configuration from a dictionary.
- load_record(state, src_fs, path)
Load the record(file) at path, return None to ignore this file.
- log_fields = []
- name = 'jpg'
- read_field(state, record, fieldname)
Read field from record.
- remove_field(state, record, fieldname)
Remove the named field from the record.
- replace_field(state, record, fieldname, value)
Replace the named field with value in the record.
- save_record(state, record, dst_fs, path)
Save the record to the destination path.
- to_config()
Get configuration as a dictionary.
- validate(enhanced=False)
Validate the profile, returning any errors.
- Parameters:
enhanced (bool) – If True, test profile execution on a set of test files
- Returns:
A list of error messages, or an empty list
- Return type:
list(str)
- class flywheel_migration.deidentify.jpg_file_profile.JPGRecord(fp, mode='r')
Bases:
object
A record for dealing with jpg file.
- file_type = 'JPEG'
- property metadata
Load Exif metadata.
- mime_type = 'image/jpeg'
- save_as(fp, file_type=None)
Save deid image.
- Parameters:
fp – A file object
file_type – Image format to save as
- validate()
Validate image against expected type.
TIFF File Profile
File profile for de-identifying TIFF files.
- class flywheel_migration.deidentify.tiff_file_profile.IFDTagStr(value, *_args, **_kwargs)
Bases:
str
Subclass of string with a few extra attributes related to metadata.
- class flywheel_migration.deidentify.tiff_file_profile.TIFFFileProfile(file_filter=None)
Bases:
FileProfile
TIFF implementation of load/save and remove/replace fields.
Human readable tags are leveraged from PIL.TiffTags.TAGS_V2
- add_field(field)
Add field to profile.
- create_file_state()
Create state object for processing files.
- datetime_format = '%Y:%m:%d %H:%M:%S'
- default_file_filter = ['*.tif', '*.tiff', '*.TIF', '*.TIFF']
- default_output_format = 'TIFF'
- file_signatures = [(0, b'II*\x00'), (0, b'MM\x00*')]
- hash_digits = 16
- load_config(config)
Read configuration from a dictionary.
- load_record(state, src_fs, path)
Load the record(file) at path, return None to ignore this file.
- log_fields = []
- name = 'tiff'
- private_tags_lower_bound = 32768
- read_field(state, record, fieldname)
Read field from record.
- record_class
alias of
TIFFRecord
- remove_field(state, record, fieldname)
Remove the named field from the record.
- replace_field(state, record, fieldname, value)
Replace the named field with value in the record.
- save_record(state, record, dst_fs, path)
Save the record to the destination path.
- to_config()
Get configuration as a dictionary.
- validate(enhanced=False)
Validate the profile, returning any errors.
- Parameters:
enhanced (bool) – If True, test profile execution on a set of test files
- Returns:
A list of error messages, or an empty list
- Return type:
list(str)
- class flywheel_migration.deidentify.tiff_file_profile.TIFFRecord(fp, mode='r')
Bases:
object
A record for dealing with tiff file.
- file_type = 'TIFF'
- property metadata
Load metadata.
- mime_type = 'image/tiff'
- save_as(filepath, file_type=None, **kwargs)
Save deid image.
- Parameters:
filepath – A file path
file_type – Image format to save as
- validate()
Validate image against expected type.
KEY/VALUE File Profile
File profile for de-identifying text files with lines that contain string pattern-delimited key-value pairs.
- class flywheel_migration.deidentify.key_value_text_file_profile.KeyValueTextFileLine(line, delimiter)
Bases:
object
Represents a parsed line from key-value text file.
- get_output_line()
Get the string representation of line with output_value.
- parse_line()
Parses self.input_line to determine delimiter_value, key, and input_value.
- set_value(value)
Sets self.output_value to value.
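Parsing and rewriting a delimited key-value line can be sketched as below. The sample line follows the MetaImage (.mhd) header style, in keeping with the default_file_filter of the key-value profile. This is a simplification under stated assumptions: the delimiter is a plain string rather than a pattern, surrounding whitespace is normalised to single spaces on output, and split_line and replace_value are hypothetical helpers, not methods of KeyValueTextFileLine.

```python
def split_line(line, delimiter="="):
    """Split a line into (key, value), or return None if the delimiter is absent."""
    key, found, value = line.partition(delimiter)
    if not found:
        return None
    return key.strip(), value.strip()


def replace_value(line, new_value, delimiter="="):
    """Return the line with its value replaced; lines without a delimiter are unchanged."""
    parsed = split_line(line, delimiter)
    if parsed is None:
        return line
    key, _old_value = parsed
    return f"{key} {delimiter} {new_value}"


print(split_line("ElementDataFile = subject01.raw"))
print(replace_value("ElementDataFile = subject01.raw", "anon.raw"))
```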
- class flywheel_migration.deidentify.key_value_text_file_profile.KeyValueTextFileProfile(file_filter=None)
Bases:
FileProfile
key-value text file implementation of load/save and remove/replace fields.
- default_file_filter = ['*.MHD', '*.mhd']
- hash_digits = 16
- load_config(config)
Read configuration from a dictionary.
- load_record(state, src_fs, path)
Load the record(file) at path, return None to ignore this file.
- name = 'key-value-text-file'
- read_field(state, record, fieldname)
Read the named field as a string. Return None if field cannot be read.
- remove_field(state, record, fieldname)
Remove the named field from the record.
- replace_field(state, record, fieldname, value)
Replace the named field with value in the record.
- save_record(state, record, dst_fs, path)
Save the record to the destination path.
- to_config()
Get configuration as a dictionary.
- validate(enhanced=False)
Validate the profile, returning any errors.
- Parameters:
enhanced (bool) – Perform a deeper validation if supported
- Returns:
A list of error messages, or an empty list
- Return type:
list(str)
- class flywheel_migration.deidentify.key_value_text_file_profile.KeyValueTextFileRecord(file_object, delimiter, ignore_bad_lines)
Bases:
object
Represents a text file where each line is a key-value pair delimited by delimiter.
- insert_key(key, value)
Prepares a new line object given a key and value and adds it to self.line_dict.
- parse_lines(file_object, ignore_bad_lines)
Parses the lines in file_object into self._line_dict.
- save_as(file_object)
Save text file.
- flywheel_migration.deidentify.key_value_text_file_profile.encoding_supported(enc)
Returns boolean indicating whether encoding string is supported.
JSON File Profile
File profile for de-identifying JSON files.
- class flywheel_migration.deidentify.json_file_profile.JSONFileProfile(file_filter=None)
Bases:
FileProfile
JSON implementation of load/save and remove/replace fields.
- add_field(field)
Add a field to de-identify.
- date_format = '%Y-%m-%d'
- datetime_format = '%Y-%m-%d %H:%M:%S'
- default_file_filter = ['*.json', '*.JSON']
- hash_digits = 16
- load_config(config)
Read configuration from a dictionary.
- load_record(state, src_fs, path)
Load the record(file) at path, return None to ignore this file.
- log_fields = []
- name = 'json'
- read_field(state, record, fieldname)
Read field from record.
- record_class
alias of
JSONRecord
- regex_compatible = True
- remove_field(state, record, fieldname)
Remove the named field from the record.
- replace_field(state, record, fieldname, value)
Replace the named field with value in the record.
- save_record(state, record, dst_fs, path)
Save the record to the destination path.
- separator = '.'
- class flywheel_migration.deidentify.json_file_profile.JSONRecord(fp, data=None, separator=None)
Bases:
object
A record for dealing with json file.
- default_separator = '.'
- file_type = 'JSON'
- classmethod from_dict(data, separator=None)
Instantiate record from a dictionary.
- get_all_dotty_paths()
Returns a list of strings for all accessible paths in the record, in dotty dict notation.
- items()
Iterate over key, value.
- keys()
List keys in data model.
- pop(key)
Pop element from data model.
- save_as(fp)
Save de-id as json.
- property separator
Returns separator used in Dotty.
- to_dict()
Export record as dictionary.
- values()
List value in data model.
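The dotted-path notation returned by get_all_dotty_paths() can be pictured with a small recursive walk over a nested dictionary. This sketch uses the documented default separator '.' and lists leaf paths only; whether the toolkit also reports intermediate keys or list indices is not stated here, so that is an assumption. The helper dotted_paths is hypothetical.

```python
def dotted_paths(data, separator=".", prefix=""):
    """Return the leaf paths of a nested dict in dotted notation (hypothetical helper)."""
    paths = []
    for key, value in data.items():
        path = f"{prefix}{separator}{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Descend into non-empty dictionaries.
            paths.extend(dotted_paths(value, separator, path))
        else:
            paths.append(path)
    return paths


record = {"patient": {"name": "Doe", "dob": "1970-01-01"}, "site": "A"}
print(dotted_paths(record))
```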
XML File Profile
File profile for de-identifying XML files.
- class flywheel_migration.deidentify.xml_file_profile.XMLFileProfile(file_filter=None)
Bases:
FileProfile
XML implementation of load/save and remove/replace fields.
- add_field(field)
Add field to profile.
- create_file_state()
Create state object for processing files.
- default_file_filter = ['*.XML', '*.xml']
- file_signatures = [(0, b'<?xml ')]
- hash_digits = 16
- load_record(state, src_fs, path)
Load the record(file) at path, return None to ignore this file.
- log_fields = []
- name = 'xml'
- read_field(state, record, fieldname)
Read field from record.
- remove_field(state, record, fieldname)
Remove the named field from the record.
- replace_field(state, record, fieldname, value)
Replace the named field with value in the record.
- save_record(state, record, dst_fs, path)
Save the record to the destination path.
- class flywheel_migration.deidentify.xml_file_profile.XMLRecord(fp)
Bases:
object
A record for dealing with XML file.
This is a dummy class that allows storing arbitrary attributes, because lxml.etree._ElementTree does not allow it (it inherits from a custom class object that seems to prohibit it).
- save_as(fp)
Save xml tree.
- class flywheel_migration.deidentify.xml_file_profile.XPathStr(value, *_args, **_kwargs)
Bases:
str
Subclass of string with a few extra attributes related to xml.
- flywheel_migration.deidentify.xml_file_profile.parse_fieldname(name)
Parse the given string to determine if it’s XPath compatible.
- Parameters:
name (str) – The XPath expression
- Returns:
XPathStr
Table File Profile
File profiles for de-identifying table-like files, e.g. csv, tsv.
- class flywheel_migration.deidentify.table_file_profile.CSVFileProfile(file_filter=None)
Bases:
TableFileProfile
FileProfile class for CSV files.
- default_file_filter = ['.csv', '.CSV']
- delimiter = ','
- name = 'csv'
- reader = 'csv'
- class flywheel_migration.deidentify.table_file_profile.TSVFileProfile(file_filter=None)
Bases:
TableFileProfile
FileProfile class for TSV files.
- default_file_filter = ['.tsv', '.TSV']
- delimiter = '\t'
- name = 'tsv'
- reader = 'csv'
- class flywheel_migration.deidentify.table_file_profile.TableFileProfile(file_filter=None)
Bases:
FileProfile
FileProfile subclass for tables (e.g. csv, tsv) for de-id COLUMNS.
- add_field(field)
Add a field to de-identify.
- default_file_filter = None
- delimiter = None
- hash_digits = 16
- load_config(config)
Read configuration from a dictionary.
- load_record(state, src_fs, path)
Load the record(file) at path, return None to ignore this file.
- name = 'table'
- read_field(state, record, fieldname)
Read the named field as a string. Return None if field cannot be read.
- reader = None
- record_class
alias of
TableRecord
- remove_field(state, record, fieldname)
Remove the named field from the record.
- replace_field(state, record, fieldname, value)
Replace the named field with value in the record.
- save_record(state, record, dst_fs, path)
Save the record to the destination path.
- to_config()
Get configuration as a dictionary.
- validate(enhanced=False)
Validate the profile, returning any errors.
- Parameters:
enhanced (bool) – Perform a deeper validation if supported
- Returns:
A list of error messages, or an empty list
- Return type:
list(str)
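Since the table profiles de-identify whole columns, the effect of a replace action on a CSV column can be sketched with the standard csv module. The delimiters match the delimiter attributes of CSVFileProfile and TSVFileProfile in this reference. The helper replace_column is hypothetical and says nothing about how TableRecord is implemented.

```python
import csv
import io


def replace_column(text, column, new_value, delimiter=","):
    """Return the table text with every value in column replaced (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row[column] = new_value
        writer.writerow(row)
    return out.getvalue()


table = "id,name\n1,Alice\n2,Bob\n"
print(replace_column(table, "name", "REDACTED"))
```

Passing delimiter="\t" applies the same logic to TSV input.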
Exceptions
Provides validation error for deid templates.