
import run

Import data from external storage into a Flywheel project through a connector that is hosted and scaled within a cluster. Storages must be registered by site admins in the UI (Interfaces menu, External Storage tab) or with fw-beta admin storage create to make them available for imports.

Usage
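
A typical invocation references a registered storage and a destination Flywheel project (both elided as ... here and in the examples below) together with options from the sections that follow; for example, a trial run that uploads nothing:

fw-beta import run ... --dry-run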

Rules

Selecting the files to import and configuring how they are stored in Flywheel is done with import rules. At least one rule is required to match any files in the source storage; additional rules may be specified to achieve more complex import behaviors.

Each rule is tied to a Flywheel hierarchy level where the matching files will be imported, and can optionally have a list of include and/or exclude filters. Currently only acquisition-level file imports are supported.

Rules are evaluated in order; for each file, the first rule is used where:

  • any of the include filters matches (if given) and
  • none of the exclude filters match (if given)

Files that do not match any rule are skipped.

Filters

Include and exclude filters are strings in the form <field><operator><value>.

Supported filter fields:

Field Type Description
path str File path (relative)
size int File size
ctime datetime Created timestamp
mtime datetime Modified timestamp

Supported filter operators depending on the value type:

Operator Description Types
=~ regex match str
!~ regex not match str
= equal str,int,float,datetime
!= not equal str,int,float,datetime
< less int,float,datetime
> greater int,float,datetime
<= less or equal int,float,datetime
>= greater or equal int,float,datetime
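
For example (values illustrative), the filter path=~\.dcm$ matches any file whose relative path ends in .dcm, while size>=1048576 matches files of at least 1 MiB.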

Mappings

Imports require metadata in order to place files correctly within a Flywheel project's subject/session/acquisition hierarchy. Mappings are strings in the form <template>=<pattern> that extract information from a source file's fields (e.g. path) into one or more Flywheel metadata fields (e.g. session.label and acquisition.label).

The default mapping allows extracting all required metadata fields from each file's path, assuming that files are stored in a compatible folder hierarchy:

{path}={subject.label}/{session.label}/{acquisition.label}/*

If any of the required fields are missing after extracting with one or more mappings, the file will be marked as failed, but the import will continue to process the remaining data on storage.

Use --missing-meta skip to skip these files instead, allowing the overall import to complete without failures caused by missing metadata.

Alternatively, use --fail-fast to halt the entire import when encountering an error.
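
As a further illustration of custom mappings, if files sit under an extra top-level folder such as raw/ (layout illustrative), the mapping can consume that folder before the hierarchy labels:

fw-beta import run ... --mapping "path=raw/{subject.label}/{session.label}/{acquisition.label}/*"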

Templates

Templates are similar to Python f-strings: they format metadata associated with a file as a single string. Currently only the path source field is available, but this will be extended in the future.

Syntax Description
{field} Curly braces for referencing metadata fields
{field/pat/sub} re.sub pattern for substituting parts of the value
{field:format} f-string format spec (strftime for timestamps)
{field|default} Default to use instead of "UNKNOWN" (for ""/None)

Combining modifiers is allowed in the order /pat/sub >> :format >> |default.
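
For example (values hypothetical), {path/\.dicom\.zip$/.zip} rewrites a trailing .dicom.zip suffix to .zip, and {path|n/a} yields n/a when the path is empty.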

Patterns

Patterns are simplified Python regexes tailored for scraping Flywheel metadata fields like acquisition.label from a string using capture groups.

Syntax Description
{field} Curly braces for capturing (dot-notated/nested) fields
[opt] Brackets for making parts of the match optional
* Star to match any string of characters (like glob)
. Dot to match a literal dot (like glob)
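
For example (layout hypothetical), the pattern {subject.label}/[ses-]{session.label}/{acquisition.label}/* applied to the path sub-01/ses-01/anat/T1w.nii.gz captures sub-01, 01 and anat, with the ses- prefix treated as optional.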

File Types

Running the import with the --type=<type> option allows setting the file.type metadata field in Flywheel to the specified value. Populating the type is useful for searching and for automatically running gears that are tied to data-types.

Import has additional features when importing DICOM data with --type=dicom:

  • files are parsed using pydicom
    (invalid DICOMs are treated as errors)
  • series are grouped by directory and SeriesInstanceUID
    (multiple series per directory are treated as errors)
  • series are uploaded as a single zipped file
    (except for single-file series, e.g. enhanced DICOM)
  • metadata fields have tag-based default mappings
  • custom --mappings can reference DICOM tags
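
For example, a DICOM import restricted to files ending in .dcm (filter value illustrative):

fw-beta import run ... --type=dicom --include "path=~\.dcm$"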

Advanced

For more complex import workflows, for example where files from multiple hierarchy levels are needed or the pattern mappings vary based on the data type, additional rules can be passed as inline YAML using the --rule option:

fw-beta import run ... --rule "include: [path=~csv], mapping: ['path={sub}/{ses}/{acq}/{file}']"

Defaults

Simple imports can usually be expressed with a single rule. The first import rule is defined by default and can be adjusted with command-line options directly:

Option Default
--include [] (include all files)
--exclude [] (don't exclude any file)
--mapping path={subject}/{session}/{acquisition}/{file}
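
For example, to import only compressed NIfTI files while skipping a derivatives folder, keeping the default mapping (paths illustrative):

fw-beta import run ... --include "path=~\.nii\.gz$" --exclude "path=~^derivatives/"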

DICOM

When using --type=dicom, the import populates metadata fields with defaults derived from DICOM tags. Default mappings (e.g. setting subject.label to the value of PatientID) are only applied if:

  • the field (subject.label) is not yet populated via a custom --mapping
  • the value (PatientID) is not empty

The default mappings for DICOM:

  • subject.label - PatientID
  • subject.firstname - split from PatientName
  • subject.lastname - split from PatientName
  • subject.sex - PatientSex
  • session.uid - StudyInstanceUID
  • session.label - StudyDescription
    fallback to session.timestamp
    fallback to StudyInstanceUID
  • session.age - from PatientAge (converted to seconds)
    fallback to delta between acquisition.timestamp and PatientBirthDate
  • session.weight - PatientWeight
  • session.operator - OperatorsName
  • session.timestamp - from StudyDate & StudyTime
    fallback to SeriesDate & SeriesTime
    fallback to AcquisitionDateTime
    fallback to AcquisitionDate & AcquisitionTime
    with respect to TimezoneOffsetFromUTC
  • acquisition.uid - SeriesInstanceUID
  • acquisition.label - SeriesNumber - SeriesDescription
    fallback to SeriesNumber - ProtocolName
    fallback to acquisition.timestamp (formatted as %Y-%m-%dT%H:%M:%S)
    fallback to SeriesInstanceUID
    only prefixed if SeriesNumber is set
  • acquisition.timestamp - AcquisitionDateTime
    fallback to AcquisitionDate & AcquisitionTime
    fallback to SeriesDate & SeriesTime
    fallback to StudyDate & StudyTime
    with respect to TimezoneOffsetFromUTC
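
To override one of these defaults, a custom --mapping can reference a DICOM tag directly, as noted under File Types. For example, a hypothetical mapping populating session.label from StudyID instead (assuming the tag keyword can be used on the template side):

fw-beta import run ... --type=dicom --mapping "{StudyID}={session.label}"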

Settings

Option Value Description
--overwrite auto Overwrite existing files if changed
--overwrite never Do not overwrite existing files even if changed
--overwrite always Overwrite existing files even if unchanged
--dry-run (flag) Run without actually uploading data (for testing)
--limit N Stop after processing N files (for testing)
--fail-fast N[%] Stop processing after reaching a failure threshold
--missing-meta fail Fail items with missing metadata
--missing-meta skip Skip items with missing metadata
--storage-config (YAML) Override default storage config

Some storage settings may be overridden when running an import, which is useful for reading data from the same bucket but from a different prefix, for example:

fw-beta import run ... --storage-config "prefix: other/prefix"
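
Similarly, a quick test run that processes only a handful of files without uploading anything:

fw-beta import run ... --dry-run --limit 10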

Output options

The same output options are available as in fw-beta import get, the only difference being that run follows the progress of the import until it completes. Use the --no-wait option to exit immediately instead.

Pressing CTRL+C stops the progress monitoring in the CLI, but the import operation will continue running on the cluster. Use fw-beta import get to check the import's status or to resume monitoring its progress later.
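
For example, to start an import and return immediately, then check on it later (import reference elided):

fw-beta import run ... --no-wait
fw-beta import get ...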

Referencing files in place

Importing large datasets into Flywheel can incur substantial network costs during the transfer and, later, storage costs for keeping a copy of the data.

Referencing files in place allows importing data into Flywheel without transferring any bytes from one cloud storage bucket to another, saving time, network and storage costs.

The ref-in-place workflow requires that the bucket is registered in Flywheel Core-API as a storage provider and that this provider is used for creating a storage via fw-beta admin storage create.