Title: Job Monitoring - Admin Version

Date: July 9th 2020

Description:
A simple tutorial on job monitoring, intended for site admins only.

Topics that are included:

  1. Jobs that I have launched
  2. Filter jobs based on gear name, date range, and state
  3. Cancelling Jobs
  4. Restarting Jobs
  5. Get summary of job status

Capture information about jobs: execution time and queue time per job, sorting, and plots that show the job ID on hover.

Requirements:

  1. Access to a Flywheel instance.
  2. A Flywheel API key.
  3. A Flywheel Project, ideally containing the dataset used in the upload-data notebook.
  4. Site Admin permission.
  5. Some jobs running in your Flywheel Project.

NOTE: This notebook uses a test dataset provided by the upload-data notebook. If you have not uploaded this test dataset yet, we strongly recommend that you do so now, following the steps there, before proceeding, because this notebook relies on a specific project structure.

WARNING: The metadata of the acquisitions in your test project will be updated and new files will be created after running the scripts below.

Install and Import Dependencies

Flywheel API Key and Client

Get an API_KEY. More on this in the Flywheel SDK documentation.

Instantiate the Flywheel API client
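
A minimal sketch of this step, assuming the API key is stored in an environment variable. The variable name FW_API_KEY and both helper names are our own choices for illustration, not Flywheel conventions. flywheel.Client comes from the flywheel-sdk package; the import is deferred so the helpers can be defined even before the package is installed.

```python
import os


def get_api_key(env_var="FW_API_KEY"):
    """Return the API key stored in the given environment variable.

    FW_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} to your Flywheel API key first.")
    return api_key


def make_client(api_key):
    """Create a Flywheel client (requires the flywheel-sdk package)."""
    import flywheel  # imported here so this module loads without the SDK

    return flywheel.Client(api_key)


# Usage, once the key is set in your environment:
# fw = make_client(get_api_key())
```

Reading the key from the environment keeps it out of the notebook itself, which matters if the notebook is shared.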

Show Flywheel logging information
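
One way to set this up with Python's standard logging module. The logger name and the format string are our own choices for this sketch.

```python
import logging

# Configure the root logger once; later calls to basicConfig are ignored
# if handlers are already installed.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

log = logging.getLogger("job-monitoring")  # hypothetical logger name
log.setLevel(logging.INFO)
log.info("Logging is configured.")
```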


Check User Minimum Requirements

Before we start, we would like to verify that you have the right permissions to proceed with this notebook.
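
A check of this kind could look like the sketch below. It assumes the user object exposes a roles list that contains "site_admin" for site admins; confirm the attribute name against your SDK version. The stand-in users exist only so the example runs without a Flywheel instance.

```python
from types import SimpleNamespace


def is_site_admin(user):
    """Return True if the user object lists the site-admin role.

    Assumes the user exposes a `roles` list containing "site_admin".
    """
    roles = getattr(user, "roles", None) or []
    return "site_admin" in roles


# Stand-in user objects for illustration. With a real client you would
# pass the object returned by fw.get_current_user().
admin = SimpleNamespace(id="admin@example.com", roles=["site_admin", "user"])
member = SimpleNamespace(id="member@example.com", roles=["user"])
print(is_site_admin(admin), is_site_admin(member))
```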


Find Jobs

First, we will show you how to find the jobs that you have run previously.

In the example below, we will get 2 jobs that you have launched within your instance. You can change the number of jobs returned by modifying the limit variable.

Info: To learn more about the different attributes, please see our SDK docs. They will come in handy when you try to filter jobs.

You can search for jobs launched by other users in the same way.
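
A sketch of such a search, using the fw.jobs.find() method with a limit. The filter field origin.id is an assumption about where the launching user is recorded; check the job attributes in the SDK docs for your instance. The helper name find_user_jobs is hypothetical.

```python
def find_user_jobs(client, user_id, limit=2):
    """Return up to `limit` jobs launched by `user_id`.

    `origin.id` is an assumed filter field; verify it against the job
    attributes listed in the SDK docs.
    """
    return client.jobs.find(f"origin.id={user_id}", limit=limit)


# With a real client:
# user = fw.get_current_user()
# user_jobs = find_user_jobs(fw, user.id, limit=2)
#
# To look at another user's jobs, pass that user's ID instead.
```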


Filter jobs based on gear name, date range, and state

Gear Name

Date Range

State
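
The three filters above can be combined into a single comma-separated filter string. The sketch below builds that string; the field names gear_info.name, created, and state are assumptions based on common job attributes, and the gear name and dates are illustrative values only.

```python
from datetime import date


def build_job_filter(gear_name=None, created_after=None,
                     created_before=None, state=None):
    """Join the given conditions into one comma-separated filter string."""
    parts = []
    if gear_name:
        parts.append(f"gear_info.name={gear_name}")  # assumed field name
    if created_after:
        parts.append(f"created>{created_after.isoformat()}")
    if created_before:
        parts.append(f"created<{created_before.isoformat()}")
    if state:
        parts.append(f"state={state}")
    return ",".join(parts)


job_filter = build_job_filter(
    gear_name="mriqc",
    created_after=date(2020, 7, 1),
    created_before=date(2020, 7, 9),
    state="failed",
)
print(job_filter)

# With a real client:
# jobs = fw.jobs.find(job_filter)
```

Leaving any argument as None drops that condition, so the same helper covers a gear-only, date-only, or state-only search.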


Cancel Jobs

Simply use the update method to cancel a job that is in the pending state.
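
A sketch of the selection logic, assuming each job exposes id and state attributes. The call that performs the cancellation is passed in as cancel_fn, because its exact form is not shown here; the helper name and the stand-in jobs are hypothetical.

```python
from types import SimpleNamespace


def cancel_pending_jobs(jobs, cancel_fn):
    """Call `cancel_fn(job)` for every pending job; return their IDs."""
    cancelled_ids = []
    for job in jobs:
        if job.state == "pending":
            cancel_fn(job)
            cancelled_ids.append(job.id)
    return cancelled_ids


# Stand-in jobs for illustration. With real job objects, cancel_fn could
# be something like `lambda job: job.update(state="cancelled")`; treat
# that call shape as an assumption and confirm it in the SDK docs.
jobs = [
    SimpleNamespace(id="job-1", state="pending"),
    SimpleNamespace(id="job-2", state="running"),
]
print(cancel_pending_jobs(jobs, lambda job: setattr(job, "state", "cancelled")))
```

Jobs that are already running or finished are skipped, so only job-1 is touched in this example.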


Restart Jobs

You can also restart a job that has a state of failed. However, each job can only be retried once.

To demonstrate, we will restart an mriqc job that has failed by iterating through the user_jobs list that we defined earlier with the fw.jobs.find() method. We will use exception handling to avoid restarting a job more than once.

Once the job has been successfully restarted, it will return a new job_id. We will append this new job_id to a list named retried_job.
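
The loop described above can be sketched as follows. It assumes each job exposes id, state, and gear_info.name, and that the client offers a retry_job(job_id) call that raises an error when a job has already been retried. The stub client is a stand-in so the example runs without a Flywheel instance, and the broad except clause is a simplification.

```python
from types import SimpleNamespace


def retry_failed_jobs(client, jobs, gear_name="mriqc"):
    """Retry failed jobs of one gear and collect the new job IDs."""
    retried_job = []
    for job in jobs:
        if job.state != "failed" or job.gear_info.name != gear_name:
            continue
        try:
            retried_job.append(client.retry_job(job.id))
        except Exception as exc:  # e.g. the job was already retried once
            print(f"Could not retry {job.id}: {exc}")
    return retried_job


class StubClient:
    """Stand-in client that allows one retry per job, for illustration."""

    def __init__(self):
        self._retried = set()

    def retry_job(self, job_id):
        if job_id in self._retried:
            raise RuntimeError("job already retried")
        self._retried.add(job_id)
        return f"{job_id}-retry"


user_jobs = [
    SimpleNamespace(id="job-1", state="failed",
                    gear_info=SimpleNamespace(name="mriqc")),
    SimpleNamespace(id="job-2", state="complete",
                    gear_info=SimpleNamespace(name="mriqc")),
]
stub = StubClient()
print(retry_failed_jobs(stub, user_jobs))  # first pass retries job-1
print(retry_failed_jobs(stub, user_jobs))  # second pass: retry is refused
```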


Job Statistics

In this section, we will present an example of calculating, plotting and then using job statistics for the purpose of cancelling jobs that take too long.

To give you an overview, you can use the fw.get_jobs_stats() method to view the status of all current jobs within the Flywheel instance.
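
For a list of jobs you have already fetched, a comparable per-state summary can be computed locally. This is a complementary sketch, not a description of what fw.get_jobs_stats() returns; it assumes each job exposes a state attribute, and the stand-in jobs are illustrative.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_states(jobs):
    """Count jobs per state and return the counts as a plain dict."""
    return dict(Counter(job.state for job in jobs))


# Stand-in jobs for illustration.
sample_jobs = [SimpleNamespace(state=s) for s in ("complete", "failed", "complete")]
print(summarize_states(sample_jobs))
```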

Before getting started, we will define a few values, such as the gear name, the creation date of the jobs, and the sample size.

Initialize a few values
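
The values might be set up along these lines. Every value below is an illustrative placeholder, and max_runtime is a hypothetical threshold added for this sketch to express what "too long" means.

```python
from datetime import datetime, timedelta

# Illustrative placeholders; adjust them to your own project.
gear_name = "mriqc"                    # gear whose jobs we analyse
created_after = datetime(2020, 7, 1)   # only consider jobs created after this
sample_size = 50                       # maximum number of jobs to fetch
max_runtime = timedelta(hours=2)       # hypothetical "too long" threshold
```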

Helpful Function
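
A helper of this kind could derive queue time and execution time from three timestamps. The function name and the sample timestamps are hypothetical; how the timestamps are read from a job object depends on the job attributes exposed by your SDK version.

```python
from datetime import datetime


def job_timings(created, started, finished):
    """Return (queue_seconds, run_seconds) from three timestamps.

    created  - when the job was submitted
    started  - when the job began running
    finished - when the job completed, failed, or was cancelled
    """
    queue_seconds = (started - created).total_seconds()
    run_seconds = (finished - started).total_seconds()
    return queue_seconds, run_seconds


queue_s, run_s = job_timings(
    created=datetime(2020, 7, 9, 10, 0, 0),
    started=datetime(2020, 7, 9, 10, 2, 30),
    finished=datetime(2020, 7, 9, 10, 17, 30),
)
print(queue_s, run_s)  # 150.0 900.0
```

Comparing run_seconds against a chosen threshold is one way to flag jobs that take too long.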