RED Format

A Reproducible Experiment Description (RED) contains all details about a data-driven experiment in YAML or JSON format. To process RED files or execute experiments use the CC-FAICE commandline tools.

The following listings show two possible YAML structures.

Option 1:

redVersion: ...
cli: ...
container: ...
inputs: ...
outputs: ...    # optional for faice
execution: ...  # optional for faice, ccagency

Option 2:

redVersion: ...
cli: ...
container: ...
batches: ...
execution: ...  # optional for faice, ccagency

If you want to know the exact jsonschema for RED supported by Curious Containers, you can install CC-FAICE and use its faice schema subcommands as follows.

pip3 install --user cc-faice
faice --version
faice schema list
faice schema show red

Read through the following tutorial sections to learn more about each part of a RED file.

redVersion

The redVersion increases everytime the RED format changes. This means that a RED file in version "3" should be used with Curious Containers software packages in version 3.x.x, higher software versions will not work. See Versioning for more details.

cli

Curious Containers only works with applications providing a proper commandline interface (CLI). This CLI, with its positional and optional arguments, must be described in the Command Workflow Language (CWL) commandline specification syntax. Other CWL compatible tools require separate .cwl files, but a RED file has the CWL description embedded under the cli keyword.

The following listing shows the main keywords of an embedded CWL description.

cli:
  cwlVersion: "v1.0"        # fixed
  class: "CommandLineTool"  # fixed
  baseCommand: ...
  doc: ...                  # optional
  inputs: ...
  outputs: ...

The baseCommand is an executable program, which is located in a PATH (environment variable) directory. Usually its a string like "command", where command is the name of the program.

cli:
  baseCommand: "command"
  ...

If you want to call a subcommand of a program, like command subcommand, you can specify a list.

cli:
  baseCommand:
    - "command"
    - "subcommand"

Possible commandline arguments are defined under cli.inputs.

The arguments can be of type string, int, long, float, double, boolean, File or Directory, although for technical reasons CC does not distinguish between int and long or float and double. File and Directory must be valid paths, either absolute or relative. If an argument is optional, it is marked with ?, like File?. If an argument can be repeated an arbitrary number of times, this can be indicated by list symbol, like File[].

Take the following program call as an example:

 command --optional-flag --optional-number=42 /required/file /required/dir

In this case the call uses two optional arguments, one is a boolean flag, the other takes an integer number. Both /required/file and /required/dir are mandatory positional arguments, at positions 0 and 1 respectively. The CWL syntax allows us to describe the CLI capabilities of the program without specifying concrete inputs. As can be seen in the listing below, the specific values like 42 or /required/file are not included in the CWL description. Please note, that we need to choose an arbitrary identifier, like some_file, for each of the arguments.

cli:
  inputs:
    some_flag:
      type: "boolean?"
      inputBinding:
        prefix: "--optional-flag"
    some_number:
      type: "int?"
      inputBinding:
        prefix: "--optional-number="
        separate: False
    some_file:
      type: "File"
      inputBinding:
        position: 0
    some_dir:
      type: "Directory"
      inputBinding:
        position: 1

Running a command should result in one or more files being written to the filesystem. The corresponding file paths are defined under cli.outputs. Only outputs of type File are allowed.

We assume that the command produces a CSV table and a PDF plot file. The table file is called table.csv and is optional. The plot file is not optional, but its full name is not known beforehand, so we can use the glob pattern *.pdf. In the context of CC, any glob pattern must match exactly one file in the output directory.

cli:
  outputs:
    some_table:
      type: "File?"
      outputBinding:
        glob: "table.csv"
    some_plot:
      type: "File"
      outputBinding:
        glob: "*.pdf"

inputs

To run an experiment, concrete inputs for the CLI program have to be provided. This is done under the inputs keyword. Inputs can be primitive types like int or boolean or remote Files and Directories. Primitive types are just included in the RED file. As can be seen in the listing below, the identifiers in inputs refer to the arbitrary identifiers in cli.inputs (e.g. inputs.some_flag refers to cli.inputs.some_flag).

cli:
  inputs:
    some_flag: ...
    some_number: ...
    some_file: ...
    some_dir: ...

inputs:
  some_flag: True
  some_number: 42
  some_file: ...
  some_dir: ...

To make files and directories available as input for an experiment, you have to use RED connectors, that support a variety of protocols like SSH, HTTP. If the available connectors do not suit you, you can implement your own following the RED Connector CLI 1 specification. Input files are downloaded automatically into a running container. The download paths are provided to the programm command as CLI arguments (see section cli).

The command keyword refers to name of the connector executable that is made available to system via the PATH environment variable. The information provided under access depends on the connector. Refer to the documentation of a specific connector for more information on how to specify access data correctly.

The following example uses the HTTP connector to download a single input file.

inputs:
  some_file:
    class: "File"
    connector:
      command: "red-connector-http"
      access:
        url: "https://raw.githubusercontent.com/curious-containers/red-guide-vagrant/master/in.txt"

inputs: directories

In order to download an entire directory, some connectors like the HTTP connector require a directory listing. This listing defines the subfiles and subdirectories and is only allowed for directory connectors. Even for connectors which do not strictly require a listing it is recommended to include one, because it will be used to automatically check the directory contents for missing files and subdirectories.

inputs:
  some_dir:
    class: "Directory"
    connector:
      command: "red-connector-http"
      access:
        url: "https://raw.githubusercontent.com/curious-containers/cc-core/master/cc_core/"
    listing:
      - class: "File"
        basename: "version.py"
      - class: "Directory"
        basename: "agent"
        listing:
          - class: "File"
            basename: "__main__.py"

Please note, that not every connector provides functionality for input files and directories, but the HTTP and SSH connectors can be used in both cases.

outputs

Outputs of an experiment can then be uploaded to remote servers using various connectors.

Again, the output identifiers in the outputs section refer to the arbitrary identifiers defined under cli.outputs.

cli:
  outputs:
    some_table: ...
    some_plot: ...

outputs:
  some_table: ...
  some_plot: ...

If you are not interested in some of the outputs, you are not required to specify a connector for them. In this case, we are only interested in some_table and we specify an SSH connector.

outputs:
  some_table:
    class: "File"
    connector:
      command: "red-connector-ssh"
      access:
        host: "example.com"
        port: 22
        auth:
          username: "username"
          password: "password"
        filePath: "/home/username/files/out.txt"

batches

With the batches keyword multiple inputs and outputs can be specified as a list. Each batch is processed independently in its own Docker container.

When the batches keyword is used, the inputs and outputs keywords cannot appear in the top level section of the RED file, as shown below.

redVersion: ...
cli: ...
batches:
  - inputs: ...
    outputs: ...
  - inputs: ...
    outputs: ...
container: ...
execution: ...

container

RED provides are generic way to include settings for container engines, such that CC or other tools can implement different engines. Curious Containers currently only supports Docker as a RED Container Engine.

Under the container keyword, you have to provide the engine name and the settings for the chosen engine.

container:
  engine: "docker"
  settings: ...

execution

Under the execution keyword you can specify an execution engine, which is capable of processing the given RED file. For example the URL and access information to a CC-Agency server can be given here. For supported execution engines take a look at the RED Execution Engines documentation.

execution:
  engine: "ccagency"
  settings: ...

If you have specified an execution engine you can use the faice exec CLI tool to execute the experiment using the given engine.