A Reproducible Experiment Description (RED) contains all details about a data-driven experiment in YAML or JSON format. To process RED files or execute experiments use the CC-FAICE commandline tools.
The following listings show two possible YAML structures.
redVersion: ... cli: ... container: ... inputs: ... outputs: ... # optional for faice execution: ... # optional for faice, ccagency
redVersion: ... cli: ... container: ... batches: ... execution: ... # optional for faice, ccagency
If you want to know the exact jsonschema for RED supported by Curious Containers, you can install CC-FAICE and use its
faice schema subcommands as follows.
pip3 install --user cc-faice faice --version faice schema list faice schema show red
Read through the following tutorial sections to learn more about each part of a RED file.
redVersion increases everytime the RED format changes. This means that a RED file in version
"3" should be used with Curious Containers software packages in version
3.x.x, higher software versions will not work. See Versioning for more details.
Curious Containers only works with applications providing a proper commandline interface (CLI). This CLI, with its positional and optional arguments, must be described in the Command Workflow Language (CWL) commandline specification syntax. Other CWL compatible tools require separate
.cwl files, but a RED file has the CWL description embedded under the
The following listing shows the main keywords of an embedded CWL description.
cli: cwlVersion: "v1.0" # fixed class: "CommandLineTool" # fixed baseCommand: ... doc: ... # optional inputs: ... outputs: ...
baseCommand is an executable program, which is located in a
PATH (environment variable) directory. Usually its a string like
command is the name of the program.
cli: baseCommand: "command" ...
If you want to call a subcommand of a program, like
command subcommand, you can specify a list.
cli: baseCommand: - "command" - "subcommand"
Possible commandline arguments are defined under
The arguments can be of type
Directory, although for technical reasons CC does not distinguish between
Directory must be valid paths, either absolute or relative. If an argument is optional, it is marked with
File?. If an argument can be repeated an arbitrary number of times, this can be indicated by list symbol, like
Take the following program call as an example:
command --optional-flag --optional-number=42 /required/file /required/dir
In this case the call uses two optional arguments, one is a boolean flag, the other takes an integer number. Both
/required/dir are mandatory positional arguments, at positions
1 respectively. The CWL syntax allows us to describe the CLI capabilities of the program without specifying concrete inputs. As can be seen in the listing below, the specific values like
/required/file are not included in the CWL description. Please note, that we need to choose an arbitrary identifier, like
some_file, for each of the arguments.
cli: inputs: some_flag: type: "boolean?" inputBinding: prefix: "--optional-flag" some_number: type: "int?" inputBinding: prefix: "--optional-number=" separate: False some_file: type: "File" inputBinding: position: 0 some_dir: type: "Directory" inputBinding: position: 1
Running a command should result in one or more files being written to the filesystem. The corresponding file paths are defined under
cli.outputs. Only outputs of type
File are allowed.
We assume that the command produces a CSV table and a PDF plot file. The table file is called
table.csv and is optional. The plot file is not optional, but its full name is not known beforehand, so we can use the glob pattern
cli: outputs: some_table: type: "File?" outputBinding: glob: "table.csv" some_plot: type: "File" outputBinding: glob: "*.pdf"
To run an experiment, concrete inputs for the CLI program have to be provided.
This is done under the
Inputs can be primitive types like
boolean or remote Files and Directories.
Primitive types are just included in the RED file.
As can be seen in the listing below, the identifiers in
inputs refer to the arbitrary identifiers in
inputs.some_flag refers to
cli: inputs: some_flag: ... some_number: ... some_file: ... some_dir: ... inputs: some_flag: True some_number: 42 some_file: ... some_dir: ...
To make files and directories available as input for an experiment, you have to use RED connectors, that support a variety of protocols like SSH, HTTP. If the available connectors do not suit you, you can implement your own following the RED Connector CLI 1 specification. Input files are downloaded automatically into a running container. The download paths are provided to the programm command as CLI arguments (see section cli).
command keyword refers to name of the connector executable that is made available to system via the
PATH environment variable.
The information provided under
access depends on the connector. Refer to the documentation of a specific connector for more information on how to specify
access data correctly.
The following example uses the HTTP connector to download a single input file.
inputs: some_file: class: "File" connector: command: "red-connector-http" access: url: "https://raw.githubusercontent.com/curious-containers/red-guide-vagrant/master/in.txt"
In order to download an entire directory, some connectors like the HTTP connector require a directory
listing. This listing defines the subfiles and subdirectories and is only allowed for directory connectors. Even for connectors which do not strictly require a listing it is recommended to include one, because it will be used to automatically check the directory contents for missing files and subdirectories.
inputs: some_dir: class: "Directory" connector: command: "red-connector-http" access: url: "https://raw.githubusercontent.com/curious-containers/cc-core/master/cc_core/" listing: - class: "File" basename: "version.py" - class: "Directory" basename: "agent" listing: - class: "File" basename: "__main__.py"
Please note, that not every connector provides functionality for input files and directories, but the HTTP and SSH connectors can be used in both cases.
Outputs of an experiment can then be uploaded to remote servers using various connectors.
Again, the output identifiers in the
outputs section refer to the arbitrary identifiers defined under
cli: outputs: some_table: ... some_plot: ... outputs: some_table: ... some_plot: ...
If you are not interested in some of the outputs, you are not required to specify a connector for them. In this case, we are only interested in
some_table and we specify an SSH connector.
outputs: some_table: class: "File" connector: command: "red-connector-ssh" access: host: "example.com" port: 22 auth: username: "username" password: "password" filePath: "/home/username/files/out.txt"
batches keyword multiple
outputs can be specified as a list. Each batch is processed independently in its own Docker container.
batches keyword is used, the
outputs keywords cannot appear in the top level section of the RED file, as shown below.
redVersion: ... cli: ... batches: - inputs: ... outputs: ... - inputs: ... outputs: ... container: ... execution: ...
RED provides are generic way to include settings for container engines, such that CC or other tools can implement different engines. Curious Containers currently only supports Docker as a RED Container Engine.
container keyword, you have to provide the
engine name and the
settings for the chosen engine.
container: engine: "docker" settings: ...
execution keyword you can specify an execution engine, which is capable of processing the given RED file. For example the URL and access information to a CC-Agency server can be given here. For supported execution engines take a look at the RED Execution Engines documentation.
execution: engine: "ccagency" settings: ...
If you have specified an execution engine you can use the
faice exec CLI tool to execute the experiment using the given engine.