Welcome to the Curious Containers framework for reproducible research.
Data, code and environment are the crucial components required to successfully reproduce a computational experiment. Usually the code consists of scripts or executables that have been set up to work on a researcher’s computer and that load data from the local filesystem. In this situation, it is difficult to store the experiment for future use, share it with coworkers, reference it in a publication or distribute it in a compute cluster for parallel batch processing.
Curious Containers (CC) is a framework for computational reproducibility that allows researchers to define each component of an experiment as a network resource. These resources are referenced by unique identifiers in the Reproducible Experiment Description (RED) file format. An execution engine provided by CC interprets the RED file and combines all resources to run the experiment. This concept allows resources to be interchangeable and to have individual access restrictions based on the chosen network transmission and authentication protocols.
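To make the idea of resources as network identifiers more concrete, the following Python sketch models an experiment as a set of resource references and checks that each one points to a network location. The keys (`container_image`, `inputs`, `outputs`), the helper functions and all URLs are hypothetical and simplified for illustration; they do not reproduce the actual RED schema.

```python
from urllib.parse import urlparse

# Hypothetical, simplified experiment description. The keys and URLs are
# illustrative only and do not reproduce the actual RED schema.
experiment = {
    "container_image": "docker://registry.example.org/lab/experiment:1.0",
    "inputs": {"data_file": "https://data.example.org/input.csv"},
    "outputs": {"result_file": "ssh://storage.example.org/results/output.csv"},
}


def network_resources(description):
    """Return every resource identifier in the description as a flat list."""
    resources = [description["container_image"]]
    resources.extend(description["inputs"].values())
    resources.extend(description["outputs"].values())
    return resources


def is_network_resource(identifier):
    """Treat an identifier as a network resource if it has a scheme and a host."""
    parsed = urlparse(identifier)
    return bool(parsed.scheme) and bool(parsed.netloc)


resources = network_resources(experiment)
all_remote = all(is_network_resource(r) for r in resources)
```

In this toy model, moving the input data to a different storage system changes only one identifier, while the container image and the output destination stay untouched. A local path such as `/home/user/input.csv` would fail the check, because it is only meaningful on one researcher’s computer.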
If you are new to the project, we advise you to work through the RED Beginner’s Guide. As an introduction (in German), watch the following talk at the deRSE 2019 conference for research software engineering (PDF Slides, Video).
Issues and Feedback
If you have problems or want to give feedback, please open an issue in the curious-containers project on GitHub. The issue trackers of all other repositories are closed.
How it works
CC employs Docker to install an experiment runtime environment, including code and dependencies, in a container image. Using a container registry, this image becomes a network resource that can be deployed on any Linux computer with Docker installed.
In contrast to existing research platforms, CC as a framework does not provide a tightly integrated storage solution. Instead, it connects to any data storage that is already present in a research institution. For this to work, CC provides connector programs for common storage interfaces like SSH, HTTP or XNAT, which are shipped as part of an experiment’s container image. These connectors download input data into the running container, upload results to a defined destination or mount directories via FUSE network filesystems. If a connector for a certain storage solution does not yet exist, the user can provide a custom connector by implementing an interface specification.
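The connector concept can be sketched as a small interface with one operation for downloading inputs and one for uploading results. The class and method names below (`Connector`, `receive`, `send`, `InMemoryConnector`) and the shape of the `access` dictionary are assumptions for illustration and do not reproduce the actual CC connector interface specification. The in-memory dictionary merely stands in for a remote service such as an HTTP or SSH server.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import tempfile


class Connector(ABC):
    """Hypothetical connector interface, not the actual CC specification."""

    @abstractmethod
    def receive(self, access, local_path):
        """Download the resource described by `access` to `local_path`."""

    @abstractmethod
    def send(self, access, local_path):
        """Upload the file at `local_path` to the location given in `access`."""


class InMemoryConnector(Connector):
    """Toy connector that uses a dict in place of a remote storage service."""

    def __init__(self):
        self.storage = {}

    def receive(self, access, local_path):
        Path(local_path).write_bytes(self.storage[access["key"]])

    def send(self, access, local_path):
        self.storage[access["key"]] = Path(local_path).read_bytes()


connector = InMemoryConnector()
connector.storage["input.csv"] = b"a,b\n1,2\n"

with tempfile.TemporaryDirectory() as workdir:
    input_file = Path(workdir) / "input.csv"
    # Fetch the input into the working directory, as a connector would do
    # inside the running container.
    connector.receive({"key": "input.csv"}, input_file)
    # Upload the (here unmodified) file as the experiment result.
    connector.send({"key": "result.csv"}, input_file)

round_trip_ok = connector.storage["result.csv"] == b"a,b\n1,2\n"
```

Because the experiment code only ever sees local files, supporting a new storage system in this model means writing one more subclass rather than changing the experiment itself.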
CC implements two execution engines that run experiments defined in RED files. CC-FAICE is simple to install on a local computer and can run one experiment at a time. CC-Agency is a server-side execution engine that connects to Docker engines in a compute cluster for parallel execution, tracks the execution state of scheduled experiments in a database and provides a REST web interface.
CC and RED support the FAIR principles for reproducible research, which require all experiment resources to be findable, accessible, interoperable and reusable. Therefore, the command-line interface (CLI) description of the experiment’s script or executable, which is embedded in a RED file, follows the Command Line Tool Description of the Common Workflow Language (CWL). This enables portability between CC execution engines and a CWL runtime.
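As a rough illustration of what such a CLI description encodes, the sketch below turns a CWL-style description into an argument list. The keys `baseCommand`, `inputs`, `inputBinding`, `position` and `prefix` are taken from CWL. The described tool (`grep`), the input names and the helper `build_command` are hypothetical, and the logic covers only a small subset of CWL’s binding rules; for example, file inputs are passed as plain path strings here.

```python
# Hypothetical CLI description using a small subset of CWL fields.
cli = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": "grep",
    "inputs": {
        "query": {"type": "string", "inputBinding": {"position": 1}},
        "text_file": {"type": "File", "inputBinding": {"position": 2}},
        "after_context": {
            "type": "int?",
            "inputBinding": {"prefix": "-A", "position": 0},
        },
    },
}


def build_command(cli, values):
    """Assemble an argument list from the description and concrete values.

    Simplified: sorts bound inputs by position, emits prefix and value as
    separate arguments, and skips inputs for which no value was given.
    """
    bound = []
    for name, spec in cli["inputs"].items():
        if name not in values:
            continue
        binding = spec["inputBinding"]
        bound.append((binding.get("position", 0), name, binding.get("prefix")))

    command = [cli["baseCommand"]]
    for _, name, prefix in sorted(bound):
        if prefix is not None:
            command.append(prefix)
        command.append(str(values[name]))
    return command


command = build_command(
    cli, {"query": "error", "text_file": "log.txt", "after_context": 2}
)
print(command)  # ['grep', '-A', '2', 'error', 'log.txt']
```

Since the description is plain data rather than code, any engine that understands the same fields can derive the same command line, which is what makes this kind of portability possible.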
Machine Learning Workloads
CC explicitly supports machine learning and other high-performance workloads. Available NVIDIA graphics processing units are accessible in Docker containers using the NVIDIA Container Toolkit (or its predecessor nvidia-docker). Large training data directories that are multiple terabytes in size can be mounted via FUSE.
Version 9 is the first stable release of RED and the Curious Containers framework. Support for RED 9 will continue in future versions of Curious Containers, even if an updated RED format is released.
The Curious Containers software is developed at CBMI (HTW Berlin - University of Applied Sciences). The work is supported by the German Federal Ministry for Economic Affairs and Energy (ZIM project BeCRF, grant number KF3470401BZ4), the German Federal Ministry of Education and Research (project deep.TEACHING, grant number 01IS17056 and project deep.HEALTH, grant number 13FH770IX6) and HTW Berlin Booster.