Containers
What are containers?
Containerized software is a nice way to avoid dependency nightmares and to make sure that the software you develop runs as expected on any machine. You can also use someone else's already containerized software and avoid complicated installations and compatibility issues. Building and running containers is a useful skill both for bioinformaticians and for occasional users of computational tools. Read more about containers here. Not all of that is relevant here: Docker can also run network services with their own network configuration, but in the context of Singularity/Apptainer (biotin3/4), think of a container as a filesystem image that you mount; you load the container and you see the tools and files inside it.
In brief, containers provide all the files and programs necessary to execute a workflow or a pipeline. Ready-to-use containers are already available: containers are usually published in container registries, such as Docker Hub. So if you need to run a fairly standard workflow or pipeline, you do not have to develop your own software and build your own container; instead, use an existing container image. In theory, it will contain all the required software and dependencies.
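Fetching an existing image from Docker Hub can be sketched as follows, reusing the image name that appears later in this page (on a machine where Singularity/Apptainer is available):

```shell
# Pull an image from Docker Hub and convert it to a local SIF file
singularity pull cager261.sif docker://cbgr/cager261:1.4

# Run a tool from the image directly, without entering a shell
singularity exec cager261.sif R --version
```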
How to build your own container?
You cannot build a container on biotin3/4
because you need superuser privilege to do so. The way to go is that you install Docker
(or an alternative software) on your local machine and build a container there. A built container can be published on one of the registries (for example, Docker Hub) and later easily fetched for usage on any machine with internet access, or alternatively also just copied over to biotin3/4
.
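The build-then-transfer workflow might look like this (the `youruser` account name is illustrative; note that converting a Docker archive to a SIF image with `singularity build` does not require root):

```shell
# On your local machine, where Docker is installed:
docker build -t youruser/cager261:1.4 .        # builds from the Dockerfile in the current directory
docker push youruser/cager261:1.4              # publish to Docker Hub (requires `docker login`)

# Alternatively, skip the registry and copy the image over directly:
docker save -o cager261.tar youruser/cager261:1.4
scp cager261.tar biotin3:~/

# On biotin3/4, convert the archive to a Singularity image:
singularity build cager261.sif docker-archive://cager261.tar
```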
A container building guide is available here, and below is an example Dockerfile and the corresponding requirements file.
Content of the Dockerfile:
```dockerfile
# Select a base image on which you would like to build
# Docker install R 4.3, Bioconductor 3.17
FROM bioconductor/bioconductor_docker:3.17

# Set up folder structure
WORKDIR /opt/software

# Install a specific version of the software of interest (needs to be compatible with the base image)
# Install CAGEr 2.6.1
RUN R -e 'BiocManager::install("CAGEr")'

# Install other R dependencies
COPY requirements.R /opt/software/requirements.R
RUN Rscript requirements.R

# Add location of dependencies to the path within the container where R will search for them
ENV R_LIBS=${R_LIBS}:/opt/software
```
Content of the requirements.R file:
```r
#################################
## List of R packages required ##
#################################

.libPaths(c("/opt/software", .libPaths()))

## CRAN packages:
required_packages_cran = c(
  "optparse",   ## Read in data
  "rlang",      ## Error handling
  "tidyr",      ## Data formatting and manipulation
  "tidyverse",  ## Data formatting and manipulation
  "viridis",    ## Plotting
  "ggplot2",    ## Plotting
  "gplots")     ## Plotting

message("; Installing these R packages from CRAN repository: ", required_packages_cran)
install.packages(required_packages_cran, repos = "https://cran.uib.no/", lib = "/opt/software")

## Bioconductor packages:
required_packages_bioconductor <- c(
  "BSgenome.Hsapiens.UCSC.hg38",
  "ChIPseeker",
  "TxDb.Hsapiens.UCSC.hg38.knownGene",
  "org.Hs.eg.db",
  "rtracklayer")

message("; Installing these R Bioconductor packages: ", required_packages_bioconductor)
BiocManager::install(required_packages_bioconductor, lib = "/opt/software")
```
Alternative container building software
As alternatives to Docker, you can use Apptainer (formerly Singularity) or Podman. Podman allows for rootless container building.
How to use containers on biotin?
The two servers at NCMM, biotin3 and biotin4, manage software differently, so you have to be aware of how to use containers on biotin3 versus biotin4. Generally, you just need to make sure the Singularity (Apptainer) software is available to launch the container image. Remember, you can only *use* containers (with Singularity) on biotin3/4, not *build* them. Singularity and Docker are largely compatible, in the sense that you can always use Singularity to run images built with Docker.
You need to specify which folders should be visible inside (mounted into) the container. The way to do this is described here. A few directories, for example your home directory, are mounted inside the container by default (this is not always desirable; see the issues below). An example would be
```
singularity shell --bind sq123cells_processed/:/data docker://cbgr/cager261:1.4
```
where sq123cells_processed is a directory in the working directory from which the singularity command is run on biotin3/4, and /data is a newly defined folder *within the container*.
By default, singularity passes (with a few exceptions) the environment variables of the host into the container. This means that if you have, for example, PYTHONPATH or R_LIBS_SITE set and load a container, these environment variables are also set in the container, and Python and R will look for modules in those directories. This should generally be avoided, as containers should be entirely self-contained, but there might be occasional cases where you do want this behavior. The -e / --cleanenv flag resets the entire environment to default values, which is usually what you want because the container is self-contained, so as a general guideline we recommend using it. If for some reason you do need host environment variables to be available in the container, load it without this flag.
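A quick way to see the difference (the variable names are illustrative; the `SINGULARITYENV_` prefix is the mechanism Singularity/Apptainer provides for passing individual variables in explicitly):

```shell
# By default, a host variable such as R_LIBS_SITE leaks into the container:
singularity exec docker://cbgr/cager261:1.4 printenv R_LIBS_SITE

# With -e / --cleanenv, the container starts with a clean environment,
# so the same command reports the variable as unset:
singularity exec -e docker://cbgr/cager261:1.4 printenv R_LIBS_SITE

# If you do need one specific variable inside a clean environment,
# pass it explicitly via the SINGULARITYENV_ prefix:
SINGULARITYENV_MYVAR=value singularity exec -e docker://cbgr/cager261:1.4 printenv MYVAR
```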
Snakemake and containers
Snakemake is workflow management software that offers quite nice container integration. It uses Singularity to run images, which makes it easy to use on biotin3/4. The path or name of the container(s) is defined within the Snakefile and the images are fetched automatically, so you just need to run the Snakefile.
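A minimal sketch of what this looks like in a Snakefile (the rule, file names, and script are illustrative; in newer Snakemake versions the directive is `container:`, while older versions use `singularity:`):

```
# The image is declared per rule (or once, globally) and is fetched
# automatically when Snakemake is run with --use-singularity
rule annotate_peaks:
    input:
        "results/peaks.bed"
    output:
        "results/peaks_annotated.tsv"
    container:
        "docker://cbgr/cager261:1.4"
    shell:
        "Rscript scripts/annotate.R {input} {output}"
```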
biotin3
Singularity is available on biotin3 as is. You do not need to load it. Do not be surprised that it is called Apptainer:
```
[katalitf@biotin3 ~]$ singularity --version
apptainer version 1.3.0-1.el7
```
If you want to start a shell environment within your container, you can type the command below.
```
singularity shell docker://cbgr/cager261:1.4
```
Using a container within Snakemake
On biotin3, Snakemake is not available by default, so you have to load a Python environment first. The snakemake command accepts singularity parameters, for example the -e parameter:
```
scl enable rh-python38 bash
snakemake --use-singularity --singularity-args '\-e' --cores 1 -np
```
biotin4
Apptainer (Singularity) is already loaded, so you can just use it.
Using a container within Snakemake
On biotin4, Snakemake is available as a module (singularity usage is enabled from snakemake version 5.3.1 onwards, so it is best to stick to the version defined below). When the module is loaded, Apptainer is loaded too.
```
module load snakemake/7.23.1-foss-2022a
snakemake --use-singularity --singularity-args '\-e' --cores 1 -np
```
Common issues
If the home directory is mounted, R inside the container will see ~/.R/... and possibly load packages from there rather than from the container, e.g.:
```
singularity shell -e --bind sq123cells_processed/:/data docker://cbgr/cager261:1.4
> library(CAGEr)
...
Error: package or namespace load failed for 'SummarizedExperiment' in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/div/pythagoras/u1/katalitf/R/x86_64-pc-linux-gnu-library/4.0/Matrix/libs/Matrix.so':
 libflexiblas.so.3: cannot open shared object file: No such file or directory
```
What happened here is that files in the home directory were "leaking" into the container: the R installation in the container sees ~/.R and tries to load the requested R package from there rather than using the one provided by the container. Solution: do not mount the home directory in the container:
```
singularity shell -e --bind sq123cells_processed/:/data --no-home docker://cbgr/cager261:1.4
```
Note: with --no-home, the current directory is mounted as your home directory in the container, so run this command in (e.g.) an empty directory you just created.
Documentation collected on 2024-04-17 by ievarau, katalitf, and haroldgu.