Containers

Containers on biotin3/4

What are containers?

Containerized software is a good way to avoid dependency problems and to make sure that the software you develop runs as expected on any machine. You can also use containerized software that someone else has already developed, and so avoid complicated installations and compatibility issues. Building and running containers is a useful skill for both bioinformaticians and occasional users of computational tools. Read more about containers here. Not all of this is relevant here, as Docker can also run network services with their own network configuration. In the context of Singularity/Apptainer (biotin3/4), think of a container as a filesystem image that you mount: when you load the container, you see the tools and files in it.

In brief, a container provides all the files and programs needed to execute a workflow or pipeline. Ready-to-use containers are already available: they are usually published in container registries, such as Docker Hub. So if you need to run a fairly standard workflow or pipeline, you do not have to develop your own software and build your own container; you can use an existing container image instead. In theory, it will contain all the dependencies and software required.
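
As a minimal sketch of reusing a published image, the helper below downloads an image from Docker Hub once and stores it as a local image file, so that later runs do not need to fetch it again. The function name fetch_image is hypothetical; cbgr/cager261:1.4 is the image used in the examples on this page, and the sketch assumes that `singularity pull` is available on the machine.

```shell
# fetch_image REF
# Pull docker://REF into a local .sif file unless that file already exists,
# then print the file name. fetch_image is a hypothetical helper name.
fetch_image() {
    local ref="$1"
    local name="${ref##*/}"        # cbgr/cager261:1.4 -> cager261:1.4
    local sif
    sif="$(printf '%s' "$name" | tr ':' '_').sif"   # -> cager261_1.4.sif
    if [ ! -f "$sif" ]; then
        # Send progress output to stderr so stdout carries only the file name.
        singularity pull "$sif" "docker://$ref" >&2 || return 1
    fi
    printf '%s\n' "$sif"
}
```

For example, `singularity shell "$(fetch_image cbgr/cager261:1.4)"` would open a shell in the locally stored image.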


How to build your own container?

You cannot build a container on biotin3/4 because building requires superuser privileges. Instead, install Docker (or an alternative tool) on your local machine and build the container there. A built container can be published in one of the registries (for example, Docker Hub) and later fetched on any machine with internet access, or alternatively just copied over to biotin3/4.

A container building guide is available here, and below is an example Dockerfile and the corresponding requirements file.


Content of the Dockerfile:

# Select a base image on which you would like to build
# This image provides R 4.3 and Bioconductor 3.17
FROM bioconductor/bioconductor_docker:3.17

# Set up folder structure
WORKDIR /opt/software

# Install the software of interest (the version must be compatible with the base image)
# BiocManager installs the CAGEr release belonging to the Bioconductor version of the base image (here CAGEr 2.6.1)
RUN R -e 'BiocManager::install("CAGEr")'

# Install other R dependencies
COPY requirements.R /opt/software/requirements.R
RUN Rscript requirements.R
# Add location of dependencies to the path within the container where R will search for them
ENV R_LIBS=${R_LIBS}:/opt/software


Content of the requirements.R file:

#################################
## List of R packages required ##
#################################
.libPaths( c( "/opt/software" , .libPaths() ) )

## CRAN packages:
required_packages_cran = c(
  "optparse",         ## Read in data
  "rlang",            ## Error handling
  "tidyr",            ## Data formatting and manipulation
  "tidyverse",        ## Data formatting and manipulation
  "viridis",          ## Plotting
  "ggplot2",          ## Plotting
  "gplots")           ## Plotting

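## Join the package names with ", "; message() would otherwise print them without separators
message("; Installing these R packages from CRAN repository: ", paste(required_packages_cran, collapse = ", "))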
install.packages(required_packages_cran, repos="https://cran.uib.no/", lib="/opt/software")

## Bioconductor packages:
required_packages_bioconductor <- c(
  "BSgenome.Hsapiens.UCSC.hg38",
  "ChIPseeker",
  "TxDb.Hsapiens.UCSC.hg38.knownGene",
  "org.Hs.eg.db",
  "rtracklayer")

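## Join the package names with ", "; message() would otherwise print them without separators
message("; Installing these R Bioconductor packages: ", paste(required_packages_bioconductor, collapse = ", "))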
BiocManager::install(required_packages_bioconductor, lib="/opt/software")
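
With the Dockerfile and requirements.R above in the same directory, building and publishing could look like the sketch below (run on your local machine, not on biotin3/4). The helper name build_and_push and the image name myuser/cager261:1.4 are hypothetical; pushing assumes that you have already run `docker login` for a registry account of your own.

```shell
# build_and_push IMAGE [CONTEXT_DIR]
# Build IMAGE from the Dockerfile in CONTEXT_DIR (default: current directory)
# and push it to the registry only if the build succeeded.
# build_and_push is a hypothetical helper name.
build_and_push() {
    local image="$1"
    local context="${2:-.}"
    docker build -t "$image" "$context" && docker push "$image"
}
```

Calling `build_and_push myuser/cager261:1.4` would then make the image available on biotin3/4 as docker://myuser/cager261:1.4.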

Alternative container building software

As alternatives to Docker, you can use Apptainer (formerly Singularity) or Podman. Podman allows for rootless container building.


How to use containers on biotin?

The two servers at NCMM, biotin3 and biotin4, manage software differently, so you have to be aware of how to use containers on biotin3 versus biotin4. Generally, you just need to make sure that Singularity (Apptainer) is available to launch the container image. Remember, you can only *use* containers (with Singularity) on biotin3/4, not *build* them. Singularity and Docker are largely compatible, in the sense that Singularity can run images built with Docker.

You need to specify which folders should be visible inside (mounted to) the container. The way to do it is described here. A few directories are mounted inside the container by default, for example your home directory (this is not always desirable, see the common issues below). An example would be:

 singularity shell --bind sq123cells_processed/:/data docker://cbgr/cager261:1.4

where sq123cells_processed is a directory inside the working directory from which the singularity command is run on biotin3/4, and /data is a newly defined folder *within the container*.
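
If more than one directory is needed inside the container, `--bind` accepts a comma-separated list of host:container pairs. The sketch below assembles such a list; join_binds is a hypothetical helper name, and the results directory in the usage line is an assumption made for illustration.

```shell
# join_binds SPEC...
# Join host:container bind pairs with commas, the list format that
# `--bind` accepts. join_binds is a hypothetical helper name.
join_binds() {
    local IFS=,
    # "$*" joins all arguments using the first character of IFS.
    printf '%s\n' "$*"
}
```

A call could then look like `singularity exec --bind "$(join_binds "$PWD/sq123cells_processed:/data" "$PWD/results:/results")" docker://cbgr/cager261:1.4 ls /data /results`.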

By default, Singularity passes (with a few exceptions) the environment variables of the host into the container. This means that if, for example, PYTHONPATH or R_LIBS_SITE is set when you load a container, these variables are also set in the container, and Python and R will look for modules in the respective directories. Containers should be entirely self-contained, so this should generally be avoided, although there might be occasional cases where you do want this behavior. The -e / --cleanenv flag resets the environment to default values, so as a general guideline we recommend using it. If for some reason you need environment variables from outside the container to be available inside it, load the container without this flag.
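
To see whether your current session would pass library paths into a container, you can check the host environment before starting it. The sketch below is only an illustration: host_library_vars is a hypothetical helper, and the list of variable names (PYTHONPATH, R_LIBS, R_LIBS_SITE, R_LIBS_USER) is an assumption that covers the two examples above plus R's other library path variables.

```shell
# host_library_vars
# Print NAME=value for each exported library-path variable on the host;
# prints nothing when none of them is set.
# host_library_vars is a hypothetical helper name.
host_library_vars() {
    local var value
    for var in PYTHONPATH R_LIBS R_LIBS_SITE R_LIBS_USER; do
        # printenv only reports exported variables, i.e. the ones a
        # container started from this shell would inherit.
        value="$(printenv "$var")" || continue
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$var" "$value"
        fi
    done
}
```

If `host_library_vars` prints anything, starting the container with -e keeps those paths out of it.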

Snakemake and containers

Snakemake is a workflow management tool that integrates well with containers. It uses Singularity to run images, which makes it easy to use on biotin3/4. The path or name of the container(s) is defined within the Snakefile and the image is fetched automatically, so you just need to run the Snakefile.
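
As a minimal sketch of what this looks like in a Snakefile, the commands below write a small example workflow with one rule that runs inside the image used elsewhere on this page. The file name Snakefile.example, the rule names and the output path are hypothetical. The sketch assumes a Snakemake release that supports the `container:` directive (older releases use `singularity:` for the same purpose) and that Rscript is on the path inside the image.

```shell
# Write a small example workflow; the quoted 'EOF' keeps the text verbatim.
cat > Snakefile.example <<'EOF'
rule all:
    input:
        "results/cager_version.txt"

# This rule runs inside the container when --use-singularity is given.
rule cager_version:
    output:
        "results/cager_version.txt"
    container:
        "docker://cbgr/cager261:1.4"
    shell:
        "Rscript -e 'cat(as.character(packageVersion(\"CAGEr\")))' > {output}"
EOF
```

A dry run of this file would be `snakemake -s Snakefile.example --use-singularity --singularity-args '\-e' --cores 1 -np`, after making Snakemake available as described for each server below.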

biotin3

Singularity is available on biotin3 as is; you do not need to load it. Do not be surprised that it is called Apptainer:

 [katalitf@biotin3 ~]$ singularity --version
 apptainer version 1.3.0-1.el7

If you want to start a shell environment within your container, you can type the command below.

 singularity shell docker://cbgr/cager261:1.4

Using a container within Snakemake

On biotin3, Snakemake is not available by default, so you have to load a Python environment first. The snakemake command accepts Singularity parameters via --singularity-args, for example the -e parameter.

 scl enable rh-python38 bash
 snakemake --use-singularity --singularity-args '\-e' --cores 1 -np

biotin4

Apptainer (Singularity) is also already loaded, so you can just use it.

Using a container within Snakemake

On biotin4, Snakemake is available as a module (Singularity usage is enabled above Snakemake version 5.3.1, so it is good to stick to the version given below). When you load the module, Apptainer is loaded too.

 module load snakemake/7.23.1-foss-2022a
 snakemake --use-singularity --singularity-args '\-e' --cores 1 -np

Common issues

If the home directory is mounted, R in the container will see the R library in your home directory (~/R/...) and may load packages from there rather than from the container, e.g.:

 singularity shell -e --bind sq123cells_processed/:/data docker://cbgr/cager261:1.4
 > library(CAGEr)
 ...
 Error: package or namespace load failed for ‘SummarizedExperiment’ in dyn.load(file, DLLpath = DLLpath, ...):
  unable to load shared object '/div/pythagoras/u1/katalitf/R/x86_64-pc-linux-gnu-library/4.0/Matrix/libs/Matrix.so':
   libflexiblas.so.3: cannot open shared object file: No such file or directory

What happened here is that files in the home directory "leaked" into the container: the R installation in the container sees the R library in the home directory (the ~/R/... path in the error message) and tries to load the requested R package from there rather than the one provided by the container. Solution: do not mount the home directory in the container:

 singularity shell -e --bind sq123cells_processed/:/data --no-home docker://cbgr/cager261:1.4

Note: this will mount the current directory as your home directory in the container - so run this command in (e.g.) an empty directory you just created.
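
One way to follow this advice is to start the container from a fresh scratch directory. The sketch below resolves the data directory to an absolute path first, because a relative --bind path would no longer be found after changing directory. The function name clean_shell is hypothetical; the flags are the ones used in the command above.

```shell
# clean_shell IMAGE DATA_DIR
# Open a shell in IMAGE with DATA_DIR mounted at /data, a clean environment,
# no home directory, and a new empty directory as working directory.
# clean_shell is a hypothetical helper name.
clean_shell() {
    local image="$1"
    local data_dir scratch
    data_dir="$(cd "$2" && pwd)" || return 1    # absolute path to the data
    scratch="$(mktemp -d)" || return 1          # new, empty directory
    (
        # The subshell keeps the directory change out of the calling shell.
        cd "$scratch" || exit 1
        singularity shell -e --no-home --bind "$data_dir:/data" "$image"
    )
}
```

For the example above this would be called as `clean_shell docker://cbgr/cager261:1.4 sq123cells_processed/`.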



Documentation collected on 2024-04-17 by ievarau, katalitf and haroldgu