March 2022
This second session about Docker will teach you how to write Dockerfiles and build images for a couple of use cases. The next session will give you more freedom in dockerizing applications, and you will have to reuse what you learn during this session.
All actions you need to perform are indicated by the following pictogram:
Make sure you read and understand all the material of this document, and go through all the required actions.
Also, be sure to keep notes about what you are doing, as you will have to answer graded questions about your work during the last 15 minutes of the session.
The Go language creators did a great job at providing their community with easy-to-use tools thanks to Docker. Studying how things are done with Go is a good example of how to leverage Docker for software distribution.
The tools provided using Docker cover two major aspects of software development:
Why not install and use a toolchain directly from our distribution? Here is a great answer from Docker’s blog:
If you write Go code, or if you have even the slightest interest in the Go language, you certainly have the Go compiler and toolchain installed, so you might be wondering “what’s the point?”; but there are a few scenarios where you want to compile Go without installing Go.
- You still have this old Go 1.2 on your machine (that you can’t or won’t upgrade), and you have to work on this codebase that requires a newer version of the toolchain.
- You want to play with cross compilation features of Go 1.5 (for instance, to make sure that you can create OS X binaries from a Linux system).
- You want to have multiple versions of Go side-by-side, but don’t want to completely litter your system.
- You want to be 100% sure that your project and all its dependencies download, build, and run fine on a clean system.
If any of this is relevant to you, then Docker should be useful!
In what follows, we will build and run a very simple program in various ways.
The program and the associated resources are packaged in the resources.tar.gz archive. It is composed of a single source file named “simple_static_server.go”, which should serve the static files under the “resources/static” directory.
Let us first identify the right Docker image to build our program. What we need is a reliable image which contains build tools for the Go language.
Using the Docker hub, identify a group of images, then a particular image which you will use.
Using the golang image variant you selected, we can call the compilation tools directly on our files thanks to a bind mount of the current directory.
Navigate to the “resources” directory of this session, then adapt and run the following command line to create your application.
docker run --rm -v "$PWD":/usr/src/myapp -w /usr/src/myapp golang:1.8 go build -v
What is the use of the /usr/src/myapp path? Can you explain what is going on?
You should obtain a myapp file under your current directory.
Try to run the application to test it. There is a --port option available to select the port you want your server to listen on. You can check it works by connecting to http://localhost:$YOURPORT.
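For instance, assuming you pick port 8080 (any free port will do):
# Shell — run from the “resources” directory; 8080 is an arbitrary choice
./myapp --port 8080
# then browse to http://localhost:8080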
If you check the owner of the produced file, then you should notice that it is owned by root. This is due to the fact that the Go compiler within the container runs under the root user.
Out of curiosity, you can also check the size of the resulting binary, and its library dependencies using ldd (for example).
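For example, still from the “resources” directory:
# Shell
ls -l myapp    # owner of the produced file
ls -lh myapp   # human-readable size of the binary
ldd myapp      # shared library dependencies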
We will now focus on building an image to distribute our program and run it in isolation.
The easiest solution (but not the best one) is to use the image we used for building as the base for our new image.
Using the golang image variant you previously selected as a base image, create a Dockerfile with the following steps:
- copy the program sources into “/usr/src/myapp”;
- build the program;
- define the command used to run the resulting server.
Then build your image using the appropriate docker build invocation.
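As an illustration, here is a possible sketch of such a Dockerfile, assuming the golang:1.8 variant, the /usr/src/myapp path used earlier, and an arbitrary port of 8080; adapt it to the variant and options you chose:
# Dockerfile — sketch; image tag, paths and port are assumptions
FROM golang:1.8
COPY . /usr/src/myapp
WORKDIR /usr/src/myapp
RUN go build -v
CMD ["./myapp", "--port", "8080"]
# Shell — build the image (the tag “myapp:golang” is arbitrary)
docker build -t myapp:golang .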
Once you managed to build your image successfully, find the right command to launch your application. Do not forget to expose the ports of the container on the host machine.
TIP: You can use docker port $CONTAINERNAME to check the port redirections for your container.
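A possible invocation, reusing the image tag and port assumed above:
# Shell
docker run --rm -d -p 8080:8080 --name myserver myapp:golang
docker port myserver
# then browse to http://localhost:8080, and stop with: docker stop myserver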
Check the size of the image you previously created.
What is wrong?
According to Docker’s documentation about best practices (and common sense), it is not desirable to ship large images containing a lot of useless elements.
In our particular case, we do not want to ship a Debian distribution with our tiny server (assuming you used the default “latest” image).
We are now going to use a smaller image.
Try to use the alpine variant as a base image for your Dockerfile.
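This should only change the FROM line of your Dockerfile, for instance (assuming such a tag exists for the version you picked):
# Dockerfile
FROM golang:1.8-alpine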
What is the size of the resulting image?
We do not need all the build tools in any of the golang image variants!
The right approach is to use a minimal base image, like an alpine image, to run our program. In our case, because our program does not have complicated dependencies, we can simply copy it to a new Linux image.
Use an alpine image as a base and copy the program you compiled at some convenient location, with necessary files. Make sure you can run your program.
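A possible sketch, assuming you build from the “resources” directory where the myapp binary was produced and that the static files live in a static/ subdirectory (paths and port are assumptions to adapt):
# Dockerfile — sketch for a small runtime image
FROM alpine
COPY myapp /usr/local/bin/myapp
# adapt the destination to wherever the server expects its static files
COPY static /static
CMD ["myapp", "--port", "8080"]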
It is possible to merge the two steps of the previous sections using a multistage build. This consists in creating several images in the same Dockerfile. In our case, we need only two images: one to build the program, and one to run it.
The official documentation details this trick, which relies on two simple things:
- naming a build stage with FROM image AS name;
- copying files from a named stage with COPY --from=name /path/in/name/ /path/in/current.
Use the multistage build trick to create a small image for your program, and check it works!
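Here is a possible sketch combining both stages in one Dockerfile (names, tags and paths are assumptions):
# Dockerfile — multistage sketch
FROM golang:1.8 AS builder
COPY . /usr/src/myapp
WORKDIR /usr/src/myapp
RUN go build -v

FROM alpine
COPY --from=builder /usr/src/myapp/myapp /usr/local/bin/myapp
# also copy the static files here if your server needs them at run time
CMD ["myapp", "--port", "8080"]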
Skip this and go to the next section if you are short on time.
Go is particularly useful to produce minimal binaries thanks to static compilation.
This allows us to build a program which does not depend on anything but the kernel, except for a few files if we use, for instance, HTTPS which requires some certificates.
Go static compilation can be triggered by using:
go build -ldflags "-linkmode external -extldflags -static" -v
Using this final trick, use a multistage build to create a very small final image based on the scratch (i.e. empty) base image.
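A possible sketch, reusing the static build command above (tags, paths and port remain assumptions):
# Dockerfile — multistage sketch with an empty final image
FROM golang:1.8 AS builder
COPY . /usr/src/myapp
WORKDIR /usr/src/myapp
RUN go build -ldflags "-linkmode external -extldflags -static" -v

FROM scratch
COPY --from=builder /usr/src/myapp/myapp /myapp
ENTRYPOINT ["/myapp"]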
How would you use this neat minimal container as an on-demand HTTP server used to share your current directory over the network?
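One possibility is to publish a port and bind-mount the current directory over the location the server reads its files from (check simple_static_server.go for the exact path; the /static mount point and the image tag below are assumptions):
# Shell — sketch; adjust the mount point to what the server actually serves
docker run --rm -p 8080:8080 -v "$PWD":/static myapp:scratch --port 8080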
Conda is an open source package management system and environment management system. “Package management” is about solving the dependency problem of package installation: what should I install to make this program work? “Environment management” is about being able to keep independent and potentially conflicting software stacks separate.
Conda was first designed for the Python language, but it can now handle binaries and libraries from various languages. It also runs on Linux, OS X, and Windows.
Conda addresses the limitations of pure-Python tools for package management (easy_install, pip, etc.) and environment management (virtualenv): those tools fail at properly managing library dependencies for Python packages. It is indeed common to face compilation issues because pip triggered a compilation of some C code which relies on a library for which the development headers were not installed. This happens when you want to install NumPy, for instance, in a very small base Docker image: the headers for linear algebra libraries are not installed.
Conda’s solution is to rely on a massive repository of pre-built packages, maintained by the Anaconda company. Each Conda package can contain:
This really looks like a Docker image layer!
We are going to simulate the kind of environment Conda produces using Docker containers. This will allow us to share some parts of the environments we will create, and have full control over our software stack.
Of course, this is a bit more complex than using Conda’s tools, but not that much in practice and it provides a few advantages:
In what follows, we will setup a Python environment for scientific computing. Before and while writing a Dockerfile, we encourage you to test your commands in a container based on the image you want to use.
Like in the previous case, identifying the right base image is a critical choice. The Docker Hub references a lot of base images to build on.
From an Alpine base image (use the docker run command directly), try to install Python (using apk add) and the latest version of the NumPy library (using some pip install variant).
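For example (the package names are the usual Alpine ones; adapt if needed):
# Shell
docker run --rm -it alpine sh
# then, inside the container:
apk add python3 py3-pip
pip3 install numpy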
What’s wrong?
For what follows, we recommend using an Ubuntu base image or the official Python image, as their support is very good and their size reasonable. You may, however, use another image you are more confident with. What follows assumes you use an Ubuntu base, because it includes the extra steps this base requires.
Using Docker Hub (website or command line), identify precisely which base image you are going to use.
Using the package manager of Ubuntu (apt), identify and install the packages which provide the python3 environment and the pip3 tool.
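For instance, in your Dockerfile (the package names are the standard Ubuntu ones):
# Dockerfile
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*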
What is the size of the resulting image?
Try the following command in a container running the base image you chose:
python -c 'print("h\xe9h\xe9")'
Does it work?
By default, the Ubuntu image uses the POSIX locale (the 20th century is calling…), accepting only ASCII characters for both input and output.
If the previous command failed, you need to fix your terminal: set the LANG environment variable to a saner C.UTF-8 default.
You can check the language defaults using the locale command.
# Shell
export LANG=C.UTF-8
# Dockerfile
ENV LANG=C.UTF-8
We are now ready to install Python tools.
Using appropriate pip(3) commands, install the following tools in your container:
You can try to limit the amount of useless space consumed by disabling the pip cache with --no-cache-dir.
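As an example, assuming Jupyter is among the tools to install (adapt the list to the one given above):
# Dockerfile
RUN pip3 install --no-cache-dir jupyter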
This is not needed with podman (or docker in rootless mode). The root user inside podman is automatically mapped to the current user.
We will do a little trick here. To avoid several issues (programs complaining when run as root, messing up file permissions, etc.) we will add a new user in the container. We want this user to have the same UID and GID as our current (host) user, so we can share files easily with her, but we want this user to have a different name and a different home directory, to be able to store container-specific configuration and avoid messing with our own configuration on the host.
Create a new user “developer” in the container:
- Use build arguments (ARG) to take the user id and the group id this user will have at build time (thanks to --build-arg parameters to docker build) — you can get the UID and GID of your host user with the id command.
- Give this user the home directory /home/developer.
- Define the HOME environment variable accordingly.
Activate this user for the rest of the Dockerfile using the USER instruction.
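A possible sketch of this part of the Dockerfile, assuming an Ubuntu base (the argument names and default values are arbitrary):
# Dockerfile — sketch
ARG UID=1000
ARG GID=1000
RUN groupadd --gid $GID developer && \
    useradd --uid $UID --gid $GID --create-home --home-dir /home/developer developer
ENV HOME=/home/developer
USER developer
WORKDIR $HOME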
To avoid authentication issues with Jupyter, we will remove the required security token. This is a security issue, and you should put a custom string instead of an empty one in production.
First add the following line to your Dockerfile (make sure the environment variable HOME is previously defined in your Dockerfile):
RUN jupyter notebook --generate-config && \
echo "c.NotebookApp.token = ''" >> $HOME/.jupyter/jupyter_notebook_config.py
Use the EXPOSE instruction to declare the port used by Jupyter (you will be able to select it when running jupyter notebook with the --port parameter).
To avoid losing data related to container configuration, we can declare a volume for the home directory of the new container user. This will create an external storage for this part of the filesystem, initialized with the current content of the container at this point.
Simply use the VOLUME instruction to define a volume for the home directory of the container user.
It will be possible to mount an existing volume at this location, or to bind-mount a host directory, but this may mask the actual content of the volume created during the image build.
Using the proper Dockerfile instruction, launch a Jupyter notebook server on the port of your choice. We recommend using a command like the following one:
jupyter notebook --no-browser --ip=0.0.0.0 --port $JUPYTER_PORT
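For instance, in the Dockerfile (8888 is an arbitrary port choice):
# Dockerfile
ENV JUPYTER_PORT=8888
EXPOSE 8888
CMD jupyter notebook --no-browser --ip=0.0.0.0 --port $JUPYTER_PORT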
Build your Docker image.
Which command line are you going to use in order to pass the build arguments properly?
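A possible invocation, assuming the ARG names used above and an arbitrary image tag:
# Shell
docker build --build-arg UID=$(id -u) --build-arg GID=$(id -g) -t pyenv .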
We will now run a container based on our new image.
First, simply run your container to check that Jupyter starts properly.
Now start a shell in your container and check the current directory and user id.
Finally, find the complete command line to run your container which:
Improve the command line to launch your container by adding a memory limit and a CPU limit (maximum 1 CPU).
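For example, assuming the image tag above, a 2 GB memory cap, and the Jupyter port chosen earlier (complete with your other options):
# Shell
docker run --rm -it --memory=2g --cpus=1 -p 8888:8888 pyenv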
Try to launch several terminals in the container and create some load for 1, then 2, then 3 CPUs in the container (using something like yes > /dev/null for instance) and monitor what happens from the host point of view.
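For instance, inside the container (one yes process per CPU you want to load), while watching from the host:
# Shell — inside the container
yes > /dev/null &
yes > /dev/null &
# Shell — on the host
docker stats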
Run a shell as root and install htop.
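For example, assuming an Ubuntu-based image and a container named “pyenv”:
# Shell
docker exec -u root -it pyenv bash
# then, inside the container:
apt-get update && apt-get install -y htop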
How would you proceed to install new Python tools in your container, like for instance, two different versions of OpenCV?
Can you list at least two ways of doing it?
What is the impact of each solution regarding the space consumed for each new element to install?