Deployment and virtualization, practice session 2

Joseph Chazalon, Clément Demoulins {firstname.lastname@lrde.epita.fr}

February 2020

Introduction

This second session about Docker will teach you how to write Dockerfiles and build images for a couple of use cases. The next session will give you more freedom about dockerizing applications and you will have to reuse what you will learn during this session.

All actions you need to perform are indicated by the following pictogram: Work

Make sure you read and understand all the material of this document, and go through all the required actions.

Also, be sure to keep notes about what you are doing, as you will have to answer graded questions about your work during the last 15 minutes of the session.

Use case 1: Distribute a Go program

The Go language creators did a great job a providing their community with easy-to-use tools thanks to Docker. Studying how things are done with Go is a good example of how to leverage Docker for software distribution.

The tools provided using Docker cover two major aspects of software development:

The creation of redistributable software artifacts: compile, link, package, etc. For this purpose we need a particular software stack which contains compilers, static analysis tools, linkers, cross-build toolchains, and so on…
The execution of the piece of software previously created over a runtime software stack. Ideally, such software stack should be well-identified and minimized to reduce as much as possible the surface for attack or dependency issues, and also to facilitate maintenance.

Why not installing and using a toolchain directly from our distribution? Here is a great answer from Docker’s blog:

If you write Go code, or if you have even the slightest interest into the Go language, you certainly have the Go compiler and toolchain installed, so you might be wondering “what’s the point?”; but there are a few scenarios where you want to compile Go without installing Go.

You still have this old Go 1.2 on your machine (that you can’t or won’t upgrade), and you have to work on this codebase that requires a newer version of the toolchain.

You want to play with cross compilation features of Go 1.5 (for instance, to make sure that you can create OS X binaries from a Linux system).

You want to have multiple versions of Go side-by-side, but don’t want to completely litter your system.

You want to be 100% sure that your project and all its dependencies download, build, and run fine on a clean system.

If any of this is relevant to you, then Docker should be useful!

In what follows, we will

Build a simple Go program using different techniques;
Distribute this program and the software stack it relies on using a Dockerfile.
Optimize a bit the size of our image to facilitate its distribution.

Build a Go program using tools from a Docker image

The very simple program we are going to build and run, and the associated resources, are packaged in the resources.tar.gz archive. It is composed of a single file named “simple_static_server.go” which should serve the static files under the “resources/static” directory.

We will build and run this program in various ways.

Identify a good base image

Let us first identify the right Docker image to build our program. What we need is a reliable image which contains build tools for the Go language.

Using the Docker hub, identify a group of images, then a particular image which you will use.

Build your program using the image directly

Using the golang image variant you selected, we can call the compilation tools directly on our files thanks a bind mount of the current directory.

Navigate to the “resources” directory of this session, then adapt and run the following command line to create your application.

docker run --rm -v "$PWD":/usr/src/myapp -w /usr/src/myapp golang:1.8 go build -v

What is the use of the /usr/src/myapp path? Can you explain what is going on?

You should obtain a myapp file under your current directory.

Try to run the application to test it. There is a --port option available to select the port you want your server to listen to. You can check it works by connecting to http://localhost:$YOURPORT.

If you check the owner of the produced file, then you should notice that it is owned by root. This is due to the fact that the Go compiler within the container runs under the root user.

Out of curiosity, you can also check the size of the resulting binary, and its library dependencies using ldd (for example).

Prepare a runtime image

We will now focus on building an image to distribute our program and run it in isolation.

Turn-key solution

The easiest solution (but not the best one) is to use the image used to build as a base for our new image.

Using the golang image variant you previously selected as base image, create a Dockerfile which has the following steps:

set the working directory to some meaningful place within the container, like “/usr/src/myapp”;
copy the application files to this location;
call the Go builder;
expose the default port;
define the default command to run.

Then build your image using the appropriate docker build invocation.

Once you managed to build your image successfully, find the right command to launch your application. Do not forget to expose the ports of the container on the host machine.
TIP: You can use docker port $CONTAINERNAME to check the port redirections for your container.

Use a smaller base image

Check the size of the image you previously created.
What is wrong?

According to Docker’s documentation about best practices (and common sense), it is not desirable to ship large images containing a lot of useless elements.

In our particular case, we do not want to ship a Debian distribution with our tiny server (assuming you used the default “latest” image).

We are now going to use a smaller image.

Try to use the alpine variant as a base image for your Dockerfile.
What is the size of the resulting image?

We do not need all the build tools in any of the golang image variants!

Use a really small base image

The right approach is to use a minimal base image, like an alpine image, to run our program. In our case, because our program does not have complicated dependencies, we can simply copy it to a new linux image.

Use an alpine image as a base and copy the program you compiled at some convenient location, with necessary files. Make sure you can run your program.

Use a multistage build

It is possible to merge the two steps of the previous section using a multistage build. This consists in creating several images in the same Dockerfile. In our case, we need only two images:

a build image (exactly what we did before);
a runtime image (which contains only the bare minimum) like in the previous section.

The official documentation details this tricks which relies on two simple things:

naming images with the syntax: FROM image as name;
being able to copy from any named image using the following syntax: COPY --from=name /path/in/name/ /path/in/current

Use the multistage build trick to create a small image for your program, and check it works!

Use the smallest possible image

Skip this and go to the next section if you are short in time.

Go is particularly useful to produce minimal binaries thanks to static compilation.

This allows us to build a program which does not depend on anything but the kernel, except for a few files if we use, for instance, HTTPS which requires some certificates.

Go static compilation can be triggered by using:

go build -ldflags "-linkmode external -extldflags -static" -v

Using this final trick, use a multistage build to create a very small final image based on the scratch (ie empty) base image.

How would you use this neat minimal container as an on-demand HTTP server used to share your current directory over the network?

Use case 2: Create a Conda-like environment

Conda is an open source package management system and environment management system. “Package management” is about solving the dependency problem of package installation: What should I install to make this program work? “Environment management” is about enabling to separate separate independent and potentially conflicting software stacks.

Conda was first designed for Python language but it can now handle binaries and libraries from various languages. It can also run in Linux, OSX and Windows.

Conda addresses the limitations of pure-Python tools for package management (easy_install, pip, etc.) and environment management (virtualenv): those tools fail at properly managing library dependencies for Python packages. It is indeed common to face compilation issues because pip triggered a compilation of some C code which relies on a library for which the development header were not installed. This happens when you want to install NumPy, for instance, in a very small base Docker image: the headers for linear algebra libraries are not installed.

Conda’s solution is to rely on a massive repository of pre-built packages, maintained by the Anaconda company. Each Conda package can contain:

system-level libraries
Python or other modules
executable programs and other components
some metadata.

This really looks like a Docker image layer!

We are going to simulate the kind of environment Conda produces using Docker containers. This will allow us to share some parts of the environments we will create, and have the full control over our software stack.

Of course, this is a bit more complex than using Conda’s tools, but not that much in practice and it provides a few advantages:

once an environment is defined using a Dockerfile, it is very easy to replicate it;
we can limit the resources used by a container.

In what follows, we will setup a Python environment for scientific computing. Before and while writing a Dockerfile, we encourage you to test your commands in a container based on the image you want to use.

Choose a base image

Like in the previous case, identifying the right base image is a critical choice. The Docker Hub references a lot of base image to build on.

From an Alpine base image (use the docker run command directly), try to install Python (using apk add) and the latest version of the NumPy library (using some pip install variant).
What’s wrong?

For what follows, we recommend using an Ubuntu base image or the official Python image, as their support is very good and their reasonable. You may, however, use another image you more confident with. What follows assume you use an Ubuntu base, because it contains extra necessary steps.

Using Docker Hub (website or command line), identify precisely which base image you are going to use.

Install Python and pip

Using the package manager of Ubuntu (apt), identify and install the packages which provide the python3 environment, and the pip3 tool.
What is the size of the resulting image?

Locale setup

Try the following command in a container running the base image you chose:

python -c 'print("h\xe9h\xe9")'

What happens?

Let us fix our terminal: set the LANG environment variable to a saner C.UTF-8 default.
You can check the language defaults using the locale command.

Install NumPy, IPython and Jupyter

We are now ready to install Python tools.

Using appropriate pip(3) commands, install the following tools in your container:

NumPy;
IPython;
Jupyter.

You can try to limit the amount of useless space consumed by disabling the pip cache with --no-cache-dir.

Add a special user

We will do a little trick here. To avoid several issues (programs complaining when run as root, messing file permissions, etc.) we will add a new user in the container. We want this user to have the same UID and GID as our current (host) user, so we can share files easily with her, but we want this user to have a different name and a different home directory, to be able to store container-specific configuration, avoid avoid messing our own configuration on the host.

Create a new user “developer” in the container.

Declare arguments in the Dockerfile (using ARG) to take the user id and the group id this user will have at build time (thanks to --build-arg parameters to docker create / run) — you can get the UID and GID of your host user with the id command.
Give this new user a home at /home/developer.
Define the environment variable HOME.

Activate this user for the rest of the Dockerfile using the USER instruction.

Jupyter configuration and expose the ports

To avoid authentication issues with Jupyter, we will remove the security token required. This is a security issue, and you should put a custom string instead of a empty one in production.

First add the following line to your Dockerfile (make sure the environment variable HOME is previously defined in your Dockerfile):

RUN jupyter notebook --generate-config && \
  echo "c.NotebookApp.token = ''" >> $HOME/.jupyter/jupyter_notebook_config.py

Use the EXPOSE instruction to declare the port used by Jupyter (you will be able to select it when running jupyter notebook with the --port parameter).

Setup a volume for specific configuration

To avoid loosing data related to container configuration, we can declare a volume for the home directory of the new container user. This will create an external storage for this part of the filesystem, with the current content of the container at this time.

Simply use the VOLUME instruction to define a volume for the home directory of the container user.

It will be possible to mount an existing volume to this location, or to bind-mount a host directory, but this may mask the actual content of this volume created during image build.

Setup the entry point and/or the default command

Using the proper Dockerfile instruction, launch a Jupyter notebook server on the port of your choice. We recomend using a command like the following one:

jupyter notebook --no-browser --ip=0.0.0.0 --port $JUPYTER_PORT

Build the image

Build your Docker image.
Which command line are you going to pass, in order to pass build arguments properly?

Run the container

We will now run a container based on our new image.

First just test run your container to check Jupyter is starting properly.

Now start a shell in your container and check what is the current directory and user id.

Finally, find the complete perfect command line to run your container which:

bind mounts your host home directory to the same path in the container, but read only;
bind mounts the host current directory under the same path (writable);
selects the current directory as working directory;
exposes the Jupyter port properly.

Resource limitation

Improve the command line to launch your container by adding a memory limit and a CPU limit (maximum 1 CPU).
Try to launch several terminals in the container and create some load for 1, then 2, then 3 CPUs in the container (using something like yes > /dev/null for instance) and monitor what happends from the host point of view.

New tools in new image layers for alternative environments

Run a shell as root and install htop.

How would you proceed to install new Python tools in your container, like for instance, two different versions of OpenCV?
Can you list at least two ways of doing it?
What are the impact of each solution regarding the space consumed for each new element to install?