TensorFlow is one of the most popular deep-learning libraries. It was created by Google and released as an open-source project in 2015. TensorFlow is used in both research and production environments. Installing it, however, can be cumbersome: the difficulty varies with your environment constraints, and even more so when you’re a data scientist who just wants to build neural networks.
Setting up TensorFlow to run on a GPU requires a few extra steps. In the following tutorial, we will go over the full process.
Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. What exactly is a container? Containers allow data scientists and developers to wrap up an environment with all of the parts it needs - such as libraries and other dependencies - and ship it all out in one package.
To use Docker with GPUs and to be able to use TensorFlow in your application, you’ll need to install Docker together with nvidia-docker. If you already have both installed, move on to the next step. Otherwise, you can follow our previous guide to installing nvidia-docker.
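A quick sanity check that both are working (assuming the nvidia-docker2 runtime is configured) is to run nvidia-smi from inside a CUDA base container:

$ docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

If this prints your GPU table, you’re ready to continue.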
The Dockerfile
Docker can build images (environments) automatically by reading the instructions from a Dockerfile. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.
In our case, those commands will describe the installation of Python 3.6, CUDA 9 and CUDNN 7.2.1 - and of course the installation of TensorFlow 1.12 from source.
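If you have never written one, a Dockerfile is just a sequence of instructions executed top to bottom. A purely illustrative sketch (none of these lines are part of our actual build, and train.py is a placeholder file name):

# Start from a base image
FROM ubuntu:16.04
# Run a shell command at image-build time
RUN apt-get update && apt-get install -y python3
# Copy a file from the build context into the image
COPY train.py /app/train.py
# Default command when a container starts
CMD ["python3", "/app/train.py"]

Our real Dockerfile follows the same pattern, just with many more steps.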
For this environment, we will use the following Dockerfile:
FROM nvidia/cuda:9.0-base-ubuntu16.04
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-command-line-tools-9-0 \
cuda-cublas-dev-9-0 \
cuda-cudart-dev-9-0 \
cuda-cufft-dev-9-0 \
cuda-curand-dev-9-0 \
cuda-cusolver-dev-9-0 \
cuda-cusparse-dev-9-0 \
curl \
git \
libcudnn7=7.2.1.38-1+cuda9.0 \
libcudnn7-dev=7.2.1.38-1+cuda9.0 \
libnccl2=2.4.2-1+cuda9.0 \
libnccl-dev=2.4.2-1+cuda9.0 \
libcurl3-dev \
libfreetype6-dev \
libhdf5-serial-dev \
libpng12-dev \
libzmq3-dev \
pkg-config \
rsync \
software-properties-common \
unzip \
zip \
zlib1g-dev \
wget \
&& \
rm -rf /var/lib/apt/lists/* && \
find /usr/local/cuda-9.0/lib64/ -type f -name 'lib*_static.a' -not -name 'libcudart_static.a' -delete && \
rm /usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a
# install python 3.6 and pip
RUN apt-get update
RUN apt-get install -y software-properties-common vim
RUN add-apt-repository -y ppa:jonathonf/python-3.6
RUN apt-get update
RUN apt-get install -y build-essential python3.6 python3.6-dev python3-pip python3.6-venv
RUN apt-get install -y git
RUN apt-get update && \
apt-get install -y nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.0 && \
apt-get update && \
apt-get install -y libnvinfer4=4.1.2-1+cuda9.0 && \
apt-get install -y libnvinfer-dev=4.1.2-1+cuda9.0
RUN python3.6 -m pip install pip --upgrade
RUN python3.6 -m pip install wheel
RUN python3.6 -m pip install six numpy wheel mock
RUN python3.6 -m pip install keras_applications
RUN python3.6 -m pip install keras_preprocessing
RUN ln -s /usr/bin/python3.6 /usr/bin/python
# Set up Bazel.
# Running bazel inside a `docker build` command causes trouble, cf:
# https://github.com/bazelbuild/bazel/issues/134
# The easiest solution is to set up a bazelrc file forcing --batch.
RUN echo "startup --batch" >>/etc/bazel.bazelrc
# Similarly, we need to workaround sandboxing issues:
# https://github.com/bazelbuild/bazel/issues/418
RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" \
>>/etc/bazel.bazelrc
# Install the most recent bazel release.
ENV BAZEL_VERSION 0.15.0
WORKDIR /
RUN mkdir /bazel && \
cd /bazel && \
curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -O https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -o /bazel/LICENSE.txt https://raw.githubusercontent.com/bazelbuild/bazel/master/LICENSE && \
chmod +x bazel-*.sh && \
./bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
cd / && \
rm -f /bazel/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh
# Download and build TensorFlow.
WORKDIR /tensorflow
RUN git clone --branch=r1.12 --depth=1 https://github.com/tensorflow/tensorflow.git .
# Configure the build for our CUDA configuration.
ENV CI_BUILD_PYTHON python3.6
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
ENV TF_NEED_CUDA 1
ENV TF_NEED_TENSORRT 1
ENV TF_CUDA_COMPUTE_CAPABILITIES=3.5,5.2,6.0,6.1,7.0
ENV TF_CUDA_VERSION=9.0
ENV TF_CUDNN_VERSION=7
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1 && \
LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:${LD_LIBRARY_PATH} \
tensorflow/tools/ci_build/builds/configured GPU \
bazel build -c opt --copt=-mavx --config=cuda \
--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
tensorflow/tools/pip_package:build_pip_package && \
rm /usr/local/cuda/lib64/stubs/libcuda.so.1 && \
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip && \
pip --no-cache-dir install --upgrade /tmp/pip/tensorflow-*.whl && \
rm -rf /tmp/pip && \
rm -rf /root/.cache
# Clean up pip wheel and Bazel cache when done.
WORKDIR /root
# TensorBoard
EXPOSE 6006
To build the image from the Dockerfile, simply run the docker build command. Keep in mind that this build process might take a few hours to complete. We recommend using the nohup utility, so that if your terminal hangs the build will still run.
$ docker build -t deeplearning -f Dockerfile .
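Because the build can run for hours, you may want to wrap the same command in nohup so it keeps going even if your SSH session drops (build.log is just an example file name):

$ nohup docker build -t deeplearning -f Dockerfile . > build.log 2>&1 &
$ tail -f build.log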
This should output the setup process and should end with something similar to:
>> Successfully built <image ID>
>> Successfully tagged deeplearning:latest
Your image is ready to use. To start the environment, simply run the command below; since we tagged the image deeplearning, there is no need to look up its ID:
$ docker run --runtime=nvidia -it deeplearning /bin/bash
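Note that the container’s filesystem is ephemeral - anything you create inside it disappears when the container is removed. To keep your code and models around, and to reach TensorBoard from the host (the image exposes port 6006), you can mount a host directory and publish the port; the host path here is a placeholder:

$ docker run --runtime=nvidia -it -p 6006:6006 -v /path/to/your/projects:/workspace deeplearning /bin/bash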
Validate that TensorFlow is indeed running inside your container:
$ python
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-02-23 07:34:14.592926: I
tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports
instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-23 07:34:17.452780: I
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA
node read from SysFS had negative value (-1), but there must be at least
one NUMA node, so returning NUMA node zero
2019-02-23 07:34:17.453267: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with
properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-02-23 07:34:17.453306: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu
devices: 0
2019-02-23 07:34:17.772969: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect
StreamExecutor with strength 1 edge matrix:
2019-02-23 07:34:17.773032: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-02-23 07:34:17.773054: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-02-23 07:34:17.773403: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow
device (/job:localhost/replica:0/task:0/device:GPU:0 with 10757 MB memory)
-> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0,
compute capability: 3.7)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80,
pci bus id: 0000:00:1e.0, compute capability: 3.7
2019-02-23 07:34:17.774289: I
tensorflow/core/common_runtime/direct_session.cc:307] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80,
pci bus id: 0000:00:1e.0, compute capability: 3.7
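Beyond the device log, you can confirm that an actual computation lands on the GPU. A minimal sketch using the TensorFlow 1.x session API (matching the 1.12 build above):

import tensorflow as tf

# Pin a small matrix multiplication to the first GPU.
with tf.device('/device:GPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')
    b = tf.constant([[1.0, 0.0], [0.0, 1.0]], name='b')
    c = tf.matmul(a, b, name='matmul')

# log_device_placement prints the device each op was assigned to.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))  # expected output: [[1. 2.] [3. 4.]]

You should see the matmul op placed on /job:localhost/replica:0/task:0/device:GPU:0 in the log.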
Congrats! Your new TensorFlow environment is set up and ready to start training, testing and deploying your deep learning models!