TensorFlow is one of the most popular deep-learning libraries. It was created by Google and released as an open-source project in 2015. TensorFlow is used in both research and production environments. Installing it, however, can be cumbersome: the difficulty varies with your environment constraints, and even more so when you’re a data scientist who just wants to build neural networks.
Setting up TensorFlow to run on a GPU requires a few extra steps. In the following tutorial, we will go over the full process.
Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. What exactly is a container? Containers allow data scientists and developers to wrap up an environment with all of the parts it needs - such as libraries and other dependencies - and ship it all out in one package.
To use Docker with GPUs and to be able to use TensorFlow in your application, you’ll need to install Docker together with nvidia-docker. If you already have both installed, move on to the next step. Otherwise, you can follow our previous guide to installing nvidia-docker.
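A quick sanity check that both are working (assuming the nvidia-docker2 runtime is configured) is to run nvidia-smi from inside a CUDA base container:

$ docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

If this prints your GPU table, you’re ready to continue.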
The Dockerfile
Docker can build images (environments) automatically by reading the instructions from a Dockerfile. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.
In our case, those commands will describe the installation of Python 3.6, CUDA 9 and CUDNN 7.2.1 - and of course the installation of TensorFlow 1.12 from source.
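If you have never written one, a Dockerfile is just a sequence of instructions executed top to bottom. A purely illustrative sketch (none of these lines are part of our actual build, and train.py is a placeholder file name):

# Start from a base image
FROM ubuntu:16.04
# Run a shell command at image-build time
RUN apt-get update && apt-get install -y python3
# Copy a file from the build context into the image
COPY train.py /app/train.py
# Default command when a container starts
CMD ["python3", "/app/train.py"]

Our real Dockerfile follows the same pattern, just with many more steps.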
For this environment, we will use the following Dockerfile:
FROM nvidia/cuda:9.0-base-ubuntu16.04
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-command-line-tools-9-0 \
cuda-cublas-dev-9-0 \
cuda-cudart-dev-9-0 \
cuda-cufft-dev-9-0 \
cuda-curand-dev-9-0 \
cuda-cusolver-dev-9-0 \
cuda-cusparse-dev-9-0 \
curl \
git \
libcudnn7=7.2.1.38-1+cuda9.0 \
libcudnn7-dev=7.2.1.38-1+cuda9.0 \
libnccl2=2.4.2-1+cuda9.0 \
libnccl-dev=2.4.2-1+cuda9.0 \
libcurl3-dev \
libfreetype6-dev \
libhdf5-serial-dev \
libpng12-dev \
libzmq3-dev \
pkg-config \
rsync \
software-properties-common \
unzip \
zip \
zlib1g-dev \
wget \
&& \
rm -rf /var/lib/apt/lists/* && \
find /usr/local/cuda-9.0/lib64/ -type f -name 'lib*_static.a' -not -name 'libcudart_static.a' -delete && \
rm /usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a
# install python 3.6 and pip
RUN apt-get update
RUN apt-get install -y software-properties-common vim
RUN add-apt-repository -y ppa:jonathonf/python-3.6
RUN apt-get update
RUN apt-get install -y build-essential python3.6 python3.6-dev python3-pip python3.6-venv
RUN apt-get install -y git
RUN apt-get update && \
apt-get install -y nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.0 && \
apt-get update && \
apt-get install -y libnvinfer4=4.1.2-1+cuda9.0 && \
apt-get install -y libnvinfer-dev=4.1.2-1+cuda9.0
RUN python3.6 -m pip install pip --upgrade
RUN python3.6 -m pip install wheel
RUN python3.6 -m pip install six numpy wheel mock
RUN python3.6 -m pip install keras_applications
RUN python3.6 -m pip install keras_preprocessing
RUN ln -s /usr/bin/python3.6 /usr/bin/python
# Set up Bazel.
# Running bazel inside a `docker build` command causes trouble, cf:
# https://github.com/bazelbuild/bazel/issues/134
# The easiest solution is to set up a bazelrc file forcing --batch.
RUN echo "startup --batch" >>/etc/bazel.bazelrc
# Similarly, we need to workaround sandboxing issues:
# https://github.com/bazelbuild/bazel/issues/418
RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" \
>>/etc/bazel.bazelrc
# Install the most recent bazel release.
ENV BAZEL_VERSION 0.15.0
WORKDIR /
RUN mkdir /bazel && \
cd /bazel && \
curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -O https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -o /bazel/LICENSE.txt https://raw.githubusercontent.com/bazelbuild/bazel/master/LICENSE && \
chmod +x bazel-*.sh && \
./bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
cd / && \
rm -f /bazel/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh
# Download and build TensorFlow.
WORKDIR /tensorflow
RUN git clone --branch=r1.12 --depth=1 https://github.com/tensorflow/tensorflow.git .
# Configure the build for our CUDA configuration.
ENV CI_BUILD_PYTHON python3.6
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
ENV TF_NEED_CUDA 1
ENV TF_NEED_TENSORRT 1
ENV TF_CUDA_COMPUTE_CAPABILITIES=3.5,5.2,6.0,6.1,7.0
ENV TF_CUDA_VERSION=9.0
ENV TF_CUDNN_VERSION=7
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1 && \
LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:${LD_LIBRARY_PATH} \
tensorflow/tools/ci_build/builds/configured GPU \
bazel build -c opt --copt=-mavx --config=cuda \
--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
tensorflow/tools/pip_package:build_pip_package && \
rm /usr/local/cuda/lib64/stubs/libcuda.so.1 && \
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip && \
pip --no-cache-dir install --upgrade /tmp/pip/tensorflow-*.whl && \
rm -rf /tmp/pip && \
rm -rf /root/.cache
# Clean up pip wheel and Bazel cache when done.
WORKDIR /root
# TensorBoard
EXPOSE 6006
To build the image from the Dockerfile, simply run the docker build command. Keep in mind that this build process might take a few hours to complete. We recommend using the nohup utility, so that if your terminal hangs the build will still run.
$ docker build -t deeplearning -f Dockerfile .
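Because the build can run for hours, you may want to wrap the same command in nohup so it keeps going even if your SSH session drops (build.log is just an example file name):

$ nohup docker build -t deeplearning -f Dockerfile . > build.log 2>&1 &
$ tail -f build.log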
This should output the setup process and should end with something similar to:
>> Successfully built <image ID>
>> Successfully tagged deeplearning:latest
Your image is ready to use. To start the environment, simply run the command below; since we tagged the image deeplearning, there is no need to look up its ID:
$ docker run --runtime=nvidia -it deeplearning /bin/bash
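Note that the container’s filesystem is ephemeral - anything you create inside it disappears when the container is removed. To keep your code and models around, and to reach TensorBoard from the host (the image exposes port 6006), you can mount a host directory and publish the port; the host path here is a placeholder:

$ docker run --runtime=nvidia -it -p 6006:6006 -v /path/to/your/projects:/workspace deeplearning /bin/bash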
Validate that TensorFlow is indeed running inside your container:
$ python
>>> import tensorflow as tf
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-02-23 07:34:14.592926: I
tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports
instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-23 07:34:17.452780: I
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA
node read from SysFS had negative value (-1), but there must be at least
one NUMA node, so returning NUMA node zero
2019-02-23 07:34:17.453267: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with
properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-02-23 07:34:17.453306: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu
devices: 0
2019-02-23 07:34:17.772969: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect
StreamExecutor with strength 1 edge matrix:
2019-02-23 07:34:17.773032: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-02-23 07:34:17.773054: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-02-23 07:34:17.773403: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow
device (/job:localhost/replica:0/task:0/device:GPU:0 with 10757 MB memory)
-> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0,
compute capability: 3.7)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80,
pci bus id: 0000:00:1e.0, compute capability: 3.7
2019-02-23 07:34:17.774289: I
tensorflow/core/common_runtime/direct_session.cc:307] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80,
pci bus id: 0000:00:1e.0, compute capability: 3.7
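Beyond the device log, you can confirm that an actual computation lands on the GPU. A minimal sketch using the TensorFlow 1.x session API (matching the 1.12 build above):

import tensorflow as tf

# Pin a small matrix multiplication to the first GPU.
with tf.device('/device:GPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')
    b = tf.constant([[1.0, 0.0], [0.0, 1.0]], name='b')
    c = tf.matmul(a, b, name='matmul')

# log_device_placement prints the device each op was assigned to.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))  # expected output: [[1. 2.] [3. 4.]]

You should see the matmul op placed on /job:localhost/replica:0/task:0/device:GPU:0 in the log.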
Congrats! Your new TensorFlow environment is set up and ready to start training, testing and deploying your deep learning models!