This tutorial will help you set up TensorFlow 1.12 on Ubuntu 16.04 with a GPU using Docker and Nvidia-docker.
TensorFlow is one of the most popular deep-learning libraries. It was created by Google and was released as an open-source project in 2015. TensorFlow is used for both research and production environments. Installing TensorFlow can be cumbersome. The difficulty varies based on your environment constraints, and more when you’re a data scientist that just wants to build your neural networks.
When using TensorFlow on GPU - setting up requires a few steps. In the following tutorial, we will go over the process required to setup TensorFlow.
Requirements:
- NVIDIA GPU Machine
- Docker and Nvidia-Docker installed. (Read “How to install docker and nvidia-docker in our blog)
- Ubuntu 16.04
Step 1 - Prepare your environment with Docker and Nvidia-Docker
Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. What exactly is a container? Containers allow data scientists and developers to wrap up an environment with all of the parts it needs - such as libraries and other dependencies - and ship it all out in one package.
To use docker with GPUs and to be able to use TensorFlow in your application, you’ll need to install Docker with Nvidia-Docker. If you already have those installed, move to the next step. Otherwise, you can follow our previous guide to installing nvidia docker.
Prerequisites
Step 2 - Dockerfile
Docker can build images (environments) automatically by reading the instructions from a Dockerfile. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.
In our case, those commands will describe the installation of Python 3.6, CUDA 9 and CUDNN 7.2.1 - and of course the installation of TensorFlow 1.12 from source.
For this environment, we will use the following Dockerfile, which you can find here
FROM nvidia/cuda:9.0-base-ubuntu16.04
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cuda-command-line-tools-9-0 \
cuda-cublas-dev-9-0 \
cuda-cudart-dev-9-0 \
cuda-cufft-dev-9-0 \
cuda-curand-dev-9-0 \
cuda-cusolver-dev-9-0 \
cuda-cusparse-dev-9-0 \
curl \
git \
libcudnn7=7.2.1.38-1+cuda9.0 \
libcudnn7-dev=7.2.1.38-1+cuda9.0 \
libnccl2=2.4.2-1+cuda9.0 \
libnccl-dev=2.4.2-1+cuda9.0 \
libcurl3-dev \
libfreetype6-dev \
libhdf5-serial-dev \
libpng12-dev \
libzmq3-dev \
pkg-config \
rsync \
software-properties-common \
unzip \
zip \
zlib1g-dev \
wget \
&& \
rm -rf /var/lib/apt/lists/* && \
find /usr/local/cuda-9.0/lib64/ -type f -name 'lib*_static.a' -not -name 'libcudart_static.a' -delete && \
rm /usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a
# install python 3.6 and pip
RUN apt-get update
RUN apt-get install -y software-properties-common vim
RUN add-apt-repository ppa:jonathonf/python-3.6
RUN apt-get update
RUN apt-get install -y build-essential python3.6 python3.6-dev python3-pip python3.6-venv
RUN apt-get install -y git
RUN apt-get update && \
apt-get install nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.0 && \
apt-get update && \
apt-get install libnvinfer4=4.1.2-1+cuda9.0 && \
apt-get install libnvinfer-dev=4.1.2-1+cuda9.0
RUN python3.6 -m pip install pip --upgrade
RUN python3.6 -m pip install wheel
RUN python3.6 -m pip install six numpy wheel mock
RUN python3.6 -m pip install keras_applications
RUN python3.6 -m pip install keras_preprocessing
RUN ln -s /usr/bin/python3.6 /usr/bin/python
# Set up Bazel.
# Running bazel inside a `docker build` command causes trouble, cf:
# https://github.com/bazelbuild/bazel/issues/134
# The easiest solution is to set up a bazelrc file forcing --batch.
RUN echo "startup --batch" >>/etc/bazel.bazelrc
# Similarly, we need to workaround sandboxing issues:
# https://github.com/bazelbuild/bazel/issues/418
RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" \
>>/etc/bazel.bazelrc
# Install the most recent bazel release.
ENV BAZEL_VERSION 0.15.0
WORKDIR /
RUN mkdir /bazel && \
cd /bazel && \
curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -O https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -o /bazel/LICENSE.txt https://raw.githubusercontent.com/bazelbuild/bazel/master/LICENSE && \
chmod +x bazel-*.sh && \
./bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
cd / && \
rm -f /bazel/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh
# Download and build TensorFlow.
WORKDIR /tensorflow
RUN git clone --branch=r1.12 --depth=1 https://github.com/tensorflow/tensorflow.git .
# Configure the build for our CUDA configuration.
ENV CI_BUILD_PYTHON python3.6
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
ENV TF_NEED_CUDA 1
ENV TF_NEED_TENSORRT 1
ENV TF_CUDA_COMPUTE_CAPABILITIES=3.5,5.2,6.0,6.1,7.0
ENV TF_CUDA_VERSION=9.0
ENV TF_CUDNN_VERSION=7
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1 && \
LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:${LD_LIBRARY_PATH} \
tensorflow/tools/ci_build/builds/configured GPU \
bazel build -c opt --copt=-mavx --config=cuda \
--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
tensorflow/tools/pip_package:build_pip_package && \
rm /usr/local/cuda/lib64/stubs/libcuda.so.1 && \
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip && \
pip --no-cache-dir install --upgrade /tmp/pip/tensorflow-*.whl && \
rm -rf /tmp/pip && \
rm -rf /root/.cache
# Clean up pip wheel and Bazel cache when done.
WORKDIR /root
# TensorBoard
EXPOSE 6006
Dockerfile hosted with ❤ by GitHub
Step 3 - Running Dockerfile
To build the image from the Dockerfile, simply run the docker build command. Keep in mind that this build process might take a few hours to complete. We recommend using nohup utility so that if your terminal hangs - it will still run.
$ docker build -t deeplearning -f Dockerfile
This should output the setup process and should end with something similar to:
>> Successfully built deeplearning (= the image ID)
Your image is ready to use. To start the environment, simply type in the below command. But, don’t forget to replace your image id:
$ docker run --runtime=nvidia -it deeplearning /bin/bash
Step 4 - Validating TensorFlow & start building!
Validate that TensorFlow is indeed running in your Dockerfile
$ python
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-02-23 07:34:14.592926: I
tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports
instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-23 07:34:17.452780: I
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA
node read from SysFS had negative value (-1), but there must be at least
one NUMA node, so returning NUMA node zero
2019-02-23 07:34:17.453267: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with
properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-02-23 07:34:17.453306: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu
devices: 0
2019-02-23 07:34:17.772969: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect
StreamExecutor with strength 1 edge matrix:
2019-02-23 07:34:17.773032: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-02-23 07:34:17.773054: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-02-23 07:34:17.773403: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow
device (/job:localhost/replica:0/task:0/device:GPU:0 with 10757 MB memory)
-> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0,
compute capability: 3.7)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80,
pci bus id: 0000:00:1e.0, compute capability: 3.7
2019-02-23 07:34:17.774289: I
tensorflow/core/common_runtime/direct_session.cc:307] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80,
pci bus id: 0000:00:1e.0, compute capability: 3.7
Congrats! Your new TensorFlow environment is set up and ready to start training, testing and deploying your deep learning models!