Enable building dev docker image with CPP backend support #2976

namannandan · 2024-02-28T07:55:00Z

Description

Docker container with CPP backend support on CPU

$ cd docker
$ ./build_image.sh -bt dev -cpp &> docker_build_log.txt
$ docker images
REPOSITORY           TAG           IMAGE ID       CREATED             SIZE
pytorch/torchserve   cpp-dev-cpu   f34040597fea   About an hour ago   1.68GB

docker_build_log.txt

Docker container with CPP backend support on GPU

$ cd docker
$ ./build_image.sh -bt dev -g -cv cu121 -cpp &> docker_build_gpu_log.txt
$ docker images
REPOSITORY           TAG                               IMAGE ID       CREATED          SIZE
pytorch/torchserve   cpp-dev-gpu                       539b7b503ccb   51 minutes ago   10.2GB

docker_build_gpu_log.txt

Fixes #2908

Type of change

New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing

CPU docker container test

$ docker images
REPOSITORY           TAG           IMAGE ID       CREATED             SIZE
pytorch/torchserve   cpp-dev-cpu   f34040597fea   About an hour ago   1.68GB

$ docker run -it f34040597fea /bin/bash

root@5159684c405e:/serve# python ts_scripts/install_dependencies.py --environment=dev --cpp &> install_dependencies_logs.txt

install_dependencies_log.txt

root@5159684c405e:/serve# cd cpp
root@5159684c405e:/serve/cpp# ./build.sh &> cpp_build_log.txt

cpp_build_log.txt

root@5159684c405e:/serve/cpp# cd ..
root@5159684c405e:/serve# python ts_scripts/install_from_src.py &> install_from_src_logs.txt

install_from_src_logs.txt

root@5159684c405e:/serve# mkdir model_store

root@5159684c405e:/serve# torchserve --ncs --start --model-store ./model_store --models mnist=/serve/cpp/_build/test/resources/examples/mnist/mnist_handler/

root@5159684c405e:/serve# curl http://127.0.0.1:8080/predictions/mnist -T /serve/cpp/_build/test/resources/examples/mnist/0_png.pt --output out.pt

root@5159684c405e:/serve# python -c "import torch; print(torch.load('out.pt'))"
tensor(0)

ts_log.log

GPU docker container test

$ docker images
REPOSITORY           TAG                               IMAGE ID       CREATED          SIZE
pytorch/torchserve   cpp-dev-gpu                       539b7b503ccb   51 minutes ago   10.2GB

$ docker run --gpus all -it 539b7b503ccb /bin/bash

root@27cb03602315:/serve# python ts_scripts/install_dependencies.py --environment=dev --cuda=cu121 --cpp &> install_dependencies_gpu_log.txt

install_dependencies_gpu_log.txt

root@27cb03602315:/serve# cd cpp

root@27cb03602315:/serve/cpp# ./build.sh -g cu121 &> cpp_build_gpu_log.txt

cpp_build_gpu_log.txt

root@27cb03602315:/serve/cpp# cd ..

root@27cb03602315:/serve# python ts_scripts/install_from_src.py &> install_from_src_gpu_logs.txt

install_from_src_gpu_logs.txt

root@27cb03602315:/serve# mkdir model_store

root@27cb03602315:/serve# torchserve --ncs --start --model-store ./model_store --models mnist=/serve/cpp/_build/test/resources/examples/mnist/mnist_handler/

root@27cb03602315:/serve# curl http://127.0.0.1:8080/predictions/mnist -T /serve/cpp/_build/test/resources/examples/mnist/0_png.pt --output out.pt

root@27cb03602315:/serve# python -c "import torch; print(torch.load('out.pt'))"
tensor(0, device='cuda:0')

ts_log.log

$ nvidia-smi
Sat Mar  9 01:33:47 2024                                                                                                                                                                                          
+---------------------------------------------------------------------------------------+                                                                                                                         
| NVIDIA-SMI 535.104.12             Driver Version: 535.104.12   CUDA Version: 12.2     |                                                                                                                         
|-----------------------------------------+----------------------+----------------------+                                                                                                                         
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |                                                                                                                         
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |                                                                                                                         
|                                         |                      |               MIG M. |                                                                                                                         
|=========================================+======================+======================|                                                                                                                         
|   0  Tesla T4                       On  | 00000000:00:1E.0 Off |                    0 |                                                                                                                         
| N/A   32C    P0              32W /  70W |    162MiB / 15360MiB |      0%      Default |                                                                                                                         
|                                         |                      |                  N/A |                                                                                                                         
+-----------------------------------------+----------------------+----------------------+                                                                                                                         
                                                                                                                                                                                                                  
+---------------------------------------------------------------------------------------+                                                                                                                         
| Processes:                                                                            |                                                                                                                         
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |                                                                                                                         
|        ID   ID                                                             Usage      |                                                                                                                         
|=======================================================================================|                                                                                                                         
|    0   N/A  N/A    247743      C   ...ages/ts/cpp/bin/model_worker_socket      158MiB |                                                                                                                         
+---------------------------------------------------------------------------------------+

lxning

We need a separate Dockerfile for cpp at this moment. Otherwise, the image is too big and too complicate.

cpp/CMakeLists.txt

cpp/src/utils/CMakeLists.txt

docker/Dockerfile

ts_scripts/install_dependencies.py

agunapal · 2024-02-29T18:22:44Z

@namannandan We should use docker multi stage builds . i.e...don't do any installation in the final build stage. This will make the image size bigger. Please check how this is done with compile-image and production-image in DockerFile

chauhang · 2024-03-07T09:22:30Z

@namannandan Can you please check the regression failures

agunapal

Can we change the file name to Dockerfile.cpp . From what I've seen in various repos, the convention is Dockerfile.xyz

mreso

Some minor remarks to address before merging. Are there plans for a prod version with two staged docker build that copies the binaries and discards the build artifacts?

docker/cpp.Dockerfile

namannandan · 2024-03-11T17:16:57Z

Some minor remarks to address before merging. Are there plans for a prod version with two staged docker build that copies the binaries and discards the build artifacts?

Yes, I do have an initial implementation of the multi stage build which copies only the binaries to the final production image, but for GPU support, the CPP backend & dependencies compilation during docker build requires more work and testing. So, targeting only the dev container for the upcoming release. Will create a separate PR for the production container.

mreso

LGTM see comments

docker/Dockerfile.cpp

lxning · 2024-03-11T22:26:33Z

cpp/README.md

@@ -4,6 +4,24 @@
 * GCC version: gcc-9
 * cmake version: 3.18+


it should be 3.26.4

lxning · 2024-03-11T22:30:01Z

cpp/README.md

+```
+cd serve/docker
+# For CPU support
+./build_image.sh -bt dev -cpp
+# For GPU support
+./build_image.sh -bt dev -g [-cv cu121|cu118] -cpp
+```
+
+Start the container and optionally bind mount a build directory into the container to persist build artifacts across container runs
+```
+# For CPU support
+docker run [-v /path/to/build/dir:/serve/cpp/_build] -it pytorch/torchserve:cpp-dev-cpu /bin/bash
+# For GPU support
+docker run --gpus all [-v /path/to/build/dir:/serve/cpp/_build] -it pytorch/torchserve:cpp-dev-gpu /bin/bash
+```


The docker detailed information should be updated in docker README to make sure information centralized. Here we can provide link to docker readme

lxning · 2024-03-11T22:34:47Z

docker/Dockerfile.cpp

+ARG CMAKE_VERSION=3.26.4
+ARG BRANCH_NAME="master"
+ARG USE_CUDA_VERSION=""
+


gcc version should be set.

lxning · 2024-03-11T23:04:36Z

docker/Dockerfile.cpp

+
+# Enable installation of recent cmake release
+# Ref: https://apt.kitware.com/
+RUN (wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | tee /usr/share/keyrings/kitware-archive-keyring.gpg >/dev/null) \


Where is $CMAKE_VERSION used in the installation? It seems that the latest cmake is installed in this command.

Cmake version is pinned here:

serve/docker/Dockerfile.cpp

Lines 63 to 66 in 3694b11

# Pin cmake and cmake-data version

# Ref: https://manpages.ubuntu.com/manpages/xenial/man5/apt_preferences.5.html

RUN echo "Package: cmake\nPin: version $CMAKE_VERSION*\nPin-Priority: 1001" > /etc/apt/preferences.d/cmake

RUN echo "Package: cmake-data\nPin: version $CMAKE_VERSION*\nPin-Priority: 1001" > /etc/apt/preferences.d/cmake-data

So when we install cmake here:

serve/ts_scripts/install_dependencies.py

Line 17 in 3694b11

"cmake",

it will install cmake 3.26.4

lxning · 2024-03-11T23:08:22Z

docker/Dockerfile.cpp

+RUN apt-get update && \
+    if echo "$BASE_IMAGE" | grep -q "cuda:"; then \
+        if [ "$USE_CUDA_VERSION" = "cu121" ]; then \
+            apt-get -y install cuda-toolkit-12-1; \
+        elif [ "$USE_CUDA_VERSION" = "cu118" ]; then \
+            apt-get -y install cuda-toolkit-11-8; \
+        else \
+            echo "Cuda version not supported by CPP backend: $USE_CUDA_VERSION"; \
+            exit 1; \
+        fi; \
+    fi \
+    && rm -rf /var/lib/apt/lists/*


All of this part should be replaced with install_dependency (ref: https://github.com/pytorch/serve/blob/master/docker/Dockerfile.dev#L71)

Additional packages aside from the ones installed by install_depenedencies.py are required. Following are the error traces I see, if the cuda-toolkit is not installed:

[2024-03-11 23:50:09,812] torch._inductor.utils: [WARNING] not enough SMs to use max_autotune_gemm mode In file included from /home/venv/lib/python3.9/site-packages/torch/include/torch/csrc/inductor/aoti_runtime/model.h:15, from /home/venv/lib/python3.9/site-packages/torch/include/torch/csrc/inductor/aoti_runtime/model_container.h:13, from /serve/examples/cpp/aot_inductor/bert/cly46ndmfcer53dv4xkdyjmpl3mc7277slw3od3ue5ygudxcb766.cpp:2: /home/venv/lib/python3.9/site-packages/torch/include/torch/csrc/inductor/aoti_runtime/device_utils.h:14:10: fatal error: cuda.h: No such file or directory 14 | #include <cuda.h> | ^~~~~~~~ compilation terminated.

Installing cuda-compiler resolves the above issue, but we'll then run into:

-- Found Threads: TRUE CMake Error at /home/venv/lib/python3.9/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:70 (message): Failed to find nvToolsExt Call Stack (most recent call first): /home/venv/lib/python3.9/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:87 (include) /home/venv/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package) CMakeLists.txt:25 (find_package)

Instead of manually having to install all the necessary packages individually, it is cleaner to install the cuda-toolkit.

namannandan requested review from mreso, lxning and agunapal February 28, 2024 07:55

namannandan added this to the v0.10.0 milestone Feb 28, 2024

lxning reviewed Feb 28, 2024

View reviewed changes

cpp/CMakeLists.txt Outdated Show resolved Hide resolved

cpp/src/utils/CMakeLists.txt Outdated Show resolved Hide resolved

docker/Dockerfile Outdated Show resolved Hide resolved

ts_scripts/install_dependencies.py Outdated Show resolved Hide resolved

namannandan force-pushed the cpp-dev-cpu-docker branch from 186f53c to b5b10b8 Compare March 6, 2024 18:53

namannandan requested a review from lxning March 7, 2024 01:49

agunapal reviewed Mar 7, 2024

View reviewed changes

mreso approved these changes Mar 7, 2024

View reviewed changes

docker/cpp.Dockerfile Outdated Show resolved Hide resolved

docker/cpp.Dockerfile Outdated Show resolved Hide resolved

docker/cpp.Dockerfile Outdated Show resolved Hide resolved

docker/cpp.Dockerfile Outdated Show resolved Hide resolved

namannandan changed the title ~~Enable building CPU dev docker image with CPP backend support~~ Enable building dev docker image with CPP backend support Mar 9, 2024

namannandan added 6 commits March 11, 2024 10:07

Create new Dockerfile for CPP dev docker build with CPU support

473dcb2

Pin cmake version to 3.26.4

29a362f

Install torchserve to ensure "ts" directory is present in site-packages

baec8bc

Refactor CPP Dockerfile

0bb0e59

Add GPU CPP container support

0b954e9

Update documentation

5caf2a2

namannandan force-pushed the cpp-dev-cpu-docker branch from b9bf629 to 5caf2a2 Compare March 11, 2024 17:10

namannandan requested a review from agunapal March 11, 2024 17:34

mreso approved these changes Mar 11, 2024

View reviewed changes

docker/Dockerfile.cpp Outdated Show resolved Hide resolved

docker/Dockerfile.cpp Outdated Show resolved Hide resolved

docker/Dockerfile.cpp Outdated Show resolved Hide resolved

docker/Dockerfile.cpp Show resolved Hide resolved

docker/Dockerfile.cpp Outdated Show resolved Hide resolved

namannandan added 2 commits March 11, 2024 14:03

Refactor Dockerfile.cpp

7df7831

Merge branch 'master' into cpp-dev-cpu-docker

a4c4bd0

namannandan added this pull request to the merge queue Mar 11, 2024

Merged via the queue into master with commit 3694b11 Mar 11, 2024
15 checks passed

lxning reviewed Mar 11, 2024

View reviewed changes

namannandan mentioned this pull request Mar 12, 2024

Refactor Dokcerfile.cpp and documentation #3015

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable building dev docker image with CPP backend support #2976

Enable building dev docker image with CPP backend support #2976

namannandan commented Feb 28, 2024 •

edited

Loading

lxning left a comment

agunapal commented Feb 29, 2024

chauhang commented Mar 7, 2024

agunapal left a comment

mreso left a comment

namannandan commented Mar 11, 2024

mreso left a comment

lxning Mar 11, 2024

lxning Mar 11, 2024

lxning Mar 11, 2024

lxning Mar 11, 2024

namannandan Mar 11, 2024

lxning Mar 11, 2024

namannandan Mar 12, 2024

	# Pin cmake and cmake-data version
	# Ref: https://manpages.ubuntu.com/manpages/xenial/man5/apt_preferences.5.html
	RUN echo "Package: cmake\nPin: version $CMAKE_VERSION*\nPin-Priority: 1001" > /etc/apt/preferences.d/cmake
	RUN echo "Package: cmake-data\nPin: version $CMAKE_VERSION*\nPin-Priority: 1001" > /etc/apt/preferences.d/cmake-data

Enable building dev docker image with CPP backend support #2976

Enable building dev docker image with CPP backend support #2976

Conversation

namannandan commented Feb 28, 2024 • edited Loading

Description

Docker container with CPP backend support on CPU

Docker container with CPP backend support on GPU

Type of change

Feature/Issue validation/testing

lxning left a comment

Choose a reason for hiding this comment

agunapal commented Feb 29, 2024

chauhang commented Mar 7, 2024

agunapal left a comment

Choose a reason for hiding this comment

mreso left a comment

Choose a reason for hiding this comment

namannandan commented Mar 11, 2024

mreso left a comment

Choose a reason for hiding this comment

lxning Mar 11, 2024

Choose a reason for hiding this comment

lxning Mar 11, 2024

Choose a reason for hiding this comment

lxning Mar 11, 2024

Choose a reason for hiding this comment

lxning Mar 11, 2024

Choose a reason for hiding this comment

namannandan Mar 11, 2024

Choose a reason for hiding this comment

lxning Mar 11, 2024

Choose a reason for hiding this comment

namannandan Mar 12, 2024

Choose a reason for hiding this comment

namannandan commented Feb 28, 2024 •

edited

Loading