HPCRS Tutorial – Creating an Environment for Running Workloads
In this tutorial we will walk through the steps to set up an environment for running workloads with the High Performance Computing Reference Stack (HPCRS), using Kubernetes*, MPI*, and the HPCRS image with Quantum ESPRESSO* (QE) in an HPC cloud environment.
We have tested these instructions with the following hardware and software configurations. This tutorial uses networking information like IP addresses that reflects our test network. Please update these instructions to reflect your network setup.
Hardware Configuration
2-socket Intel® Xeon® Platinum 8380 processor (2.3 GHz, 40 cores)
We tested with 3 nodes in our cluster:
Kubernetes master
Kubernetes workers (2)
Software Prerequisites
For each of the three nodes in the cluster:
OS: We used CentOS 8 for all three nodes.
Set up passwordless SSH access between the nodes. For details refer to https://www.ibm.com/support/pages/configuring-ssh-login-without-password
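As a quick sketch (the root user and the node names are placeholders; see the linked page for details), passwordless access from the master to the workers can be set up with:
#ssh-keygen -t rsa
#ssh-copy-id root@node2
#ssh-copy-id root@node3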
Install Kubernetes (v1.19.2) and Docker* (v19.03.12). Refer to https://www.tecmint.com/install-a-kubernetes-cluster-on-centos-8/
Install cri-o (v1.19.0). Refer to https://computingforgeeks.com/install-cri-o-container-runtime-on-centos-linux/
Software Configuration
On each node:
Configure Kubernetes to enable IP forwarding
Ensure net.ipv4.ip_forward is set to "1" in the /etc/sysctl.d/60-k8s.conf file. You may need to create this file if it does not exist.
#cat /etc/sysctl.d/60-k8s.conf
net.ipv4.ip_forward = 1
Restart systemd-sysctl to pick up the change:
#systemctl restart systemd-sysctl
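To confirm the change took effect, query the parameter directly; it should report 1:
#sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1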
Disable swap
#swapoff -a
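Note that swapoff -a only disables swap until the next reboot. To keep it disabled persistently (a common companion step, not part of the original instructions), comment out any swap entries in /etc/fstab, for example:
#sed -i '/ swap / s/^/#/' /etc/fstab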
Configure the /etc/hosts file for all nodes
#echo "127.0.0.1 localhost <hostname>" >> /etc/hosts
Add the IP address and corresponding hostname to /etc/hosts for all 3 nodes. Replace <x.x.x.x> with the IP addresses specific to your setup.
#tail -3 /etc/hosts
<x.x.x.x> node1
<x.x.x.x> node2
<x.x.x.x> node3
Set up the Docker and cri-o proxy configuration for all nodes
#cat /etc/systemd/system/docker.service.d/proxy.conf
[Service]
Environment="HTTP_PROXY=<your proxy if needed>"
Environment="HTTPS_PROXY=<your proxy if needed>"
Environment="NO_PROXY=192.168.0.0/16,10.0.0.0/8,localhost,127.0.0.0/8,<your network specific addresses>
#cat /etc/sysconfig/crio
HTTP_PROXY=<your proxy if needed>
HTTPS_PROXY=<your proxy if needed>
NO_PROXY=192.168.0.0/16,10.0.0.0/8,localhost,127.0.0.0/8,<your network specific addresses>
Set up the Docker registry that hosts the HPCRS image (the sysstacks/hpcrs-centos:latest Docker image with QE installed) and from which Kubernetes will deploy the latest image.
#cat /etc/containers/registries.conf
#master will also act as local registry, add below to bypass insecure error
[registries.insecure]
registries = ["<your registry IP address>:5000"]
Configure kubelet for the correct CPU policy
#cat /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS=--cgroup-driver=systemd --cpu-manager-policy=static --kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=1Gi --system-reserved=cpu=1,memory=2Gi,ephemeral-storage=1Gi
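If kubelet has already been running with the default CPU manager policy, it may fail to start after switching to the static policy until the old checkpoint is removed. As a precaution (the path below is the kubelet default state location), delete it before restarting:
#rm -f /var/lib/kubelet/cpu_manager_state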
Restart and enable all related services
#systemctl restart crio
#systemctl restart docker
#systemctl restart kubelet
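The step title also calls for enabling the services; to have them start automatically at boot:
#systemctl enable crio docker kubelet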
Configuring the Kubernetes master
Install podman (v1.6.4)
#dnf install podman
Install helm (v3.3.4). Follow the instructions at https://www.techrepublic.com/article/how-to-install-the-kubernetes-package-manager-helm/
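If you prefer Helm's own install script to the steps in that article, one common approach is the official get-helm-3 script (the URL below is where the Helm project has published it; it installs the latest Helm 3 release, which may be newer than v3.3.4, and the script location may change, so check the Helm documentation):
#curl -fsSL https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash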
Set up the local registry for the cri image
Set up the local registry on the master node, and download the HPCRS stack image:
#docker pull docker.io/registry
#docker run -d -p 5000:5000 --name=registry --restart=always --privileged=true --log-driver=none -v /registry:/tmp/registry registry
#docker pull <your registry address or url>/sysstacks/hpcrs-centos:v0.2.0
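The Dockerfile in the next steps builds FROM <x.x.x.x>:5000/hpcrstest:test. If you want the base image served from the local registry under that name (the hpcrstest:test tag is only an assumption chosen to match that example), tag and push it:
#docker tag <your registry address or url>/sysstacks/hpcrs-centos:v0.2.0 localhost:5000/hpcrstest:test
#docker push localhost:5000/hpcrstest:test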
Modify the Dockerfile to add QE on top of the HPCRS stack image (<your registry url>/sysstacks/hpcrs-centos:v0.2.0):
#vi Dockerfile
Add and Build QE
For details on QE refer to: https://hpc-forge.cineca.it/files/gara_tier_1/public/Benchmark-Instructions.txt.
Add the following to your Dockerfile to build the image:
FROM <x.x.x.x>:5000/hpcrstest:test
#Proxy Settings
ARG proxy=<your proxy if needed>
ENV http_proxy=$proxy
ENV https_proxy=$proxy
#Install ssh and Generate ssh Host Keys
RUN yum update --disablerepo=intel-graphics --disablerepo=oneAPI -y && yum install --disablerepo=intel-graphics --disablerepo=oneAPI -y openssh-server
COPY ssh-entrypoint.sh /bin/ssh-entrypoint.sh
RUN chmod +x /bin/ssh-entrypoint.sh
RUN ssh-entrypoint.sh
#Fix ssh Login Issue
RUN usermod -p ! root
RUN mkdir -p /etc/ssh/
RUN echo "Port 2022" > /etc/ssh/sshd_config
Build the cri image based on HPCRS and push it to the local registry
#cd image
#podman build -t hpcrstestcri:cri --build-arg proxy=$http_proxy -f Dockerfile .
#podman push --tls-verify=false hpcrstestcri:cri docker://localhost:5000/hpcrstestcri:cri
Initialize the Kubernetes cluster
#kubeadm reset --cri-socket=/run/crio/crio.sock -f
#kubeadm init --apiserver-advertise-address <x.x.x.x> --pod-network-cidr <x.x.x.x>/16 --cri-socket=/run/crio/crio.sock --ignore-preflight-errors=SystemVerification --token-ttl=0
Copy the config to the root user directory and install flannel
#mkdir -p $HOME/.kube && cp /etc/kubernetes/admin.conf $HOME/.kube/config
#kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Record the kubeadm join command printed at the end of the kubeadm init output. Run it on each worker to join the cluster; it will look similar to:
#kubeadm join <your network address>:6443 --token tceoji.65vxsvcv8z5vzce2 --discovery-token-ca-cert-hash sha256:d4b3848ff2d5a242b63037687351805abb7abecc8a260e77905ca61d9d077758
NOTE:
Make sure the time is synchronized across the master and worker nodes. You can check each node's clock with the Linux date command:
#date
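On CentOS 8, one way to keep the clocks aligned automatically (assuming chrony is installed, as it is by default) is to enable the chronyd service on every node:
#systemctl enable --now chronyd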
Run QE on the HPCRS image
All commands are run on the Kubernetes master node.
#export NAMESPACE=helm
#kubectl delete namespace ${NAMESPACE}
#kubectl create namespace ${NAMESPACE}
#SECRET=hpc-secret
#mkdir -p .tmp
#yes | ssh-keygen -N "" -f .tmp/id_rsa -C ""
#kubectl delete secret ${SECRET} -n ${NAMESPACE} || true
#kubectl create secret generic ${SECRET} -n ${NAMESPACE} --from-file=id_rsa=.tmp/id_rsa --from-file=id_rsa.pub=.tmp/id_rsa.pub --from-file=authorized_keys=.tmp/id_rsa.pub
#kubectl config set-context --current --namespace=$NAMESPACE
#helm template --values ./values.yaml hpc ./horovod > debug.yml
#kubectl apply -f debug.yml --validate=false
QE will be triggered, and you can check the logs with these commands:
#kubectl get pods
[root@a4bf0157a8d7 k8s]# kubectl get pods
NAME READY STATUS RESTARTS AGE
hpc-horovod-0 1/1 Running 0 2m8s
hpc-horovod-zzxdx 1/1 Running 2 2m8s
Check the logs of a specific pod with this command:
#kubectl logs -f hpc-horovod-zzxdx
+ sleep 5
+ mkdir -p /root/.ssh
+ rm -f /root/.ssh/config
+ touch /root/.ssh/config
+ '[' true == true ']'
+ set +e
+ yes
+ cp /etc/secret-volume/id_rsa /root/.ssh/id_rsa
+ yes
+ cp /etc/secret-volume/authorized_keys /root/.ssh/authorized_keys
+ set -e
+ '[' -n 32222 ']'
+ echo 'Port 32222'
+ sed -i 's/^Port.*/Port 32222 /g' /etc/ssh/sshd_config
+ echo 'StrictHostKeyChecking no'
+ /usr/sbin/sshd
+ runCheckSSH /horovod/generated/hostfile
++ cat /horovod/generated/hostfile
++ awk '{print $(1)}'
+ for i in `cat $1 | awk '{print $(1)}'`
+ [[ <x.x.x.x> != *\m\a\s\t\e\r ]]
+ retry 30 ssh -o ConnectTimeout=2 -q <x.x.x.x> exit
+ local n=0
+ local try=30
+ local 'cmd=ssh -o ConnectTimeout=2 -q <x.x.x.x> exit'
+ [[ 7 -le 1 ]]
+ set +e
+ [[ 0 -ge 30 ]]
+ ssh -o ConnectTimeout=2 -q <x.x.x.x> exit
+ break
+ ssh -o ConnectTimeout=2 -q <x.x.x.x> exit
+ '[' 0 -ne 0 ']'
+ set -e
+ for i in `cat $1 | awk '{print $(1)}'`
+ [[ <x.x.x.x> != *\m\a\s\t\e\r ]]
+ retry 30 ssh -o ConnectTimeout=2 -q <x.x.x.x> exit
+ local n=0
+ local try=30
+ local 'cmd=ssh -o ConnectTimeout=2 -q <x.x.x.x> exit'
+ [[ 7 -le 1 ]]
+ set +e
+ [[ 0 -ge 30 ]]
+ ssh -o ConnectTimeout=2 -q <x.x.x.x> exit
+ break
+ ssh -o ConnectTimeout=2 -q <x.x.x.x> exit
+ '[' 0 -ne 0 ']'
+ set -e
+ '[' 0 -ne 0 ']'
+ '[' 1 -eq 0 ']'
+ bash -c 'cd /root/benchmarks/AUSURF112/ && source /opt/intel/bin/compilervars.sh intel64 && mpirun -np 24 --hostfile /horovod/generated/hostfile -genv I_MPI_DEBUG 4 -genv I_MPI_PIN_DOMAIN omp -genv OMP_PROC_BIND CLOSE -genv OMP_PLACES threads -genv OMP_NUM_THREADS 2 /root/q-e/bin/pw.x -i ausurf.in -npool 2'
When the workload completes, it outputs performance data, for example:
PWSCF : 1m52.95s CPU 0m32.17s WALL
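To pull just that timing line out of a pod's log (the pod name below is the example name from the listing above), filter with grep:
#kubectl logs hpc-horovod-zzxdx | grep "PWSCF"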
PyTorch benchmarks
This section describes running the PyTorch benchmarks for Caffe2 on a single node.
Pull the image:
docker pull sysstacks/hpcrs-centos
NOTE:
If you are on a network with outbound proxies, be sure to configure Docker to allow access. See the Docker service proxy and Docker client proxy documentation for more details.
Run the image with Docker:
docker run --name <image name> --rm -i -t sysstacks/hpcrs-centos /bin/bash
NOTE:
Launching the Docker image with the -i argument starts interactive mode within the container. Enter the following commands in the running container.
Navigate to where PyTorch and Caffe2 are installed:
cd /usr/lib/python3.8/site-packages/caffe2/python/
Install dependencies for the benchmark:
pip install protobuf
pip install future
Execute the benchmark script:
python convnet_benchmarks.py --batch_size 32 \
    --cpu \
    --model AlexNet
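The script accepts other models and options as well. To see what is available in the version shipped in the image, print its help text:
python convnet_benchmarks.py --help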
Using DPC++
Data Parallel C++ (DPC++) is a high-level language designed for data parallel programming productivity. A sample file is included to show how to use DPC++ with the HPCRS image.
To test DPC++:
clang++ -I $DPCPP_ROOT/include/sycl test_dpcpp.cpp -L $DPCPP_ROOT/lib -fsycl
./a.out
If running on a CPU, the program will compile and run, and this output is expected:
test dpcpp platform-1 device-1 ...
Non-GPU device
Using Spack* to list available recipes
Spack is a package manager for supercomputers, Linux, and macOS, and is included in the HPCRS image. Using Spack in the HPCRS image is straightforward.
Pull the image:
docker pull sysstacks/hpcrs-centos
NOTE:
If you are on a network with outbound proxies, be sure to configure Docker to allow access. See the Docker service proxy and Docker client proxy documentation for more details.
Run the image with Docker, launching in interactive mode with the -i flag:
docker run --name <image name> --rm -i -t sysstacks/hpcrs-centos /bin/bash
Run the following at the bash prompt:
# to list available recipes:
> spack list
# to get info on a specific recipe, use `spack info`:
> spack info zlib
# to install a recipe:
> spack install zlib
HPCRS and the Intel® VTune™ Profiler
Intel® VTune™ Profiler allows you to profile applications running in Docker* containers, including profiling multiple containers simultaneously. More information about VTune Profiler is available at software.intel.com
Prerequisites
This section of the tutorial assumes the following prerequisites are met:
Intel VTune Profiler 2020
Linux* container runtime: docker.io
Operating System on host: Ubuntu* or CentOS with Linux kernel version 4.10 or newer
Intel® microarchitecture code-named Skylake with 8 logical CPUs
Pull the image onto the VTune-enabled system:
docker pull sysstacks/hpcrs-centos
Run the container and keep it running with the -t and -d options:
docker run --name <image name> -td sysstacks/hpcrs-centos
Find the container ID with the docker ps command:
host> docker ps
CONTAINER ID   IMAGE        COMMAND       CREATED          STATUS         PORTS   NAMES
98fec14f0c08   hpcrs_test   "/bin/bash"   10 seconds ago   Up 9 seconds
Use the container ID to start a bash session inside the running container:
docker exec -it 98fec14f0c08 /bin/bash
Use VTune to collect and analyze data
Launch the VTune Profiler on the host, for example:
host> cd /opt/intel/vtune_profiler
host> source ./vtune-vars.sh
host> vtune-gui
Create a project for your analysis in VTune, for example:
python-benchmark
Run an application within the HPCRS container
For example, run the PyTorch benchmarks as shown above.
On the Configure Analysis tab in VTune, configure the following options:
On the WHAT pane, select the Profile System target type
Select the Hardware Event-Based Sampling mode
On the HOW pane, enable stack collection
Click Start to run the analysis.
You can also profile Docker containers using the Attach to Process target type, but you will only be able to profile a single container at a time.
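If you prefer the command line to the GUI, a roughly equivalent collection can be started with the vtune command-line tool. This is only a sketch; check the analysis type and knob names against your VTune Profiler version, and find the target PID with docker top <container ID>:
host> source /opt/intel/vtune_profiler/vtune-vars.sh
host> vtune -collect hotspots -knob sampling-mode=hw -target-pid <PID of the process inside the container>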
For more information on Intel VTune Profiler capabilities, refer to the Intel® VTune™ Profiler Performance Analysis Cookbook.