Tuesday, June 12, 2018

Case Study: How to test AMD MI25





1. Install Ubuntu 16.04.4 Desktop
2. Upgrade Ubuntu 16.04.4 Desktop
a. sudo apt update
b. sudo apt dist-upgrade
c. sudo apt install libnuma-dev
d. sudo apt install linux-headers-4.13.0-32-generic linux-image-4.13.0-32-generic linux-image-extra-4.13.0-32-generic linux-signed-image-4.13.0-32-generic
e. sudo reboot
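
After the reboot in step e, it can help to confirm that the machine actually came up on the 4.13.0-32 kernel installed in step d, because the boot loader may select a newer kernel pulled in by dist-upgrade. The following is only a minimal sketch: the names check_kernel and EXPECTED_KERNEL are illustrative and not part of the original procedure.

```shell
# Expected kernel release, taken from the package names in step 2d.
EXPECTED_KERNEL="4.13.0-32-generic"

# check_kernel RELEASE
# Succeeds only if RELEASE equals EXPECTED_KERNEL (hypothetical helper).
check_kernel() {
    if [ "$1" = "$EXPECTED_KERNEL" ]; then
        echo "OK: running kernel $1"
    else
        echo "WARNING: running kernel $1, expected $EXPECTED_KERNEL"
        return 1
    fi
}
```

Usage after the reboot: check_kernel "$(uname -r)"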



3. Install ROCm
a. wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
b. sudo sh -c 'echo deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list'
c. sudo apt-get update
d. sudo apt-get install rocm-dkms
e. sudo usermod -a -G video $LOGNAME
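
The group change in step e only applies to new login sessions, so a quick membership check before continuing can save a confusing permission error later. A minimal sketch follows; has_group is a hypothetical helper name, not something ROCm provides.

```shell
# has_group NAME LIST
# Succeeds if NAME appears as a whole word in the space-separated LIST.
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: has_group video "$(id -nG)" && echo "video group active" || echo "log out and back in (or reboot) first"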

4. Install MIOpen building tools
a. sudo apt-get install git cmake clinfo

b. Clone and build rocm-cmake
git clone https://github.com/RadeonOpenCompute/rocm-cmake
cd rocm-cmake
mkdir build
cd build
cmake ..
sudo cmake --build . --target install

c. Clone and build clang-ocl
cd ../..  # return to home from step b
git clone https://github.com/RadeonOpenCompute/clang-ocl
cd clang-ocl
mkdir build
cd build
cmake ..
sudo cmake --build . --target install

d. Clone and build MIOpenGEMM
cd ../..  # return to home from step c
git clone https://github.com/ROCmSoftwarePlatform/MIOpenGEMM
cd MIOpenGEMM
mkdir build
cd build
cmake ..
make miopengemm
sudo make install

e. Clone and build MIOpen with the HIP backend
cd ../..  # return to home from step d
sudo apt-get install libssl-dev libboost-dev libboost-system-dev libboost-filesystem-dev

Download half.hpp from http://half.sourceforge.net/ and copy it to /opt/rocm/include/:
sudo cp half.hpp /opt/rocm/include/

git clone https://github.com/ROCmSoftwarePlatform/MIOpen.git
cd MIOpen
mkdir build
cd build
CXX=/opt/rocm/hcc/bin/hcc cmake -DMIOPEN_BACKEND=HIP -DCMAKE_PREFIX_PATH="/opt/rocm/hcc;/opt/rocm/hip" -DMIOPEN_MAKE_BOOST_PUBLIC=ON -DCMAKE_CXX_FLAGS="-isystem /usr/include/x86_64-linux-gnu/" ..
sudo make install -j32
make MIOpenDriver -j32
sudo reboot

5. Test MI25 power load
Before running the benchmark, press Ctrl+Alt+T to open a new terminal, type ‘watch /opt/rocm/bin/rocm-smi’, and press Enter. It shows the MI25 status in real time.
Now run the following DeepBench convolution commands. If the power delivery has a problem, the MI25 driver is lost (the benchmark gets stuck) and the rocm-smi status in the other terminal shows “Killed”.

cd MIOpen/build
./bin/MIOpenDriver conv -t 1 -V 0 -F 0 -s 0 -W 112 -H 112 -c 64 -n 16 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1
./bin/MIOpenDriver conv -t 1 -V 0 -F 0 -s 0 -W 56 -H 56 -c 128 -n 16 -k 256 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1
./bin/MIOpenDriver conv -t 1 -V 0 -F 0 -s 0 -W 350 -H 80 -c 64 -n 16 -k 128 -y 5 -x 5 -p 1 -q 1 -u 2 -v 2
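
Because a power problem shows up as a benchmark that never returns, it can be convenient to wrap each command in a time limit so that a hang is reported instead of blocking the terminal. The sketch below uses the coreutils timeout command; run_with_timeout is a hypothetical helper, and any limit passed to it is an assumption to tune for the system under test. If the driver is lost, the process may not respond to the signal that timeout sends, so this is an aid and not a guarantee.

```shell
# run_with_timeout LIMIT COMMAND [ARGS...]
# Runs COMMAND under coreutils `timeout` and prints a one-line verdict.
# LIMIT is in seconds; exit status 124 from `timeout` means the limit was hit.
run_with_timeout() {
    rwt_limit="$1"
    shift
    timeout "$rwt_limit" "$@" && rwt_status=0 || rwt_status=$?
    if [ "$rwt_status" -eq 124 ]; then
        echo "TIMEOUT after ${rwt_limit}s: $*"
    elif [ "$rwt_status" -ne 0 ]; then
        echo "FAILED (exit $rwt_status): $*"
    else
        echo "PASSED: $*"
    fi
    return "$rwt_status"
}
```

Usage, with an arbitrary 300-second limit: run_with_timeout 300 ./bin/MIOpenDriver conv -t 1 -V 0 -F 0 -s 0 -W 112 -H 112 -c 64 -n 16 -k 128 -y 3 -x 3 -p 1 -q 1 -u 1 -v 1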



Test Full load


6. For hipCaffe:
Install hipCaffe as follows:
cd ../..  # return to home from step e

sudo apt-get install -y rocm-libs cxlactivitylogger pkg-config protobuf-compiler libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev libatlas-base-dev libboost-all-dev libgflags-dev libgoogle-glog-dev liblmdb-dev python-numpy python-scipy python3-dev python-yaml python-pip python-skimage python-opencv python-protobuf libopencv-dev libfftw3-dev libelf-dev

git clone -b hip https://github.com/ROCmSoftwarePlatform/hipCaffe.git
cd hipCaffe
cp ./Makefile.config.example ./Makefile.config
export PATH=/opt/rocm/bin:$PATH
make -j$(nproc)

mkdir ./models/bvlc_vgg
cd ./models/bvlc_vgg
wget -O soumith_benchmark.prototxt https://raw.githubusercontent.com/soumith/convnet-benchmarks/master/caffe/imagenet_winners/vgg_a.prototxt
cd ../..

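7. Before running the benchmark, press Ctrl+Alt+T to open a new terminal, type ‘watch /opt/rocm/bin/rocm-smi’, and press Enter to show the MI25 status in real time. Then run:

./build/tools/caffe time -model $AbRoot/hipCaffe/models/bvlc_vgg/soumith_benchmark.prototxt -gpu 0

(This is the VGG benchmark in Caffe. $AbRoot is the directory that contains the hipCaffe checkout; set it first or use the relative path shown in the example. If the power delivery has a problem, the run gets stuck and the driver is lost.)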

Example:
./build/tools/caffe time -model ./models/bvlc_vgg/soumith_benchmark.prototxt -gpu 0
Or, to run 500 iterations instead of the default 50:
./build/tools/caffe time -model ./models/bvlc_vgg/soumith_benchmark.prototxt -gpu 0 -iterations 500



8. Test process
a. Log in as root
b. Open a terminal and run:
watch /opt/rocm/bin/rocm-smi
c. Open another terminal
d. cd hipCaffe
./build/tools/caffe time -model ./models/bvlc_vgg/soumith_benchmark.prototxt -gpu 0 -iterations 50  # about 10 seconds
e. If you need to test a second GPU:
f. Open another terminal
g. cd hipCaffe
./build/tools/caffe time -model ./models/bvlc_vgg/soumith_benchmark.prototxt -gpu 1 -iterations 50  # about 10 seconds
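
Steps d to g can also be driven from a single terminal. The sketch below assumes the goal is to load several GPUs at the same time; run_on_gpus and the gpuN.log file names are hypothetical and not part of hipCaffe.

```shell
# run_on_gpus "GPU_LIST" COMMAND [ARGS...]
# Starts COMMAND once per GPU index in the background, appending "-gpu N",
# writes each run's output to gpuN.log in the current directory, then waits
# for all of them to finish.
run_on_gpus() {
    rog_gpus="$1"
    shift
    for rog_gpu in $rog_gpus; do
        "$@" -gpu "$rog_gpu" > "gpu${rog_gpu}.log" 2>&1 &
    done
    wait
}
```

Usage from the hipCaffe directory: run_on_gpus "0 1" ./build/tools/caffe time -model ./models/bvlc_vgg/soumith_benchmark.prototxt -iterations 50
Keep rocm-smi running in the other terminal as in step b while the runs are in progress.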


