Thursday, September 10, 2020

How to set up the B7119 to run with 10x Nvidia Tesla V100 32G GPU cards

The Tyan FA77-B7119 can hold 10 GPU cards, but if you install 10x Nvidia Tesla V100 32G GPU cards, the FA77-B7119 will hang at POST code "B6".


Hardware Configuration

FA77-B7119, BIOS v2.02.B10, BMC v7.0

CPU: Intel Xeon 8268 x2

RAM: DDR4-2666 64GB x24

OS: Ubuntu 18.04.4 Server LTS

GPU: Nvidia Tesla V100 32G x10


To fix this, remove all Nvidia Tesla V100 32G GPU cards and change the BIOS settings as follows.
1. Boot the FA77-B7119 into the BIOS setup.
2. Under Socket Configuration => Common RefCode Configuration, change the MMCFG Base parameter from "2G" to "3G".
3. Under Socket Configuration => Common RefCode Configuration, change the MMIO High Granularity Size parameter from "256G" to "1024G".
4. Save and reboot.
5. Power off the FA77-B7119 (DC-OFF), then remove AC power (AC-OFF).
6. Reinstall the Nvidia Tesla V100 32G GPU cards.
7. Restore AC power (AC-ON), then power on the FA77-B7119 (DC-ON).
8. Boot the FA77-B7119 into the OS.
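
Once the OS is up, it can be useful to confirm that all 10 cards actually enumerate on the PCI bus. The following is only a minimal sketch and is not part of the original procedure: it assumes the `lspci` tool (from the pciutils package) is available on the Ubuntu install, and the function names `count_nvidia_gpus` and `installed_gpu_count` are made up for illustration. It counts devices in `lspci -n` output whose vendor ID is 10de (NVIDIA) and whose class is a 3D or VGA controller.

```python
import subprocess

NVIDIA_VENDOR_ID = "10de"        # PCI vendor ID assigned to NVIDIA
GPU_CLASSES = ("0300", "0302")   # VGA compatible controller, 3D controller


def count_nvidia_gpus(lspci_text: str) -> int:
    """Count NVIDIA GPU devices in the text produced by `lspci -n`.

    Each line is expected to look like:
        3b:00.0 0302: 10de:1db6 (rev a1)
    i.e. <slot> <class>: <vendor>:<device> [revision].
    """
    count = 0
    for line in lspci_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or unexpected lines
        pci_class = fields[1].rstrip(":")
        vendor = fields[2].split(":")[0]
        if vendor == NVIDIA_VENDOR_ID and pci_class in GPU_CLASSES:
            count += 1
    return count


def installed_gpu_count() -> int:
    """Run `lspci -n` on this machine and count NVIDIA GPUs (hypothetical helper)."""
    result = subprocess.run(
        ["lspci", "-n"], capture_output=True, text=True, check=True
    )
    return count_nvidia_gpus(result.stdout)
```

Calling `installed_gpu_count()` on the configured system should return 10 if every card was detected. A lower number suggests that some cards did not enumerate, in which case the BIOS settings and card seating are worth rechecking.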

