Wednesday, March 1, 2023

How to fix the gpu_burn compile failure

System Environment:

Ubuntu 22.04 LTS Server

CUDA v12.0

GPU: RTX-4080 (driver 525.85.05)

Application: GPU_Burn v1.1


Symptom:

Running make in the gpu_burn directory fails with the following error.

fae@intel:~/gpu_burn$ sudo make
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:.:/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin /usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -arch=compute_30 -ptx compare.cu -o compare.ptx
nvcc fatal   : Value 'compute_30' is not defined for option 'gpu-architecture'
Makefile:10: recipe for target 'drv' failed
make: *** [drv] Error 1


This error occurs because nvcc in CUDA v12.0 no longer supports compute_30 as a value for --gpu-architecture (-arch). The fix is to edit the -arch value in the Makefile, shown below.


CUDAPATH=/usr/local/cuda

# Have this point to an old enough gcc (for nvcc)
GCCPATH=/usr

NVCC=${CUDAPATH}/bin/nvcc
CCPATH=${GCCPATH}/bin

drv:
    PATH=${PATH}:.:${CCPATH}:${PATH} ${NVCC} -I${CUDAPATH}/include -arch=compute_30 -ptx compare.cu -o compare.ptx
    g++ -O3 -Wno-unused-result -I${CUDAPATH}/include -c gpu_burn-drv.cpp
    g++ -o gpu_burn gpu_burn-drv.o -O3 -lcuda -L${CUDAPATH}/lib64 -L${CUDAPATH}/lib -Wl,-rpath=${CUDAPATH}/lib64 -Wl,-rpath=${CUDAPATH}/lib -lcublas -lcudart -o gpu_burn


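Change -arch=compute_30 to a value that CUDA v12.0 supports (compute_50 through compute_90), chosen from the Virtual Architecture Feature List in the nvcc documentation. The value must not be higher than the compute capability of the GPU. The RTX 4080 has compute capability 8.9, so use compute_89 or lower.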
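The Makefile change can also be scripted. The following is a minimal sketch, assuming GNU sed and a Makefile that contains a single -arch=compute_NN option as shown above. The helper name set_nvcc_arch is hypothetical and not part of gpu_burn.

```shell
# Sketch: replace the -arch value in gpu_burn's Makefile.
# set_nvcc_arch is a hypothetical helper name, not part of gpu_burn.
set_nvcc_arch() {
    local makefile="$1" arch="$2"
    # Keep a backup so the original Makefile can be restored.
    cp "$makefile" "$makefile.bak"
    # Replace whatever compute_NN value follows -arch= with the requested one.
    sed -i "s/-arch=compute_[0-9][0-9]*/-arch=${arch}/" "$makefile"
}

# Example, run inside the gpu_burn directory:
#   /usr/local/cuda/bin/nvcc --list-gpu-arch   # list the values this nvcc accepts
#   set_nvcc_arch Makefile compute_89          # RTX 4080 is compute capability 8.9
#   make
```

The backup file Makefile.bak keeps the original compute_30 line in case the change has to be reverted.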








Monday, December 12, 2022

How to set Tyan B8252G79AE12HR-2T to support SATA devices

 B8252G79AE12HR-2T, BIOS v2.08.B22


Change "J12 SATA or NVMe link Select" from [NVME] to [SATA]. HDD#0~3 then support both SATA and NVMe devices.



Tuesday, June 21, 2022

How to log in to Ubuntu Server remotely as root

Step#1

#vi /etc/ssh/sshd_config

Step#2

change

#PermitRootLogin prohibit-password

to

PermitRootLogin yes

(Remove the leading # so that the setting takes effect.)

Step#3

#systemctl restart ssh
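
The three steps above can be sketched as a small script. This is an illustration only, assuming GNU sed and grep. The helper name enable_root_login is hypothetical. If the config file contains Match blocks, an appended line may need to be moved above them by hand.

```shell
# Sketch: set "PermitRootLogin yes" in an sshd config file.
# enable_root_login is a hypothetical helper name.
enable_root_login() {
    local conf="$1"
    # Keep a backup of the original config.
    cp "$conf" "$conf.bak"
    # Rewrite an existing PermitRootLogin line, whether commented out or not.
    sed -i -E 's/^[[:space:]]*#?[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin yes/' "$conf"
    # If the file had no such line, append one.
    grep -q '^PermitRootLogin yes$' "$conf" || echo 'PermitRootLogin yes' >> "$conf"
}

# Example, run as root:
#   enable_root_login /etc/ssh/sshd_config
#   sshd -t && systemctl restart ssh   # check the config syntax, then restart
```

Running sshd -t before the restart catches syntax errors in the config while the current SSH session is still open.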

Wednesday, April 27, 2022

How to get BMC SEL via Redfish

 Environment


B8036G68V4E4HR-LE, BIOS v2.06.B21, BMC v4.00

CPU: AMD 74F3 x1

RAM: DDR4 128GB

Postman: v8.11.1


Redfish command

GET https://BMC_IP/redfish/v1/Managers/Self/LogServices/SEL/Entries

 

Result

 

{
    "@odata.context": "/redfish/v1/$metadata#LogEntryCollection.LogEntryCollection",
    "@odata.etag": "\"1650459871\"",
    "@odata.id": "/redfish/v1/Managers/Self/LogServices/SEL/Entries",
    "@odata.type": "#LogEntryCollection.LogEntryCollection",
    "Description": "Collection of entries for this log service",
    "Members": [
        {
            "@odata.id": "/redfish/v1/Managers/Self/LogServices/SEL/Entries/1",
            "@odata.type": "#LogEntry.v1_4_3.LogEntry",
            "Created": "2000-01-01T00:01:14+00:00",
            "Description": "SEL 1",
            "EntryCode": "Assert",
            "EntryType": "SEL",
            "EventTimestamp": "2000-01-01T00:00:22+00:00",
            "Id": "1",
            "Message": "Event_Data_1 : 7, Record_Type : system event record, Sensor_Number : 215, Event_Dir : Assertion event, Event_Data_2 : 2, Timestamp : 2000-01-01T00:00:22+00:00, EvM_Rev : IPMI v2.0 Event Messages, Generator_ID : 0x2000, Event_Data_3 : 255, Event_Type : Firmware / Software Change Detected was successful, Record_ID : 1, Sensor_Type : Version Change, ",
            "MessageId": "0x0702FF",
            "Name": "SEL 1",
            "SensorNumber": 215,
            "SensorType": "Version Change",
            "Severity": "OK"
……..
            "@odata.id": "/redfish/v1/Managers/Self/LogServices/SEL/Entries/15",
            "@odata.type": "#LogEntry.v1_4_3.LogEntry",
            "Created": "2022-04-20T13:04:31+00:00",
            "Description": "SEL 15",
            "EntryCode": "Assert",
            "EntryType": "SEL",
            "EventTimestamp": "2022-04-20T13:04:30+00:00",
            "Id": "15",
            "Links": {
                "OriginOfCondition": {
                    "@odata.id": "/redfish/v1/Systems/Self"
                }
            },
            "Message": "Event_Data_1 : 0, Record_Type : system event record, Sensor_Number : 211, Event_Dir : Assertion event, Event_Data_2 : 255, Timestamp : 2022-04-20T13:04:30+00:00, EvM_Rev : IPMI v2.0 Event Messages, Generator_ID : 0x2000, Event_Data_3 : 255, Event_Type : S0/G0 'Working, Record_ID : 15, Sensor_Type : System ACPI PowerState, ",
            "MessageId": "0x00FFFF",
            "Name": "SEL 15",
            "SensorNumber": 211,
            "SensorType": "System ACPI PowerState",
            "Severity": "OK"
        }
    ],
    "Members@odata.count": 15,
    "Name": "Log Service Entries Collection"
}
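
The same GET request can also be sent from a shell instead of Postman. The following is a rough sketch, assuming curl is installed, the BMC accepts HTTP basic authentication, and the BMC uses a self-signed certificate (hence -k). The helper names sel_entries_url and fetch_sel, the address 192.0.2.10 and the credentials are placeholders, not values from this system.

```shell
# Sketch: query the SEL entries of a BMC over Redfish with curl.
# sel_entries_url and fetch_sel are hypothetical helper names.
sel_entries_url() {
    # Build the SEL entries URL for the given BMC address.
    printf 'https://%s/redfish/v1/Managers/Self/LogServices/SEL/Entries\n' "$1"
}

fetch_sel() {
    local bmc_ip="$1" user="$2" pass="$3"
    # -s: silent, -k: skip TLS verification (assumes a self-signed BMC certificate),
    # -u: HTTP basic authentication (assumed to be enabled on the BMC).
    curl -sk -u "${user}:${pass}" "$(sel_entries_url "$bmc_ip")"
}

# Example with placeholder address and credentials:
#   fetch_sel 192.0.2.10 admin 'password' | grep -o '"Description": "SEL [0-9]*"'
```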


Tuesday, February 8, 2022

How to fix "Failed to download metadata for repo" in CentOS 8.x

CentOS Linux 8 reached End Of Life (EOL) on December 31st, 2021, so it no longer receives development resources from the official CentOS project. To update a CentOS 8 system after that date, change the mirrors to vault.centos.org, where the packages are archived permanently.


Step 1: Go to the /etc/yum.repos.d/ directory.

[root@autocontroller ~]# cd /etc/yum.repos.d/


Step 2: Run the below commands

[root@autocontroller ~]# sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-*

[root@autocontroller ~]# sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-*


Step 3: Now run the yum update

[root@autocontroller ~]# yum update -y
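
The two sed commands in Step 2 can be wrapped so that they are safe to run more than once. This is a sketch only, assuming GNU sed and repo files named CentOS-*.repo. The helper name switch_to_vault is hypothetical. Unlike the commands above, the patterns are anchored to the start of the line, so a second run does not produce "##mirrorlist".

```shell
# Sketch: point the CentOS 8 repo files in a directory at vault.centos.org.
# switch_to_vault is a hypothetical helper name.
switch_to_vault() {
    local repo_dir="$1" f
    for f in "$repo_dir"/CentOS-*.repo; do
        # Skip the literal pattern when no file matches.
        [ -f "$f" ] || continue
        # Keep one copy of the original file; yum ignores files not ending in .repo.
        [ -e "$f.orig" ] || cp "$f" "$f.orig"
        # Comment out mirrorlist lines and enable the vault baseurl.
        sed -i \
            -e 's/^mirrorlist/#mirrorlist/' \
            -e 's|^#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|' \
            "$f"
    done
}

# Example, run as root:
#   switch_to_vault /etc/yum.repos.d
#   yum update -y
```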

Thursday, January 20, 2022

How to solve NVMe device names changing after boot with the 4.15.0 kernel

Symptom:

The NVMe device name changes on every reboot. (Ubuntu 18.04, kernel 4.15.0-29)


Solution:

Ubuntu Bug#1792660

A. Upgrade the Ubuntu kernel to 4.17 or later.

B.

1. Stay on the 4.15 kernel series, but upgrade to 4.15.0-34.

2. Add the kernel parameter "nvme_core.multipath=0" in /etc/default/grub, then run update-grub and reboot.

ex. GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.multipath=0"
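
Step B.2 can be sketched as a script. This is a minimal illustration, assuming GNU sed and a GRUB_CMDLINE_LINUX_DEFAULT line in double quotes as in the example above. The helper name add_kernel_param is hypothetical. The parameter is spelled nvme_core.multipath=0.

```shell
# Sketch: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT.
# add_kernel_param is a hypothetical helper name.
add_kernel_param() {
    local grub_file="$1" param="$2"
    # Do nothing if the parameter is already on the line.
    grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${param}" "$grub_file" && return 0
    # Keep a backup of the original file.
    cp "$grub_file" "$grub_file.bak"
    # Insert the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$grub_file"
}

# Example, run as root:
#   add_kernel_param /etc/default/grub nvme_core.multipath=0
#   update-grub
#   reboot
```

The new parameter only takes effect after update-grub has regenerated the boot configuration and the system has been rebooted.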
