@tmb68 The following clpeak output shows the OpenCL test results; the testing method will be provided later.
coolpi@Ubuntu:~/share/clpeak/build$ ./clpeak
Platform: ARM Platform
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
Device: Mali-LODX r0p0
Driver version : 2.1 (Linux ARM64)
Compute units : 4
Clock frequency : 1000 MHz
Global memory bandwidth (GBPS)
float : 21.35
float2 : 23.18
float4 : 24.05
float8 : 12.08
float16 : 11.06
Single-precision compute (GFLOPS)
float : 447.12
float2 : 476.20
float4 : 472.35
float8 : 440.72
float16 : 416.19
Half-precision compute (GFLOPS)
half : 447.25
half2 : 888.31
half4 : 921.77
half8 : 897.50
half16 : 856.49
No double precision support! Skipped
Integer compute (GIOPS)
int : 126.55
int2 : 127.18
int4 : 126.64
int8 : 125.27
int16 : 125.79
Integer compute Fast 24bit (GIOPS)
int : 126.64
int2 : 127.23
int4 : 126.79
int8 : 125.30
int16 : 125.76
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 7.31
enqueueReadBuffer : 8.25
enqueueWriteBuffer non-blocking : 7.30
enqueueReadBuffer non-blocking : 8.26
enqueueMapBuffer(for read) : 61.83
memcpy from mapped ptr : 9.57
enqueueUnmap(after write) : 61.21
memcpy to mapped ptr : 9.40
Kernel launch latency : 19.97 us
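
For anyone who wants to compare these numbers against another board or driver version, here is a minimal sketch of how the log could be turned into a dictionary. It is not part of clpeak; the function name `parse_clpeak` and the `SAMPLE` excerpt are my own illustrative choices, and the parser assumes the plain `name : value` layout shown above.

```python
import re

# Illustrative excerpt copied from the clpeak log above (not the full output).
SAMPLE = """\
Single-precision compute (GFLOPS)
float : 447.12
float4 : 472.35
Half-precision compute (GFLOPS)
half : 447.25
half4 : 921.77
Kernel launch latency : 19.97 us
"""

# A result line looks like "name : number" with an optional trailing unit.
RESULT_RE = re.compile(r"^(.+?)\s*:\s*(\d+(?:\.\d+)?)(?:\s+\S+)?$")


def parse_clpeak(text):
    """Group 'name : value' lines under the most recent section header.

    Assumption: a section header is a line without ':' that ends in ')',
    such as 'Integer compute (GIOPS)'. Lines seen before the first header
    are filed under 'Device'. The kernel launch latency line has no header
    of its own, so it lands in whichever section precedes it.
    """
    results = {}
    section = "Device"
    for raw in text.splitlines():
        line = raw.strip()
        match = RESULT_RE.match(line)
        if match:
            name, value = match.group(1), float(match.group(2))
            results.setdefault(section, {})[name] = value
        elif ":" not in line and line.endswith(")"):
            section = line
    return results


if __name__ == "__main__":
    parsed = parse_clpeak(SAMPLE)
    fp32 = parsed["Single-precision compute (GFLOPS)"]["float4"]
    fp16 = parsed["Half-precision compute (GFLOPS)"]["half4"]
    # Ratio of half4 to float4 throughput for this run.
    print(f"half4 / float4 = {fp16 / fp32:.2f}")
```

Saving the full log to a file and passing its contents to `parse_clpeak` should work the same way, as long as the output keeps this layout.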
