
    OpenCL driver works on CM5/4B

      george
      last edited by george

      • Install packages
      sudo apt-get update
      sudo apt-get install opencl-headers ocl-icd-libopencl1 ocl-icd-opencl-dev clinfo
      
      • Check out, build and run clpeak
        sudo apt-get update && sudo apt-get install cmake git g++
        git clone https://github.com/krrishnarraj/clpeak
        mkdir clpeak/build
        cd clpeak/build
        cmake ..
        make -j$(nproc)
        ./clpeak
      
      • Test result
      coolpi@Ubuntu:~/share/clpeak/build$ ./clpeak 
      arm_release_ver: g13p0-01eac0, rk_so_ver: 3
      
      Platform: ARM Platform
        Device: Mali-G610 r0p0
          Driver version  : 3.0 (Linux ARM64)
          Compute units   : 4
          Clock frequency : 1000 MHz
      
          Global memory bandwidth (GBPS)
            float   : 22.23
            float2  : 23.83
            float4  : 24.41
            float8  : 19.66
            float16 : 11.79
      
          Single-precision compute (GFLOPS)
            float   : 447.13
            float2  : 476.05
            float4  : 471.84
            float8  : 440.87
            float16 : 415.77
      
          Half-precision compute (GFLOPS)
            half   : 447.20
            half2  : 888.10
            half4  : 922.15
            half8  : 897.12
            half16 : 857.05
      
          No double precision support! Skipped
      
          Integer compute (GIOPS)
            int   : 126.60
            int2  : 127.16
            int4  : 126.57
            int8  : 125.25
            int16 : 125.70
      
          Integer compute Fast 24bit (GIOPS)
            int   : 126.62
            int2  : 127.18
            int4  : 126.67
            int8  : 125.28
            int16 : 125.77
      
          Transfer bandwidth (GBPS)
            enqueueWriteBuffer              : 7.76
            enqueueReadBuffer               : 8.84
            enqueueWriteBuffer non-blocking : 7.79
            enqueueReadBuffer non-blocking  : 8.87
            enqueueMapBuffer(for read)      : 63.02
              memcpy from mapped ptr        : 10.38
            enqueueUnmap(after write)       : 63.96
              memcpy to mapped ptr          : 10.39
      
          Kernel launch latency : 19.68 us
      
      
      • Test source code (opencl_hello.c)
      #include <stdio.h>  
      #include <stdlib.h>  
      #include <string.h>  
       
      #ifdef MAC  
      #include <OpenCL/cl.h>  
      #else  
      #include <CL/cl.h>  
      #endif  
       
      int main() {  
       
          /* Host data structures */  
          cl_platform_id *platforms;  
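          // Each cl_platform_id represents one OpenCL platform on the host, i.e. one vendor's OpenCL implementation
          // and the hardware it drives (for example NVIDIA GPUs, Intel CPUs and GPUs, or AMD GPUs and CPUs).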
          cl_uint num_platforms;  
          cl_int i, err, platform_index = -1;  
       
          /* Extension data */  
          char* ext_data;                           
          size_t ext_size;     
          const char icd_ext[] = "cl_khr_icd";  
       
          // Getting the platforms takes two steps: (1) allocate memory for the cl_platform_id array,
          // (2) call clGetPlatformIDs to fill it in. Before that comes step 0: ask how many platforms exist.

          /* Find number of platforms */
          // OpenCL calls return CL_SUCCESS (0) on success and a negative error code on failure.
          // With num_entries = 0 and platforms = NULL the call only reports the platform count in num_platforms.
          err = clGetPlatformIDs(0, NULL, &num_platforms);
          if(err < 0) {
              // OpenCL does not set errno, so print the returned code instead of calling perror
              fprintf(stderr, "Couldn't find any platforms (error %d).\n", err);
              exit(1);
          }
       
          printf("I have platforms: %u\n", num_platforms); // 2 on a PC with both Intel and NVIDIA platforms installed
       
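          /* Access all installed platforms */
          // Step 1: allocate the cl_platform_id array
          platforms = (cl_platform_id*)
              malloc(sizeof(cl_platform_id) * num_platforms);
          if(platforms == NULL) {
              fprintf(stderr, "Couldn't allocate memory for the platform list.\n");
              exit(1);
          }
          // Step 2: fill the array with the platform IDs
          err = clGetPlatformIDs(num_platforms, platforms, NULL);
          if(err < 0) {
              fprintf(stderr, "Couldn't read the platform IDs (error %d).\n", err);
              exit(1);
          }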
       
          /* Find extensions of all platforms */
          // With the platform IDs in hand, more detailed information can be queried.
          // The loop below prints the details of every platform on the host.
          for(i=0; i<num_platforms; i++)
          {
              /* Find size of extension data */
              // Same pattern as before: pass 0 and NULL as the third and fourth arguments
              // and the fifth argument, ext_size, receives the length of the extension string.
              err = clGetPlatformInfo(platforms[i],
                  CL_PLATFORM_EXTENSIONS, 0, NULL, &ext_size);
              if(err < 0)
              {
                  fprintf(stderr, "Couldn't read extension data (error %d).\n", err);
                  exit(1);
              }

              printf("The size of extension data is: %d\n", (int)ext_size); // 224 on the PC where this sample was written
       
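              /* Access extension data */
              // ext_data is the buffer that receives the extension string.
              ext_data = (char*)malloc(ext_size);
              // clGetPlatformInfo returns the item named by its second argument;
              // CL_PLATFORM_EXTENSIONS is the list of supported OpenCL extensions, which can be long.
              clGetPlatformInfo(platforms[i], CL_PLATFORM_EXTENSIONS,
                  ext_size, ext_data, NULL);
              printf("Platform %d supports extensions: %s\n", i, ext_data);

              // Each string below is sized with its own query. Reusing ext_size would make the call
              // fail with CL_INVALID_VALUE whenever the extension list is shorter than the string requested.
              size_t info_size;

              // Platform name, e.g. "NVIDIA CUDA"
              clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, 0, NULL, &info_size);
              char *name = (char*)malloc(info_size);
              clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                  info_size, name, NULL);
              printf("Platform %d name: %s\n", i, name);

              // Vendor, e.g. "NVIDIA Corporation"
              clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, 0, NULL, &info_size);
              char *vendor = (char*)malloc(info_size);
              clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR,
                  info_size, vendor, NULL);
              printf("Platform %d vendor: %s\n", i, vendor);

              // Highest supported OpenCL version, e.g. "OpenCL 1.1 CUDA 4.2.1"
              clGetPlatformInfo(platforms[i], CL_PLATFORM_VERSION, 0, NULL, &info_size);
              char *version = (char*)malloc(info_size);
              clGetPlatformInfo(platforms[i], CL_PLATFORM_VERSION,
                  info_size, version, NULL);
              printf("Platform %d version: %s\n", i, version);

              // Only two values are possible: FULL_PROFILE or EMBEDDED_PROFILE
              clGetPlatformInfo(platforms[i], CL_PLATFORM_PROFILE, 0, NULL, &info_size);
              char *profile = (char*)malloc(info_size);
              clGetPlatformInfo(platforms[i], CL_PLATFORM_PROFILE,
                  info_size, profile, NULL);
              printf("Platform %d full profile or embeded profile?: %s\n", i, profile);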
       
              /* Look for ICD extension */
              // Remember the index of a platform that supports the ICD extension (cl_khr_icd).
              int has_icd = (strstr(ext_data, icd_ext) != NULL);
              if(has_icd)
                  platform_index = i;
              printf("Platform_index is: %d\n", platform_index);
              /* Display whether ICD extension is supported */
              // Report only the platform being examined, not one matched in an earlier iteration.
              if(has_icd)
                  printf("Platform %d supports the %s extension.\n",
                  i, icd_ext);


              // Release the per-platform buffers
              free(ext_data);
              free(name);
              free(vendor);
              free(version);
              free(profile);
          }  
       
          if(platform_index <= -1)  
              printf("No platforms support the %s extension.\n", icd_ext);  
       
          /* Deallocate resources */  
          free(platforms);  
          return 0;  
      }   
      
      gcc opencl_hello.c -o opencl_hello -lOpenCL
      
      coolpi@Ubuntu:~/share$ ./opencl_hello
      I have platforms: 1
      arm_release_ver: g13p0-01eac0, rk_so_ver: 3
      The size of extension data is: 1364
      Platform 0 supports extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp16 cl_khr_icd cl_khr_egl_image cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_subgroups cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_khr_subgroup_rotate cl_khr_il_program cl_khr_priority_hints cl_khr_create_command_queue cl_khr_spirv_no_integer_wrap_decoration cl_khr_extended_versioning cl_khr_device_uuid cl_khr_suggested_local_work_size cl_khr_extended_bit_ops cl_khr_integer_dot_product cl_khr_semaphore cl_khr_external_semaphore cl_khr_external_semaphore_sync_fd cl_khr_command_buffer cl_arm_core_id cl_arm_printf cl_arm_non_uniform_work_group_size cl_arm_import_memory cl_arm_import_memory_dma_buf cl_arm_import_memory_host cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_saturate_int8 cl_arm_scheduling_controls cl_arm_controlled_kernel_termination cl_ext_cxx_for_opencl cl_ext_image_tiling_control cl_ext_image_requirements_info cl_ext_image_from_buffer
      Platform 0 name: ARM Platform
      Platform 0 vendor: ARM
      Platform 0 version: OpenCL 3.0 v1.g13p0-01eac0.a8b6f0c7e1f83c654c60d1775112dbe4
      Platform 0 full profile or embeded profile?: FULL_PROFILE
      Platform_index is: 0
      Platform 0 supports the cl_khr_icd extension.
      