Compute kernel

In computing, a compute kernel is a routine compiled for high-throughput accelerators (such as GPUs, DSPs or FPGAs), separate from (but used by) a main program. Compute kernels are sometimes called compute shaders; on GPUs they share execution units with vertex shaders and pixel shaders, but they are not limited to execution on one class of device, or to graphics APIs.^[1]^[2]

Description

Compute kernels roughly correspond to inner loops when implementing algorithms in traditional languages (except that no sequential order of execution is implied), or to code passed to internal iterators.
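The correspondence with an inner loop can be sketched in plain Python. This is only a CPU-side illustration, not accelerator code: the names `saxpy_loop`, `saxpy_kernel` and `launch` are hypothetical, and `launch` stands in for whatever dispatch call a real runtime would provide. It visits the indices in a shuffled order to stress that the kernel form implies no sequence.

```python
import random


def saxpy_loop(a, x, y):
    """Traditional form: an explicit, sequential inner loop."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out


def saxpy_kernel(i, a, x, y, out):
    """Kernel form: only the loop body, written for a single index i."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Hypothetical dispatcher: invoke the kernel once per index.

    The order is shuffled on purpose; a correct kernel must not
    depend on the order in which its invocations run.
    """
    order = list(range(n))
    random.shuffle(order)
    for i in order:
        kernel(i, *args)


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0], the same result as saxpy_loop(2.0, x, y)
```

On a real accelerator the invocations would run in parallel rather than one after another; the shuffled loop only models the absence of ordering.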

A kernel may be specified in a separate programming language such as "OpenCL C" (managed by the OpenCL API), as a "compute shader" (managed by a graphics API such as OpenGL), or embedded directly in application code written in a high-level language, as in the case of C++ AMP.

Vector processing

This programming paradigm maps well to vector processors: each invocation of a kernel within a batch is assumed to be independent, which allows data-parallel execution. Where work is interdependent, atomic operations may sometimes be used for synchronisation between elements. Each invocation is given an index (in one or more dimensions), from which arbitrary addressing of buffer data may be performed (including scatter-gather operations), so long as the assumption that invocations do not overlap in the data they modify is respected.
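The two cases described above, independent indexed invocations and interdependent work that needs atomics, can be sketched as a CPU-side emulation in Python. All names here (`launch`, `transpose_kernel`, `histogram_kernel`, `atomic_add`) are hypothetical, a thread pool stands in for the accelerator's parallel lanes, and a lock stands in for a hardware atomic add.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor

_lock = threading.Lock()


def atomic_add(buf, pos, value):
    """Stand-in for a hardware atomic add (emulated with a lock)."""
    with _lock:
        buf[pos] += value


def transpose_kernel(idx, src, dst, rows, cols):
    """2-D index; every invocation writes a distinct element of dst,
    so the writes never overlap and no synchronisation is needed."""
    r, c = idx
    dst[c * rows + r] = src[r * cols + c]


def histogram_kernel(idx, data, bins):
    """1-D index; several invocations may hit the same bin, so the
    update goes through atomic_add."""
    (i,) = idx
    atomic_add(bins, data[i], 1)


def launch(kernel, shape, *args):
    """Hypothetical dispatcher: one invocation per index in `shape`."""
    indices = list(itertools.product(*(range(n) for n in shape)))
    with ThreadPoolExecutor(max_workers=4) as pool:
        # list() forces completion and re-raises any kernel exception.
        list(pool.map(lambda idx: kernel(idx, *args), indices))


src = [1, 2, 3, 4, 5, 6]  # a 2x3 matrix stored row by row
dst = [0] * 6
launch(transpose_kernel, (2, 3), src, dst, 2, 3)
print(dst)  # [1, 4, 2, 5, 3, 6]

data = [0, 1, 1, 2, 1, 0]
bins = [0, 0, 0]
launch(histogram_kernel, (len(data),), data, bins)
print(bins)  # [2, 3, 1]
```

The transpose is a scatter in which each output location has exactly one writer; the histogram breaks that property, which is why it relies on the atomic update.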

Vulkan API

The Vulkan API provides the intermediate SPIR-V representation to describe both graphical shaders and compute kernels in a language-independent and machine-independent manner. The intention is to facilitate language evolution and to provide a more natural way to leverage GPU compute capabilities, in line with hardware developments such as Unified Memory Architecture and Heterogeneous System Architecture. This allows closer cooperation between a CPU and GPU.

References

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

[1] Introduction to Compute Programming in Metal

[2] CUDA Tutorial - the Kernel

