A non-exhaustive list of things I think you should know as a GPGPU programmer. Items marked optional aren’t critical most of the time, but being aware that they exist can be very beneficial.
If you want to go through this list, I recommend implementing an algorithm naively on the CPU, optimizing it, and then implementing it on the GPU. Every time you write a line of code, think: “but how does the CPU/GPU execute this?” and try to answer it.
For optimization goals, try to implement an algorithm and compare the performance with an existing library such as rocPRIM or CUB.
Checklist
Focus on the fun things; don’t force yourself to know everything by heart. If you understand the concepts and are aware that they exist, you’re already pretty far!
Knowledge
- CPU architecture is still important to know: understanding how CPUs and GPUs differ tells you what works on each and what doesn’t.
Checklist:
- Memory model (caches, cache lines, locality)
- Execution model
- SIMD: what is SSE/AVX
- Von Neumann (optional)
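To make the "caches, cache lines, locality" item concrete, here is a minimal sketch of two loops that compute the same sum over a row-major matrix but walk memory in a different order. The function names are made up for this example, and how large the speed difference is depends on your CPU and the matrix size, so measure it yourself.

```cpp
#include <cstddef>
#include <vector>

// Sums a rows x cols matrix stored row-major in one contiguous buffer.
// The inner loop walks adjacent addresses, so every cache line that is
// fetched gets fully used before the loop moves on.
long long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Same result, but the inner loop jumps `cols` elements at a time.
// On large matrices this tends to touch a different cache line on
// almost every access.
long long sum_column_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

Timing both functions on a matrix that is much larger than your last-level cache is a quick way to see locality in action.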
- GPU architecture
Checklist:
- Memory model (caches, locality, coherence)
- Execution model: SIMT, warps/wavefronts
- Memory banks and bank conflicts (optional)
- Complexity (Big O-notation)
Checklist:
- How to read complexity (e.g. O(n log n))
- Calculating complexity
- Why complexity isn’t everything
- Data Oriented Design vs. Object Oriented Programming
Checklist:
- How this ties in with CPU memory model
- Polymorphism and virtual tables (optional)
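As a rough illustration of how data-oriented design ties in with the memory model, the sketch below stores the same particles as an array of structures (the usual object-oriented layout) and as a structure of arrays. The type and function names are hypothetical and only exist for this example.

```cpp
#include <cstddef>
#include <vector>

// Array of structures: typical object-oriented layout.
// Reading only `x` still pulls y, z and mass into the cache.
struct ParticleAoS {
    float x, y, z;
    float mass;
};

// Structure of arrays: data-oriented layout.
// Each field is contiguous, so a loop over `x` uses every byte it loads.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> mass;
};

float sum_x_aos(const std::vector<ParticleAoS>& particles) {
    float total = 0.0f;
    for (const ParticleAoS& p : particles) total += p.x;
    return total;
}

float sum_x_soa(const ParticlesSoA& particles) {
    float total = 0.0f;
    for (float x : particles.x) total += x;
    return total;
}
```

Both functions return the same value; the difference is that the SoA loop reads 4 useful bytes out of every 4 loaded, while the AoS loop reads 4 useful bytes out of every 16.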
- Sequential Algorithms
Checklist:
- Divide and conquer (quick sort, merge sort)
- Optimization techniques (cache alignment, access patterns)
- Parallel Algorithms
Checklist:
- Divide and conquer (reduce, merge sort)
- GEMM
- Bitonic sorter (optional)
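For the parallel divide-and-conquer reduce mentioned above, a minimal CPU-side sketch of a tree reduction could look like this. It runs sequentially; the point is only that the iterations of the inner loop don't depend on each other, which is what lets a GPU hand each one to a separate thread. The function name is an assumption of this example, not a library API.

```cpp
#include <cstddef>
#include <vector>

// Tree reduction: in each step, element i absorbs element i + half.
// All iterations of the inner loop are independent of each other, so on a
// GPU each one could run on its own thread, with a barrier between steps.
// Works on a copy, so the caller's data is left untouched.
int tree_reduce_sum(std::vector<int> data) {
    if (data.empty()) return 0;
    std::size_t n = data.size();
    while (n > 1) {
        std::size_t half = (n + 1) / 2;  // round up so odd sizes keep the middle element
        for (std::size_t i = 0; i + half < n; ++i)
            data[i] += data[i + half];
        n = half;
    }
    return data[0];
}
```

The number of steps is about log2(n) instead of n, which is why this shape shows up in libraries such as rocPRIM and CUB, although their real implementations are far more involved than this sketch.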
Programming languages
- C++ is used as the host language to launch our kernels. CUDA and HIP are also built on top of C++, so a lot of skills are transferable.
Checklist:
- Pointers and references
- Templates
- Metaprogramming (optional)
- SFINAE (optional)
- Virtual Tables (optional)
- r/l/x-values (optional)
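If templates and SFINAE are new to you, this small sketch shows both in one place. The functions `square` and `is_even` are invented for the example, and the rule used for floating-point "evenness" is just an assumption to give the second overload something to do.

```cpp
#include <type_traits>

// Template: one definition, instantiated per type at compile time.
template <typename T>
T square(T value) {
    return value * value;
}

// SFINAE: the overload whose enable_if condition is false is silently
// dropped from overload resolution instead of causing a compile error.
template <typename T, typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
bool is_even(T value) {
    return value % 2 == 0;
}

template <typename T, typename std::enable_if<std::is_floating_point<T>::value, int>::type = 0>
bool is_even(T value) {
    // Assumption for this sketch: a floating-point value counts as even
    // when it is a whole number whose integer part is even.
    long long whole = static_cast<long long>(value);
    return static_cast<T>(whole) == value && whole % 2 == 0;
}
```

Calling `is_even(4)` picks the integral overload and `is_even(2.0)` picks the floating-point one; calling it with a type that is neither fails to compile because no overload survives.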
- CUDA/HIP. CUDA is the industry standard and HIP (AMD’s version) is basically the same minus a few features.
Checklist:
- Launching kernels
- Kernel launch parameters, i.e. grid size and block size
- Shared memory
- Barriers and fences
- Intrinsics such as broadcast, shuffle (optional)
- OpenCL has fewer nice features than CUDA, but if you prefer it, feel free to pick it up instead (optional). Checklist: see CUDA/HIP.
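The grid size and block size items are easier to reason about with the arithmetic written out. The sketch below is plain host-side C++ that emulates a 1D launch with two loops; it is not CUDA or HIP code, and both function names are made up for this example. In a real CUDA/HIP kernel the same index is computed as blockIdx.x * blockDim.x + threadIdx.x.

```cpp
#include <cstddef>
#include <vector>

// Number of blocks needed to cover n elements: ceil(n / block_size).
std::size_t grid_size_for(std::size_t n, std::size_t block_size) {
    return (n + block_size - 1) / block_size;
}

// CPU emulation of a 1D kernel launch. The two loops stand in for the
// block and thread indices that CUDA/HIP expose inside a kernel.
void emulated_scale_kernel(std::vector<float>& data, float factor, std::size_t block_size) {
    const std::size_t n = data.size();
    const std::size_t grid_size = grid_size_for(n, block_size);
    for (std::size_t block = 0; block < grid_size; ++block) {
        for (std::size_t thread = 0; thread < block_size; ++thread) {
            std::size_t global_id = block * block_size + thread;
            if (global_id < n)  // the last block may have threads past the end
                data[global_id] *= factor;
        }
    }
}
```

The bounds check matters whenever the element count is not a multiple of the block size: 1000 elements with a block size of 256 need 4 blocks, and the last 24 threads of the final block must do nothing.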
Algorithms
- Divide and Conquer.
- Parallel Scan/Reduce.
- Dynamic Programming. (optional)
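One common way to approach a parallel scan is the Hillis-Steele scheme, sketched here as sequential C++ so the data movement is easy to follow. This is one of several scan algorithms and not necessarily what libraries use; the function name is an assumption of this example.

```cpp
#include <cstddef>
#include <vector>

// Inclusive prefix sum, Hillis-Steele style. In the step for `offset`,
// element i adds the value `offset` positions to its left. Reads come from
// `in` and writes go to `out`, so the iterations within one step are
// independent (one thread each on a GPU, with a barrier between steps).
std::vector<int> inclusive_scan_hillis_steele(std::vector<int> in) {
    std::vector<int> out(in.size());
    for (std::size_t offset = 1; offset < in.size(); offset *= 2) {
        for (std::size_t i = 0; i < in.size(); ++i)
            out[i] = (i >= offset) ? in[i] + in[i - offset] : in[i];
        in.swap(out);  // double buffering: this step's output feeds the next step
    }
    return in;
}
```

This version does O(n log n) additions in total, more than the n - 1 a sequential loop needs, which is a nice example of why complexity alone doesn't tell you what runs fastest on parallel hardware.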
Skills
- Debugging code can be difficult. Be the gdb-chad versus the printf-soy.
Checklist:
- rocgdb or cuda-gdb (optional)
- Source Control is almost always done through git. You really should already know the basics.
Checklist:
- git basics: branching, merging, commits
- Resolving merge conflicts (optional)
- Rebasing (optional)
- Bisecting (optional)
- CMake is the industry standard for building large libraries and projects.
Checklist:
- Compiling from source with CMake
- Using build flags
- Using user presets (optional, recommended)
- Writing CMake scripts (optional)