Caution
This release is an early-access software technology preview. Running production workloads is not recommended.
hipThreads is a C++-style concurrency library for AMD GPUs that brings familiar threading abstractions to GPU programming by implementing C++ threading and synchronization primitives for GPU code.
The library offers an interface compatible with the C++ Standard Library threading facilities, so you can write familiar concurrency code using hip::thread, hip::mutex, hip::lock_guard, hip::condition_variable, and other primitives. It supports cooperative threading, standard synchronization primitives, and multi-fiber execution (via a width parameter) to take advantage of the GPU's SIMD architecture.
If you have existing CPU code using std::thread, porting to GPU with hipThreads requires minimal changes:
- Replace `std::thread` with `hip::thread`
- Add the `__device__` annotation to lambdas/functions running on the GPU
- Handle GPU memory allocation (CPU and GPU have separate memory pools)
The familiar threading model remains the same, making GPU acceleration accessible without rewriting your concurrency logic. See the examples/ directory for detailed porting examples.
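As a starting point for the porting steps listed above, here is a minimal CPU-only sketch of a chunked SAXPY using `std::thread`. The function name `saxpy_parallel` and the chunking scheme are illustrative assumptions, not code taken from the examples/ directory. The comments mark where each porting step would apply; the GPU version itself is not shown because it depends on the hipThreads headers and your memory setup.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// CPU baseline (hypothetical helper): y = a * x + y, one std::thread per chunk.
// Where the porting steps would apply:
//   1. std::thread would become hip::thread.
//   2. The worker lambda would need a __device__ annotation.
//   3. x and y would have to live in GPU-accessible memory, not std::vector.
void saxpy_parallel(float a, const std::vector<float>& x, std::vector<float>& y,
                    std::size_t num_threads) {
    assert(x.size() == y.size());
    assert(num_threads > 0);

    const std::size_t n = x.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceiling division

    // Workers see only raw pointers and indices, all captured by value.
    const float* xp = x.data();
    float* yp = y.data();

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i) {
                yp[i] = a * xp[i] + yp[i];
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

Keeping the worker lambda free of by-reference captures on the CPU side makes the later move to a device lambda less invasive, since only pointers and scalars cross the thread boundary.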
hipThreads requires the following:
- Linux OS (Ubuntu 24.04 recommended)
- CMake 3.21+
- Build tools (e.g., `make` or `ninja`)
- ROCm 7.12+: hipThreads depends on HIP and libhipcxx. The code samples also use rocThrust utilities. All are included in TheRock builds.
Note
ROCm 7.12 is part of a technology preview release stream (starting from 7.9.0) and is separate from the 7.0–7.2 production releases. The last supported ROCm 7 production release is 7.0.2. For ROCm 7.0.2 setup instructions, see the 0.1.0 release prerequisites.
- Follow the ROCm 7.12 installation guide for your GPU and distribution. Install at least the core-dev package for your GPU architecture (e.g., `amdrocm-core-dev7.12-gfx120x`). The full core-sdk package (e.g., `amdrocm-core-sdk-gfx120x`) also works.
- Configure your environment:

      export ROCM_PATH=/opt/rocm/core
      export PATH=$PATH:$ROCM_PATH/bin
      export LD_LIBRARY_PATH=$ROCM_PATH/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

  To make this persistent across sessions, add the lines above to `~/.bashrc` and run `source ~/.bashrc`.
- Verify the installation:

      hipcc --version   # Should print the clang/HIP version
      rocminfo          # Should list detected GPUs and HSA agents
      amd-smi version   # Should show AMDSMI and ROCm version info

By default, hipThreads installs under $ROCM_PATH (matching other ROCm components). You can override this by adding -DCMAKE_INSTALL_PREFIX=<path> to the CMake configure command.
git clone https://github.com/ROCm/hipThreads.git
cd hipThreads
cmake -B build
cmake --build ./build
sudo cmake --install ./build
Note
Installing to $ROCM_PATH usually requires sudo.
To use hipThreads in your CMake project, add the following to your CMakeLists.txt:
find_package(hipthreads REQUIRED)
# ...
target_link_libraries(<your_target> hipthreads::hipthreads)
If hipThreads is not installed under $ROCM_PATH, add -DCMAKE_PREFIX_PATH=/path/to/hipthreads to your CMake configure command.
Sample code demonstrating hipThreads usage can be found in the examples/ directory.
The SAXPY example shows how to incrementally port CPU-parallel algorithms to GPU using hip::thread, demonstrating the natural progression from std::thread to optimized GPU execution.
To build and run:
cd examples/saxpy/step3-simdize
cmake -B build
cmake --build ./build
./build/bin/saxpy
The examples/llama3.c directory contains a port of llama3.c (a minimal LLaMA 3 inference engine in C) to hip::thread. See the llama3.c README for full details on model setup and options.
To build:
cd examples/llama3.c/step4-simdize
cmake -B build
cmake --build ./build
Download and export a model (requires the Meta LLaMA 3 weights):
python export.py llama3.2_3b_instruct_fp32.bin --meta-llama ../llama3.2-3b-instruct/
Run inference or start a chat session:
./build/bin/llama3 ~/models/llama3.2_3b_instruct_fp32.bin -z ~/models/tokenizer.bin -i "My car" -n 100
./build/bin/llama3 ~/models/llama3.2_3b_instruct_fp32.bin -z ~/models/tokenizer.bin -m chat
Command-line options
| Option | Description | Default |
|---|---|---|
| `-t <float>` | Temperature (0 to inf) | 1.0 |
| `-p <float>` | Top-p sampling (0 to 1) | 0.9 |
| `-s <int>` | Random seed | `time(NULL)` |
| `-n <int>` | Number of steps | 4096 |
| `-i <string>` | Input prompt | (none) |
| `-z <string>` | Path to tokenizer | (none) |
| `-m <string>` | Mode: generate or chat | generate |
| `-y <string>` | System prompt (chat mode) | (none) |
Documentation is available in multiple forms:
- API Reference: Doxygen-generated documentation in the `docs/` directory
- Source Documentation: The library uses Doxygen-style comments throughout the source files, so you can browse the `inc/` and `src/` directories directly to read the API documentation inline with the code.
- Tutorials and Examples: See our ROCm™ Blogs post for an introduction with detailed examples
The API reference is generated with Doxygen. Since we're unable to host it online currently, you can view it locally using any of these methods:
Option 1: Open .html files directly in browser
Option 2: Local HTTP server (recommended for full functionality):
python3 -m http.server 5500 --directory docs/doxygen/html
# Then open http://localhost:5500 in your browser
Note
Opening HTML files directly works for most documentation browsing. Use the HTTP server method if you encounter issues with search functionality or cross-file navigation.
While hipThreads mimics the C++ standard library, GPU hardware constraints impose specific rules:
The creation of a hip::thread launches a persistent kernel (scheduler) that polls for work. Consequently, calling synchronous HIP functions (like hipDeviceSynchronize, synchronous hipMemcpy, or thrust::copy) will cause deadlocks because they wait for all GPU tasks to finish—including the persistent idle kernel.
- Solution A (Async APIs): Use async HIP functions (e.g., `hipMemcpyAsync`, `hipMemsetAsync`) which do not wait for the idle loop to terminate.
- Solution B (Scoping): If you must use synchronous calls (e.g., when mixing with rocThrust), wrap your `hip::thread` objects in a scoped block (`{ ... }`). This ensures threads are joined and the persistent kernel is destroyed before the synchronous call is made.
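The scoping pattern in Solution B can be illustrated with a CPU-only analogue. This sketch uses `std::thread` and a hypothetical `ScopedWorker` wrapper that joins in its destructor, standing in for the assumption that every `hip::thread` has been joined by the time the block closes. It does not reproduce the GPU deadlock; it only shows where the synchronous step belongs relative to the scope. `run_then_synchronize` is likewise an illustrative name.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical RAII wrapper: joins its thread on destruction, so leaving the
// enclosing scope guarantees the worker has finished.
class ScopedWorker {
public:
    template <typename F>
    explicit ScopedWorker(F&& f) : thread_(std::forward<F>(f)) {}
    ~ScopedWorker() {
        if (thread_.joinable()) thread_.join();
    }
    ScopedWorker(const ScopedWorker&) = delete;
    ScopedWorker& operator=(const ScopedWorker&) = delete;

private:
    std::thread thread_;
};

int run_then_synchronize(int iterations) {
    assert(iterations >= 0);

    // Shared state lives on the heap, not on the launching thread's stack.
    std::unique_ptr<std::atomic<int>> counter(new std::atomic<int>(0));
    std::atomic<int>* c = counter.get();

    {
        ScopedWorker a([c, iterations] {
            for (int i = 0; i < iterations; ++i) c->fetch_add(1);
        });
        ScopedWorker b([c, iterations] {
            for (int i = 0; i < iterations; ++i) c->fetch_add(1);
        });
    }  // Both workers are joined here.

    // In HIP code, a synchronous call (for example hipDeviceSynchronize or a
    // rocThrust algorithm) would be placed here, after the scope has closed.
    return c->load();
}
```

The point of the structure is ordering: nothing that waits for the whole device runs while a worker object is still alive.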
A hip::thread constructed on the host cannot accept standard host function pointers or standard lambdas.
- Requirement: You must use extended lambdas annotated with `__device__`.
- Device Functions: Host code cannot reference `__device__` functions directly. To call a device function, wrap it inside a `[] __device__ { ... }` lambda.
Arguments passed to the hip::thread constructor must be TriviallyCopyable as they are copied by value to the device.
- No Complex Types: Do not pass structures containing `std::vector` or other standard containers.
- Raw Pointers: If passing a pointer, it must point to GPU-accessible memory (allocated via `hipMalloc` or similar). Passing a host pointer will cause a crash.
- Stack Isolation: GPU threads have private stacks. Never capture by reference (`[&]`) if the variable resides on the launching thread's stack. Other threads cannot access that memory. Shared data must exist in heap/global memory.
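One way to catch violations of the TriviallyCopyable requirement early is a compile-time check in your own code. The sketch below uses only the standard `<type_traits>` header; the `SaxpyArgs` and `BadArgs` structs and the `all_trivially_copyable` helper are hypothetical names introduced for illustration and are not part of hipThreads.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical argument bundle: scalars and raw pointers only.
struct SaxpyArgs {
    float a;
    const float* x;  // would need to point to GPU-accessible memory on device
    float* y;
    std::size_t n;
};

// Hypothetical counter-example: owning a std::vector makes the struct
// non-trivially-copyable, so it would not be a valid thread argument.
struct BadArgs {
    float a;
    std::vector<float> x;
};

// Hypothetical helper: true only if every listed type is trivially copyable.
template <typename... Ts>
struct all_trivially_copyable : std::true_type {};

template <typename T, typename... Ts>
struct all_trivially_copyable<T, Ts...>
    : std::integral_constant<bool, std::is_trivially_copyable<T>::value &&
                                       all_trivially_copyable<Ts...>::value> {};

static_assert(all_trivially_copyable<int, float*, SaxpyArgs>::value,
              "these types can be copied by value to the device");
static_assert(!all_trivially_copyable<int, BadArgs>::value,
              "a struct holding std::vector is rejected");
```

A check like this only covers the type-level rule. It cannot tell whether a raw pointer refers to host or GPU-accessible memory, so that constraint still has to be enforced by how the buffers are allocated.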
GPU synchronization primitives are approximations of their CPU counterparts:
- No Preemption or Blocking: The GPU does not support blocking or hardware preemption. `condition_variable::wait` spins or yields rather than blocking.
- Pseudo Yield: `this_thread::yield` only returns control to the caller when the yieldee has finished. The yieldee will not be interrupted and cannot yield back to the caller.
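Because `wait` may spin or yield instead of blocking, it helps to write waits in the predicate form, so correctness depends only on the shared flag and not on how the wait is implemented. The following is a CPU-only sketch using the `std::` primitives. The `Mailbox` struct and `produce_and_consume` function are hypothetical, and whether hipThreads provides `unique_lock` and the predicate overload of `wait` is an assumption to verify against the unit tests in test/. It also says nothing about how such a handoff is scheduled on the GPU.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical shared state for a one-shot handoff between two threads.
struct Mailbox {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    int value = 0;
    int received = 0;
};

int produce_and_consume(int payload) {
    // Heap allocation keeps shared data off the launching thread's stack.
    std::unique_ptr<Mailbox> owner(new Mailbox());
    Mailbox* box = owner.get();

    std::thread consumer([box] {
        std::unique_lock<std::mutex> lock(box->m);
        // The predicate form re-checks the flag after every wakeup, so the
        // result is the same whether wait blocks, spins, or yields.
        box->cv.wait(lock, [box] { return box->ready; });
        assert(box->ready);
        box->received = box->value;
    });

    {
        std::lock_guard<std::mutex> lock(box->m);
        box->value = payload;
        box->ready = true;
    }
    box->cv.notify_one();

    consumer.join();
    return box->received;
}
```

Setting the flag under the mutex before notifying means a consumer that starts waiting late still sees `ready == true` and returns immediately.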
Note
For practical demonstrations of these concepts, explore the examples/ directory.
For detailed usage of specific primitives and implementation of edge cases, refer to the unit tests in the test/ directory.
hipThreads is distributed under the Apache License v2.0 with LLVM Exceptions. See LICENSE.txt for details.
Third-party content is licensed to you directly by the third party that owns the content and is not licensed to you by AMD. ALL LINKED THIRD-PARTY CONTENT IS PROVIDED "AS IS" WITHOUT A WARRANTY OF ANY KIND. USE OF SUCH THIRD-PARTY CONTENT IS DONE AT YOUR SOLE DISCRETION AND UNDER NO CIRCUMSTANCES WILL AMD BE LIABLE TO YOU FOR ANY THIRD-PARTY CONTENT. YOU ASSUME ALL RISK AND ARE SOLELY RESPONSIBLE FOR ANY DAMAGES THAT MAY ARISE FROM YOUR USE OF THIRD-PARTY CONTENT.