BlueCache

A complete GPU KV-cache offload solution that moves KV tensors from host GPU memory to BlueField DPU-backed storage tiers while keeping the host CPU out of the data path.

Overview

This project provides an end-to-end pipeline for offloading GPU-resident data — primarily LLM KV caches — to storage attached to a local BlueField DPU. It is built from three integrated pieces:

blue-cache (blue-cache/) — The DPU-side agent. It runs on the BlueField DPU ARM cores, imports the remote GPU memory map, executes DOCA DMA operations, and writes incoming data to DPU-side storage backends.
NIXL Plugin (nixl-plugin/) — A host-side NIXL backend named BLUE_CACHE. It registers GPU buffers as VRAM_SEG, exports them over PCIe with DOCA DMA, and forwards transfer requests to the DPU agent.
LMCache Integration (examples/lmcache/) — A patch set and configuration example that enables LMCache v0.4.3 to use the BLUE_CACHE backend for transparent KV-cache tiering.

Together these components let an application such as LMCache express a transfer as VRAM_SEG ↔ OBJ_SEG and have the actual PCIe DMA and storage I/O executed by the DPU.

Supported DPU storage targets

The DPU agent can land data in multiple backend types, allowing the same offload path to target different cost/performance tiers:

| Target | How it is used | Typical use case |
|---|---|---|
| DPU DRAM | Pre-allocated staging buffer; can also serve as a fast transient tier | Low-latency cache spill |
| DPU-local disk | POSIX files via the agent's `posix_storage_backend` | Capacity tier on BlueField NVMe |
| Remote / object storage | NIXL `OBJ_SEG` backend (e.g. `xdfs_storage_backend`) | Shared object store, distributed cache |

Bulk data always moves over DOCA DMA between Host GPU and DPU. Only small control messages travel over DOCA Comch or TCP.
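The Python sketch below shows what a transfer request might look like on the control channel, to make the control/data split concrete. The field layout, `HEADER_FMT`, and the `OP_PUSH`/`OP_PULL` opcodes are hypothetical illustrations. They are not the actual wire protocol in `common/dma_transfer.h`. The point is that a request only names the data, so it stays a few dozen bytes regardless of transfer size.

```python
import struct

# Hypothetical control-message layout (NOT the real common/dma_transfer.h format):
# opcode u32, flags u32, gpu_offset u64, length u64, key_len u32, then the key bytes.
HEADER_FMT = "<IIQQI"                      # little-endian, no padding
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 4 + 4 + 8 + 8 + 4 = 28 bytes

OP_PUSH = 1  # hypothetical opcode: GPU -> DPU storage
OP_PULL = 2  # hypothetical opcode: DPU storage -> GPU


def encode_request(opcode: int, gpu_offset: int, length: int, key: str) -> bytes:
    """Pack a fixed header followed by the UTF-8 object key."""
    key_bytes = key.encode("utf-8")
    header = struct.pack(HEADER_FMT, opcode, 0, gpu_offset, length, len(key_bytes))
    return header + key_bytes


def decode_request(msg: bytes) -> tuple:
    """Inverse of encode_request; returns (opcode, gpu_offset, length, key)."""
    opcode, _flags, gpu_offset, length, key_len = struct.unpack_from(HEADER_FMT, msg)
    key = msg[HEADER_SIZE:HEADER_SIZE + key_len].decode("utf-8")
    return opcode, gpu_offset, length, key


if __name__ == "__main__":
    # A 64 MiB transfer is described by a message of a few dozen bytes; the
    # 64 MiB payload itself would move over DOCA DMA, not over this channel.
    msg = encode_request(OP_PUSH, gpu_offset=0, length=64 << 20, key="/data/test_obj")
    print(len(msg), decode_request(msg))
```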

Why this matters

In LLM serving, the KV cache is large, grows with sequence length, and competes with model weights for limited GPU HBM. Existing offload paths often route data through the Host CPU or across the network, which:

consumes host CPU cycles that could run the inference engine,
adds extra memory copies,
and is hard to integrate cleanly with a tiered cache.
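Here is a rough sense of scale. For a standard multi-head or grouped-query attention model, the KV cache stores one K vector and one V vector per layer, per KV head, per token. The sketch below applies that formula. The model shape in the example (32 layers, 8 KV heads, head dimension 128, fp16) is an illustrative assumption. The formula also ignores allocator padding and paging overhead.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   num_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes of K and V state for one sequence (the leading 2 = one K plus one V)."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * num_tokens


if __name__ == "__main__":
    # Illustrative model shape (assumption): 32 layers, 8 KV heads of dim 128, fp16.
    per_token = kv_cache_bytes(32, 8, 128, num_tokens=1)
    full = kv_cache_bytes(32, 8, 128, num_tokens=32_768)
    print(f"{per_token // 1024} KiB per token, {full / 2**30:.1f} GiB for 32K tokens")
```

Under those assumptions each token costs 128 KiB, so a single 32K-token sequence needs 4 GiB of HBM. The total grows linearly with both sequence length and the number of concurrent sequences.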

By using the BlueField DPU's dedicated DOCA DMA engine, this solution:

moves data between GPU memory and DPU-attached storage over PCIe, staging it in DPU DRAM rather than host memory,
keeps the host CPU out of the data path,
and exposes the offload path through the standard NIXL API so applications like LMCache do not need to know DOCA details.

Architecture

┌─────────────────────────────────────────────────────────────────────────────────────────┐
│  Host                                                                                   │
│  ┌─────────────────────┐    ┌─────────────────────────────┐                             │
│  │ LMCache / vLLM      │    │ NIXL Agent                  │                             │
│  │ (KV-cache manager)  │───►│ + BLUE_CACHE backend        │                             │
│  └─────────────────────┘    │   - registers GPU VRAM      │                             │
│                             │   - exports GPU mmap        │                             │
│                             │   - sends transfer requests │                             │
│                             └─────────────┬───────────────┘                             │
│                                           │                                             │
│                              Control plane│(DOCA Comch / TCP)                          │
│                                           ▼                                             │
│  ┌─────────────────────┐    ┌─────────────────────────────┐                             │
│  │ GPU HBM (VRAM_SEG)  │◄──►│ DOCA DMA over PCIe          │                             │
│  └─────────────────────┘    └─────────────────────────────┘                             │
└─────────────────────────────────────────────────────────────────────────────────────────┘
                                            │
                                            ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│  BlueField DPU                                                                          │
│  ┌─────────────────────────────────────────────────────────────────────────────────┐    │
│  │ blue-cache agent                                                                │    │
│  │  ┌───────────────┐    ┌───────────────┐    ┌─────────────────────────────────┐  │    │
│  │  │ DOCA DMA      │───►│ staging buffer│───►│ NIXL storage backend            │  │    │
│  │  │ engine        │    │ (DPU DRAM)    │    │ (posix / xdfs / xdfs_kv / ...)  │  │    │
│  │  └───────────────┘    └───────────────┘    └─────────────────────────────────┘  │    │
│  │                                              │                                   │    │
│  │                                              ▼                                   │    │
│  │                              ┌──────────────┴──────────────┐                   │    │
│  │                              │                             │                   │    │
│  │                              ▼                             ▼                   │    │
│  │                       ┌─────────────┐            ┌─────────────────┐            │    │
│  │                       │ DPU-local   │            │ Remote Storage  │            │    │
│  │                       │ (posix)     │            │ xdfs / xdfs_kv  │            │    │
│  │                       └─────────────┘            └─────────────────┘            │    │
│  └─────────────────────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────────────────────┘

blue-cache

The DPU agent is the piece that executes the offload. It runs as a service on the BlueField DPU and is intentionally separate from the NIXL library so it can evolve independently.

Responsibilities:

Import the host GPU mmap from the PCI export descriptor sent by the plugin.
Maintain a reusable DPU-side staging buffer.
Execute chunked, pipelined DOCA DMA with configurable queue depth.
Forward received data to a NIXL storage backend running on the DPU, which in turn writes to local files or object storage.
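The chunked, pipelined DMA loop can be modeled in a few lines. This is a simplified simulation, not the agent's code. The names `Chunk`, `split_into_chunks`, and `run_pipeline` are hypothetical. The model assumes that DMA completions arrive in issue order and that the staging buffer is divided into `queue_depth` equal slots reused round-robin.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int
    offset: int  # byte offset within the GPU buffer
    length: int  # bytes in this chunk (the last one may be short)
    slot: int    # staging-buffer slot, reused round-robin


def split_into_chunks(total: int, chunk_size: int, queue_depth: int) -> list:
    """Cut a transfer of `total` bytes into chunk_size pieces mapped onto slots."""
    if chunk_size <= 0 or queue_depth <= 0:
        raise ValueError("chunk_size and queue_depth must be positive")
    return [Chunk(i, off, min(chunk_size, total - off), i % queue_depth)
            for i, off in enumerate(range(0, total, chunk_size))]


def run_pipeline(chunks: list, queue_depth: int) -> tuple:
    """Issue DMA jobs with at most queue_depth in flight.

    When the queue is full, the oldest job is retired first; "write" stands in
    for a DMA completion followed by the storage-backend write of that slot.
    Returns the event log and the peak number of jobs in flight.
    """
    in_flight: deque = deque()
    log, peak = [], 0
    for c in chunks:
        if len(in_flight) == queue_depth:
            log.append(("write", in_flight.popleft().index))
        in_flight.append(c)
        log.append(("dma", c.index))
        peak = max(peak, len(in_flight))
    while in_flight:  # drain the tail of the pipeline
        log.append(("write", in_flight.popleft().index))
    return log, peak


if __name__ == "__main__":
    chunks = split_into_chunks(total=10 << 20, chunk_size=4 << 20, queue_depth=2)
    log, peak = run_pipeline(chunks, queue_depth=2)
    print(log, "peak in flight:", peak)
```

At most `queue_depth` jobs are outstanding, and they retire in FIFO order. As a result, chunk i is issued only after chunk i - queue_depth has been written out, so the two never share a staging slot.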

Build and run instructions are in blue-cache/README.md.

NIXL Plugin

The host plugin implements the NIXL nixlBackendEngine interface. It exposes two memory types:

VRAM_SEG — Host GPU memory, exported via doca_mmap_export_pci().
OBJ_SEG — DPU-resident object/file, identified by a path or key string.

The backend is local-only (supportsRemote() == false): both the GPU and the DPU must be reachable through the same host-side BlueField PCI function.
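The pairing rule between the two memory types can be sketched as a validation step. The `Desc` dataclass and `validate_transfer` function below are a hypothetical Python model. They are not the NIXL descriptor API and not the plugin's actual checks. They assume that each GPU-side descriptor is matched one-to-one with an equally sized OBJ_SEG descriptor carrying a path or key.

```python
from dataclasses import dataclass
from enum import Enum


class MemType(Enum):
    VRAM_SEG = "VRAM_SEG"  # host GPU memory, exported over PCIe
    OBJ_SEG = "OBJ_SEG"    # DPU-resident object or file, addressed by key


@dataclass(frozen=True)
class Desc:
    mem_type: MemType
    addr: int        # GPU address (VRAM_SEG) or offset within the object (OBJ_SEG)
    length: int
    dev_id: int = 0  # GPU index for VRAM_SEG
    key: str = ""    # path or key string for OBJ_SEG


def validate_transfer(gpu_side: list, obj_side: list) -> bool:
    """Hypothetical checks for a local-only VRAM_SEG <-> OBJ_SEG transfer."""
    if len(gpu_side) != len(obj_side):
        raise ValueError("descriptor lists differ in length")
    for g, o in zip(gpu_side, obj_side):
        if g.mem_type is not MemType.VRAM_SEG:
            raise ValueError("GPU-side descriptors must be VRAM_SEG")
        if o.mem_type is not MemType.OBJ_SEG:
            raise ValueError("storage-side descriptors must be OBJ_SEG")
        if g.length != o.length:
            raise ValueError(f"length mismatch: {g.length} vs {o.length}")
        if not o.key:
            raise ValueError("OBJ_SEG descriptor needs a path or key")
    return True


if __name__ == "__main__":
    gpu = Desc(MemType.VRAM_SEG, addr=0x7F0000000000, length=64 << 20, dev_id=0)
    obj = Desc(MemType.OBJ_SEG, addr=0, length=64 << 20, key="/data/test_obj")
    print(validate_transfer([gpu], [obj]))
```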

Because NIXL loads backends dynamically, the plugin source is injected into a NIXL source tree with scripts/patch_nixl.sh and built together with NIXL.

LMCache Integration

examples/lmcache/ contains:

lmcache_integration.patch — modifications to LMCache v0.4.3 to recognize and use the BLUE_CACHE backend.
lmcache-config.yaml — sample configuration.
patch_lmcache.sh — helper that applies the patch idempotently.

After patching LMCache, you can configure a storage backend that points to the DPU agent and offload KV tensors transparently.

Repository Layout

.
├── common/              # Shared host-DPU control channel + wire protocol (dma_transfer.h)
├── nixl-plugin/         # NIXL backend plugin source (patch into NIXL)
├── blue-cache/          # BlueField DPU proxy service
├── examples/
│   ├── cpp/             # NIXL C++ example
│   ├── python/          # NIXL Python example
│   ├── standalone/      # Standalone host test tool (no NIXL required)
│   └── lmcache/         # LMCache v0.4.3 integration patch
├── scripts/             # patch_nixl.sh and build helpers
├── docs/                # Architecture and integration docs
├── CMakeLists.txt
├── LICENSE
└── CONTRIBUTING.md

Quick Start

1. Build blue-cache

On the BlueField DPU:

mkdir -p build && cd build
cmake .. -DBUILD_EXAMPLES=OFF
make -j$(nproc) blue-cache

Run the agent (TCP fallback mode for the easiest first test):

./blue-cache/blue-cache -p 0000:03:00.0 -m 256 -q 4 -b posix -T

Omit -T to use DOCA Comch mode.
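The `-p` values in these commands (`0000:03:00.0` here, `0000:ba:00.0` in the Python example) look like PCI addresses in domain:bus:device.function form. A small check such as the hypothetical `parse_pci_address` below can catch typos before launch. It only validates the format. It does not confirm that the device exists or is a BlueField function; `lspci -D` lists the full addresses present on the machine.

```python
import re

# Domain (4 hex digits) : bus (2 hex) : device (0x00-0x1f) . function (0-7)
_PCI_ADDR = re.compile(r"([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([01][0-9a-fA-F])\.([0-7])")


def parse_pci_address(addr: str) -> tuple:
    """Return (domain, bus, device, function) as ints, or raise ValueError."""
    m = _PCI_ADDR.fullmatch(addr)
    if m is None:
        raise ValueError(f"expected DDDD:BB:DD.F, got {addr!r}")
    return tuple(int(g, 16) for g in m.groups())


if __name__ == "__main__":
    for a in ("0000:03:00.0", "0000:ba:00.0"):
        print(a, "->", parse_pci_address(a))
```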

2. Patch NIXL with the Plugin

On the host where NIXL is built:

./scripts/patch_nixl.sh /path/to/nixl/source

cd /path/to/nixl/source
meson setup build -Denable_plugins=BLUE_CACHE
ninja -C build

The patch script is idempotent; running it multiple times is safe.

3. Run the Python Example

export NIXL_PLUGIN_DIR=/opt/nvidia/nvda_nixl/lib/plugins

python3 examples/python/nixl_blue_cache_example.py \
    -o push \
    -p 0000:ba:00.0 \
    -g 0 \
    -f /data/test_obj \
    -s 64 \
    -d 10.75.70.125 \
    -m tcp

See examples/python/README.md for push/pull examples and COMCH-mode usage.

Compatibility

This project has been verified against NIXL v1.1.0. Other NIXL versions may require minor adjustments to scripts/patch_nixl.sh or the plugin source.

Documentation

docs/ARCHITECTURE.md — Host plugin, DPU agent, control plane, and data plane design.
docs/LMCache_INTEGRATION.md — KV-cache offload reference architecture.
blue-cache/README.md — Build, run, and tune the DPU-side agent.
examples/python/README.md — Python end-to-end example.
examples/standalone/ — Standalone host test tool that does not require NIXL.
CONTRIBUTING.md — Build, test, and NIXL upstreaming workflow.

Troubleshooting

NIXL build fails with `fatal error: toml++/toml.hpp: No such file or directory`

NIXL 1.1.0 requires tomlplusplus. When the telemetry plugins are enabled, the DOCA telemetry backend can fail to find the tomlplusplus headers because its meson.build does not list nixl_common_dep among its dependencies.

Recommended fix: patch src/plugins/telemetry/doca/meson.build to add nixl_common_dep:

# In src/plugins/telemetry/doca/meson.build
- dependencies: [nixl_infra, absl_log_dep, doca_dep],
+ dependencies: [nixl_infra, nixl_common_dep, absl_log_dep, doca_dep],

Then rebuild:

cd /path/to/nixl/source
meson setup build --wipe -Denable_plugins=BLUE_CACHE
ninja -C build

This fix mirrors the upstream NIXL commit b98dd59. It keeps telemetry enabled while correctly propagating the required include path.
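If you prefer to script the meson.build fix rather than edit it by hand, a minimal Python sketch follows. It assumes the dependencies line appears exactly as in the diff above (NIXL 1.1.0). It is idempotent: a file that already contains the fixed line is left untouched. The helper names and the `fix_meson.py` script name are hypothetical and are not part of this repository.

```python
import sys
from pathlib import Path

OLD = "dependencies: [nixl_infra, absl_log_dep, doca_dep],"
NEW = "dependencies: [nixl_infra, nixl_common_dep, absl_log_dep, doca_dep],"


def add_common_dep(text: str) -> str:
    """Return meson.build text with nixl_common_dep added; no-op if already fixed."""
    if NEW in text:
        return text
    if OLD not in text:
        raise ValueError("dependencies line not found; layout may differ from NIXL 1.1.0")
    return text.replace(OLD, NEW, 1)


def patch_file(path: Path) -> bool:
    """Rewrite the file only when something changed; return True if it was patched."""
    original = path.read_text()
    updated = add_common_dep(original)
    if updated == original:
        return False
    path.write_text(updated)
    return True


if __name__ == "__main__":
    if len(sys.argv) == 2:
        target = Path(sys.argv[1]) / "src/plugins/telemetry/doca/meson.build"
        print("patched" if patch_file(target) else "already patched")
    else:
        print("usage: python3 fix_meson.py /path/to/nixl/source")
```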

Fallback: If you do not need telemetry, disable the telemetry plugins entirely:

cd /path/to/nixl/source
sed -i "s/^subdir('telemetry')/# subdir('telemetry')/" src/plugins/meson.build

meson setup build --wipe -Denable_plugins=BLUE_CACHE
ninja -C build

`Could not find nvcc, please set CUDAToolkit_ROOT`

The C++ examples require the CUDA Toolkit. On a machine without CUDA, disable the examples:

cmake .. -DBUILD_EXAMPLES=OFF
make blue-cache

Or build blue-cache directly from the blue-cache/ directory:

cd blue-cache
./scripts/build_dpu.sh

`BLUE_CACHE` plugin not found at runtime

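Point NIXL at the plugin directory, either with NIXL_PLUGIN_DIR as in the Quick Start or by adding the directory to the dynamic loader's search path: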

export LD_LIBRARY_PATH=/opt/nvidia/nvda_nixl/lib/plugins:$LD_LIBRARY_PATH

Or in Python/C++ code:

agent.add_plugin_directory("/opt/nvidia/nvda_nixl/lib/plugins")

If NIXL was built with -Dstatic_plugins=BLUE_CACHE, the plugin is linked into libnixl.so and no search path is needed.
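A short diagnostic like the one below shows which plugin libraries are actually visible on the search path. It is a hypothetical helper, not a NIXL tool. It assumes the plugin's shared-library file name contains `BLUE_CACHE`; adjust the pattern to whatever your NIXL build installs.

```python
import os
from pathlib import Path


def find_plugin(name: str = "BLUE_CACHE", search_path: str = "") -> list:
    """List shared libraries whose file name contains `name` in any directory of a
    colon-separated search path (default: NIXL_PLUGIN_DIR, then LD_LIBRARY_PATH)."""
    if not search_path:
        search_path = ":".join(filter(None, [os.environ.get("NIXL_PLUGIN_DIR", ""),
                                             os.environ.get("LD_LIBRARY_PATH", "")]))
    hits = []
    for d in search_path.split(":"):
        if d and Path(d).is_dir():
            hits.extend(str(p) for p in sorted(Path(d).glob(f"*{name}*.so*")))
    return hits


if __name__ == "__main__":
    print(find_plugin() or "no BLUE_CACHE plugin found on the search path")
```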

`doca_dma.h` not found

The DOCA SDK is not installed, or DOCA_DIR does not point to it. Pass the install prefix explicitly:

cmake .. -DDOCA_DIR=/opt/mellanox/doca

Verify that /opt/mellanox/doca/include/doca_dma.h exists.

License

Apache-2.0. See LICENSE.
