BaizeAI/BlueCache

BlueCache

A complete GPU KV-cache offload solution that moves KV tensors from Host GPU memory to BlueField DPU-backed storage tiers without involving the Host CPU in the data path.

Overview

This project provides an end-to-end pipeline for offloading GPU-resident data — primarily LLM KV caches — to storage attached to a local BlueField DPU. It is built from three integrated pieces:

  1. blue-cache (blue-cache/) — The DPU-side agent. It runs on the BlueField DPU ARM cores, imports the remote GPU memory map, executes DOCA DMA operations, and writes incoming data to DPU-side storage backends.
  2. NIXL Plugin (nixl-plugin/) — A host-side NIXL backend named BLUE_CACHE. It registers GPU buffers as VRAM_SEG, exports their memory map over PCIe so the DPU's DOCA DMA engine can reach them, and forwards transfer requests to the DPU agent.
  3. LMCache Integration (examples/lmcache/) — A patch set and configuration example that enables LMCache v0.4.3 to use the BLUE_CACHE backend for transparent KV-cache tiering.

Together these components let an application such as LMCache express a transfer as VRAM_SEG ↔ OBJ_SEG and have the actual PCIe DMA and storage I/O executed by the DPU.
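To make the VRAM_SEG ↔ OBJ_SEG pairing concrete, the sketch below models a single transfer request as plain Python data. It is not the NIXL API. The class names, the fields, and the push/pull naming (borrowed from the Python example's -o option) are hypothetical. They only show which side holds what and which side the DMA reads from.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    # Hypothetical names: PUSH offloads GPU -> DPU storage, PULL restores DPU storage -> GPU.
    PUSH = "push"
    PULL = "pull"


@dataclass(frozen=True)
class VramDesc:
    # Hypothetical VRAM_SEG descriptor: a GPU buffer identified by device, address, and length.
    gpu_id: int
    addr: int
    length: int


@dataclass(frozen=True)
class ObjDesc:
    # Hypothetical OBJ_SEG descriptor: a DPU-resident object named by a path or key string.
    key: str
    offset: int
    length: int


@dataclass(frozen=True)
class TransferRequest:
    direction: Direction
    vram: VramDesc
    obj: ObjDesc

    def __post_init__(self) -> None:
        # Both sides of a transfer must describe the same number of bytes.
        if self.vram.length != self.obj.length:
            raise ValueError("VRAM_SEG and OBJ_SEG lengths differ")

    def source_and_target(self):
        """Return (source, target) so the caller knows which side the DMA reads from."""
        if self.direction is Direction.PUSH:
            return self.vram, self.obj
        return self.obj, self.vram


req = TransferRequest(
    Direction.PUSH,
    VramDesc(gpu_id=0, addr=0x7F00_0000_0000, length=64 << 20),
    ObjDesc(key="/data/test_obj", offset=0, length=64 << 20),
)
src, dst = req.source_and_target()
print(type(src).__name__, "->", type(dst).__name__)  # VramDesc -> ObjDesc
```

The application only names the two segments and a direction. Where the bytes physically travel is left to the backend, which is the separation the BLUE_CACHE plugin relies on.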

Supported DPU storage targets

The DPU agent can land data in multiple backend types, allowing the same offload path to target different cost/performance tiers:

| Target | How it is used | Typical use case |
|---|---|---|
| DPU DRAM | Pre-allocated staging buffer; can also serve as a fast transient tier | Low-latency cache spill |
| DPU-local disk | POSIX files via the agent's posix_storage_backend | Capacity tier on BlueField NVMe |
| Remote / object storage | NIXL OBJ_SEG backend (e.g. xdfs_storage_backend) | Shared object store, distributed cache |
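The following sketch illustrates the DPU-local disk tier. It shows how a POSIX-style backend could persist a staged chunk for an OBJ_SEG key: map the key to a file under a root directory, then write at the chunk's offset with os.pwrite. The root directory, the key-to-path rule, and the function names are assumptions made for illustration. The agent's actual posix_storage_backend may behave differently.

```python
import os
import tempfile


def object_path(root: str, key: str) -> str:
    # Hypothetical rule: treat the OBJ_SEG key as a path relative to root,
    # rejecting keys that would escape root (e.g. "../etc/passwd").
    path = os.path.normpath(os.path.join(root, key.lstrip("/")))
    if os.path.commonpath([root, path]) != os.path.normpath(root):
        raise ValueError(f"key escapes storage root: {key!r}")
    return path


def write_chunk(root: str, key: str, offset: int, data: bytes) -> None:
    """Write one staged chunk at its byte offset inside the object's file."""
    path = object_path(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        view = memoryview(data)
        while view:
            # pwrite may write fewer bytes than requested; loop until done.
            n = os.pwrite(fd, view, offset)
            view = view[n:]
            offset += n
    finally:
        os.close(fd)


def read_chunk(root: str, key: str, offset: int, length: int) -> bytes:
    """Read a chunk back, as a pull (restore) would."""
    with open(object_path(root, key), "rb") as f:
        f.seek(offset)
        return f.read(length)


root = tempfile.mkdtemp()
write_chunk(root, "/data/test_obj", 4, b"WORLD")  # chunks may complete out of order
write_chunk(root, "/data/test_obj", 0, b"HELO")
print(read_chunk(root, "/data/test_obj", 0, 9))  # b'HELOWORLD'
```

Writing by offset rather than appending means chunks can land in any order, which suits a pipelined DMA path.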

Bulk data always moves over DOCA DMA between Host GPU and DPU. Only small control messages travel over DOCA Comch or TCP.
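Only control messages cross Comch or TCP. They can therefore be small fixed-size records that reference GPU memory rather than carry it. The sketch below packs such a record with Python's struct module. The field set, the magic value, and the layout are hypothetical. They are not the wire format defined in common/dma_transfer.h.

```python
import struct

# Hypothetical control-message layout (little-endian, no padding), 26-byte header:
#   u32 magic | u8 op | u8 reserved | u16 gpu_id | u64 gpu_offset | u64 length | u16 key_len
# followed by key_len bytes of UTF-8 object key. The bulk data itself is never included.
HEADER = struct.Struct("<IBBHQQH")
MAGIC = 0x424C5545  # arbitrary illustrative value ("BLUE" in ASCII)
OP_PUSH, OP_PULL = 1, 2


def encode(op: int, gpu_id: int, gpu_offset: int, length: int, key: str) -> bytes:
    key_bytes = key.encode("utf-8")
    header = HEADER.pack(MAGIC, op, 0, gpu_id, gpu_offset, length, len(key_bytes))
    return header + key_bytes


def decode(buf: bytes) -> tuple:
    if len(buf) < HEADER.size:
        raise ValueError("truncated header")
    magic, op, _reserved, gpu_id, gpu_offset, length, key_len = HEADER.unpack_from(buf)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if len(buf) < HEADER.size + key_len:
        raise ValueError("truncated key")
    key = buf[HEADER.size:HEADER.size + key_len].decode("utf-8")
    return op, gpu_id, gpu_offset, length, key


# Describe a 64 MiB push; the message itself is only 40 bytes.
msg = encode(OP_PUSH, 0, 0, 64 << 20, "/data/test_obj")
print(len(msg), decode(msg))  # 40 (1, 0, 0, 67108864, '/data/test_obj')
```

The size of the control message stays constant no matter how much KV data it describes. This is why the choice between Comch and TCP matters little for throughput.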

Why this matters

In LLM serving, the KV cache is large, grows with sequence length, and competes with model weights for limited GPU HBM. Existing offload paths often route data through the Host CPU or across the network, which:

  • consumes host CPU cycles that could run the inference engine,
  • adds extra memory copies,
  • and is hard to integrate cleanly with a tiered cache.

By using the BlueField DPU's dedicated DOCA DMA engine, this solution:

  • moves data over PCIe directly between GPU memory and the DPU, which then writes it to the selected storage tier,
  • keeps the host CPU out of the data path,
  • and exposes the offload path through the standard NIXL API so applications like LMCache do not need to know DOCA details.

Architecture

┌─────────────────────────────────────────────────────────────────────────────────────────┐
│  Host                                                                                   │
│  ┌─────────────────────┐    ┌─────────────────────────────┐                             │
│  │ LMCache / vLLM      │    │ NIXL Agent                  │                             │
│  │ (KV-cache manager)  │───►│ + BLUE_CACHE backend        │                             │
│  └─────────────────────┘    │   - registers GPU VRAM      │                             │
│                             │   - exports GPU mmap        │                             │
│                             │   - sends transfer requests │                             │
│                             └─────────────┬───────────────┘                             │
│                                           │                                             │
│                             Control plane │ (DOCA Comch / TCP)                          │
│                                           ▼                                             │
│  ┌─────────────────────┐    ┌─────────────────────────────┐                             │
│  │ GPU HBM (VRAM_SEG)  │◄──►│ DOCA DMA over PCIe          │                             │
│  └─────────────────────┘    └─────────────────────────────┘                             │
└─────────────────────────────────────────────────────────────────────────────────────────┘
                                            │
                                            ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│  BlueField DPU                                                                          │
│  ┌─────────────────────────────────────────────────────────────────────────────────┐    │
│  │ blue-cache agent                                                                │    │
│  │  ┌───────────────┐    ┌───────────────┐    ┌─────────────────────────────────┐  │    │
│  │  │ DOCA DMA      │───►│ staging buffer│───►│ NIXL storage backend            │  │    │
│  │  │ engine        │    │ (DPU DRAM)    │    │ (posix / xdfs / xdfs_kv / ...)  │  │    │
│  │  └───────────────┘    └───────────────┘    └────────────────┬────────────────┘  │    │
│  │                                                             │                   │    │
│  │                                        ┌────────────────────┴────────┐          │    │
│  │                                        ▼                             ▼          │    │
│  │                                 ┌─────────────┐             ┌─────────────────┐ │    │
│  │                                 │ DPU-local   │             │ Remote Storage  │ │    │
│  │                                 │ (posix)     │             │ xdfs / xdfs_kv  │ │    │
│  │                                 └─────────────┘             └─────────────────┘ │    │
│  └─────────────────────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────────────────────┘

blue-cache

The DPU agent is the piece that executes the offload. It runs as a service on the BlueField DPU and is intentionally separate from the NIXL library so it can evolve independently.

Responsibilities:

  • Import the host GPU mmap from the PCI export descriptor sent by the plugin.
  • Maintain a reusable DPU-side staging buffer.
  • Execute chunked, pipelined DOCA DMA with configurable queue depth.
  • Forward received data to a NIXL storage backend running on the DPU, which in turn writes to local files or object storage.
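The chunked, pipelined DMA loop works roughly like this. A transfer is cut into fixed-size chunks, and each chunk lands in one slot of the reusable staging buffer. At most queue_depth chunks are in flight at once. The oldest chunk is handed to the storage backend before its slot is reused.

This Python model only simulates that scheduling. dma_read and store are hypothetical stand-ins for the DOCA DMA job and the NIXL storage backend. The model also retires chunks strictly in submission order, whereas a real agent would wait on DOCA DMA completions.

```python
from collections import deque
from typing import Callable, Deque, Tuple


def pipelined_offload(
    total: int,
    chunk_size: int,
    queue_depth: int,
    dma_read: Callable[[int, int], bytes],  # hypothetical: (offset, length) -> GPU bytes
    store: Callable[[int, bytes], None],    # hypothetical: (offset, data) -> storage backend
) -> int:
    """Simulate chunked DMA with at most `queue_depth` chunks in flight.

    The staging buffer is modelled as `queue_depth` slots. Chunk i uses slot
    i % queue_depth, which is free because any chunk that used it earlier
    (i - queue_depth) has already been stored. Returns the peak in-flight count.
    """
    if chunk_size <= 0 or queue_depth <= 0:
        raise ValueError("chunk_size and queue_depth must be positive")
    slots = [b""] * queue_depth
    in_flight: Deque[Tuple[int, int]] = deque()  # (slot index, object offset), oldest first
    peak = 0

    def retire_oldest() -> None:
        slot, off = in_flight.popleft()
        store(off, slots[slot])  # hand the completed chunk to the backend
        slots[slot] = b""        # the slot can now be reused

    for i, offset in enumerate(range(0, total, chunk_size)):
        if len(in_flight) == queue_depth:
            retire_oldest()  # queue full: wait for the oldest completion first
        slot = i % queue_depth
        length = min(chunk_size, total - offset)
        slots[slot] = dma_read(offset, length)  # stand-in for a DMA job filling the slot
        in_flight.append((slot, offset))
        peak = max(peak, len(in_flight))

    while in_flight:  # drain the tail of the pipeline
        retire_oldest()
    return peak


# Usage: 10 KiB of fake "GPU" data, 4 KiB chunks, queue depth 4.
src = bytes(range(256)) * 40
dst = bytearray(len(src))


def fake_dma(off: int, n: int) -> bytes:
    return src[off:off + n]


def fake_store(off: int, data: bytes) -> None:
    dst[off:off + len(data)] = data


peak = pipelined_offload(len(src), 4096, 4, fake_dma, fake_store)
print(peak, bytes(dst) == src)  # 3 True
```

A deeper queue keeps the DMA engine busy while earlier chunks are still being written to storage. The cost is a proportionally larger staging buffer in DPU DRAM.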

Build and run instructions are in blue-cache/README.md.

NIXL Plugin

The host plugin implements the NIXL nixlBackendEngine interface. It exposes two memory types:

  • VRAM_SEG — Host GPU memory, exported via doca_mmap_export_pci().
  • OBJ_SEG — DPU-resident object/file, identified by a path or key string.

The backend is local-only (supportsRemote() == false). GPU memory is exported through a host-side BlueField PCI function, and the DPU agent behind that same function performs the DMA. The GPU and the DPU must therefore be in the same host.

Because NIXL loads backends dynamically, the plugin source is injected into a NIXL source tree with scripts/patch_nixl.sh and built together with NIXL.

LMCache Integration

examples/lmcache/ contains:

  • lmcache_integration.patch — modifications to LMCache v0.4.3 to recognize and use the BLUE_CACHE backend.
  • lmcache-config.yaml — sample configuration.
  • patch_lmcache.sh — helper that applies the patch idempotently.

After patching LMCache, you can configure a storage backend that points to the DPU agent and offload KV tensors transparently.

Repository Layout

.
├── common/              # Shared host-DPU control channel + wire protocol (dma_transfer.h)
├── nixl-plugin/         # NIXL backend plugin source (patch into NIXL)
├── blue-cache/          # BlueField DPU proxy service
├── examples/
│   ├── cpp/             # NIXL C++ example
│   ├── python/          # NIXL Python example
│   ├── standalone/      # Standalone host test tool (no NIXL required)
│   └── lmcache/         # LMCache v0.4.3 integration patch
├── scripts/             # patch_nixl.sh and build helpers
├── docs/                # Architecture and integration docs
├── CMakeLists.txt
├── LICENSE
└── CONTRIBUTING.md

Quick Start

1. Build blue-cache

On the BlueField DPU:

mkdir -p build && cd build
cmake .. -DBUILD_EXAMPLES=OFF
make -j$(nproc) blue-cache

Run the agent (TCP fallback mode for the easiest first test):

./blue-cache/blue-cache -p 0000:03:00.0 -m 256 -q 4 -b posix -T

Omit -T to use DOCA Comch mode.
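Both the agent (-p 0000:03:00.0) and the Python example (-p 0000:ba:00.0) take what appears to be a PCI address in domain:bus:device.function (BDF) form. If you script deployments, a small validator like the hypothetical one below can catch typos before the agent starts. It only checks the standard BDF format. It cannot tell whether an address is the right BlueField function on your system.

```python
import re
from typing import NamedTuple

# Standard PCI BDF: 4-hex-digit domain, 2-hex-digit bus, 2-hex-digit device (0x00-0x1f),
# one function digit (0-7), e.g. "0000:03:00.0".
_BDF = re.compile(r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$")


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int

    def __str__(self) -> str:
        return f"{self.domain:04x}:{self.bus:02x}:{self.device:02x}.{self.function:x}"


def parse_bdf(text: str) -> PciAddress:
    m = _BDF.match(text.strip())
    if not m:
        raise ValueError(f"not a PCI address in DDDD:BB:DD.F form: {text!r}")
    domain, bus, device, function = (int(g, 16) for g in m.groups())
    if device > 0x1F:
        raise ValueError(f"PCI device number out of range (0-31): {text!r}")
    return PciAddress(domain, bus, device, function)


print(parse_bdf("0000:ba:00.0"))      # 0000:ba:00.0
print(parse_bdf("0000:03:00.0").bus)  # 3
```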

2. Patch NIXL with the Plugin

On the host where NIXL is built:

./scripts/patch_nixl.sh /path/to/nixl/source

cd /path/to/nixl/source
meson setup build -Denable_plugins=BLUE_CACHE
ninja -C build

The patch script is idempotent; running it multiple times is safe.
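Idempotent means that a second run detects the changes are already in place and does nothing. One common way to achieve this is to guard each insertion with a marker string, as sketched below in Python purely for illustration. The marker, the file, and the snippet are hypothetical, and scripts/patch_nixl.sh may use a different check.

```python
import tempfile
from pathlib import Path

MARKER = "# BLUE_CACHE plugin (added by patch script)"  # hypothetical marker


def add_once(path: Path, snippet: str) -> bool:
    """Append `snippet` guarded by MARKER unless it is already present.

    Returns True if the file was changed, False if it was already patched.
    """
    text = path.read_text()
    if MARKER in text:
        return False  # already patched: running again is a no-op
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + MARKER + "\n" + snippet.rstrip("\n") + "\n")
    return True


meson = Path(tempfile.mkdtemp()) / "meson.build"
meson.write_text("subdir('ucx')\n")
print(add_once(meson, "subdir('blue_cache')"))  # True
print(add_once(meson, "subdir('blue_cache')"))  # False
```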

3. Run the Python Example

export NIXL_PLUGIN_DIR=/opt/nvidia/nvda_nixl/lib/plugins

python3 examples/python/nixl_blue_cache_example.py \
    -o push \
    -p 0000:ba:00.0 \
    -g 0 \
    -f /data/test_obj \
    -s 64 \
    -d 10.75.70.125 \
    -m tcp

See examples/python/README.md for push/pull examples and COMCH-mode usage.

Compatibility

This project has been verified against NIXL v1.1.0. Other NIXL versions may require minor adjustments to scripts/patch_nixl.sh or the plugin source.

Troubleshooting

NIXL build fails with fatal error: toml++/toml.hpp: No such file or directory

NIXL 1.1.0 uses tomlplusplus as a required dependency. When the telemetry plugin is enabled, its doca backend may miss the tomlplusplus include path because nixl_common_dep is not listed in its dependencies.

Recommended fix: patch src/plugins/telemetry/doca/meson.build to add nixl_common_dep:

# In src/plugins/telemetry/doca/meson.build
- dependencies: [nixl_infra, absl_log_dep, doca_dep],
+ dependencies: [nixl_infra, nixl_common_dep, absl_log_dep, doca_dep],

Then rebuild:

cd /path/to/nixl/source
meson setup build --wipe -Denable_plugins=BLUE_CACHE
ninja -C build

This fix mirrors the upstream NIXL commit b98dd59. It keeps telemetry enabled while correctly propagating the required include path.

Fallback: If you do not need telemetry, disable the telemetry plugins entirely:

cd /path/to/nixl/source
sed -i "s/^subdir('telemetry')/# subdir('telemetry')/" src/plugins/meson.build

meson setup build --wipe -Denable_plugins=BLUE_CACHE
ninja -C build

Could not find nvcc, please set CUDAToolkit_ROOT

The C++ examples require the CUDA Toolkit. On a machine without CUDA, disable the examples:

cmake .. -DBUILD_EXAMPLES=OFF
make blue-cache

Or build blue-cache directly from the blue-cache/ directory:

cd blue-cache
./scripts/build_dpu.sh

BLUE_CACHE plugin not found at runtime

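Set the plugin search path (the same NIXL_PLUGIN_DIR variable used in the Quick Start) and, if needed, add the directory to the library search path:

export NIXL_PLUGIN_DIR=/opt/nvidia/nvda_nixl/lib/plugins
export LD_LIBRARY_PATH=/opt/nvidia/nvda_nixl/lib/plugins:$LD_LIBRARY_PATH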

Or in Python/C++ code:

agent.add_plugin_directory("/opt/nvidia/nvda_nixl/lib/plugins")

If NIXL was built with -Dstatic_plugins=BLUE_CACHE, the plugin is linked into libnixl.so and no search path is needed.
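When the plugin is not found, it helps to check which candidate directories actually contain it. The sketch below searches NIXL_PLUGIN_DIR and then each LD_LIBRARY_PATH entry for the shared object. That search order is only the order this script uses. The file name libplugin_BLUE_CACHE.so is an assumption based on NIXL's libplugin_<NAME>.so naming for dynamically loaded backends, so check the name your build actually produces.

```python
import os
from pathlib import Path
from typing import List, Optional

# Assumption: NIXL names dynamic backend plugins libplugin_<NAME>.so.
PLUGIN_FILE = "libplugin_BLUE_CACHE.so"


def candidate_dirs(env=os.environ) -> List[str]:
    """Directories to search, in order: NIXL_PLUGIN_DIR, then LD_LIBRARY_PATH entries."""
    dirs = []
    if env.get("NIXL_PLUGIN_DIR"):
        dirs.append(env["NIXL_PLUGIN_DIR"])
    dirs.extend(d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d)
    return dirs


def find_plugin(env=os.environ) -> Optional[Path]:
    """Return the first matching plugin file, or None if no directory has it."""
    for d in candidate_dirs(env):
        candidate = Path(d) / PLUGIN_FILE
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    hit = find_plugin()
    print(hit if hit else f"{PLUGIN_FILE} not found in: {candidate_dirs()}")
```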

doca_dma.h not found

Either the DOCA SDK is not installed or DOCA_DIR points to the wrong location. Pass the DOCA installation prefix to CMake:

cmake .. -DDOCA_DIR=/opt/mellanox/doca

Verify that /opt/mellanox/doca/include/doca_dma.h exists.

License

Apache-2.0. See LICENSE.

About

KVCache Management Via BlueField