SWE Atlas Multiturn is a benchmark of 75 tasks for evaluating coding agents on multi-turn software engineering tasks in a realistic user-driven development setting. The repository contains the task data and example Harbor run configs needed to run the tasks.

    data/
      multiturn/
    run_configs/
      multiturn/    # example configs for data/multiturn

Install Harbor:

    git clone https://github.com/laude-institute/harbor.git
    cd harbor
    uv tool install .

Set up Modal for sandbox environments:

    uv pip install modal
    modal setup

Run configs load credentials from harbor/.env relative to this repository root. Create that file before launching a run:

    mkdir -p harbor
    $EDITOR harbor/.env

Put this block in harbor/.env. It covers the shared simulated-user setup (GPT 5.5 high) and RF rubric grading (the RF task default uses Anthropic Opus 4.5, matching the original SWE Atlas Refactoring task):

    OPENAI_API_KEY=<your-gateway-api-key>
    OPENAI_API_BASE=<openai-compatible-gateway-url>/v1

OPENAI_API_BASE must support both openai/gpt-5.5 and the RF rubric default model, anthropic/claude-opus-4-5-20251101. A LiteLLM gateway works for this; direct https://api.openai.com/v1 does not support the Anthropic rubric model unless you override EVAL_MODEL to an OpenAI model.

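Before launching a run, it can help to confirm that the env file actually defines the common variables. The following is a minimal sketch, assuming the file uses plain `KEY=value` lines; the function name `check_env_file` is hypothetical and is not part of Harbor or this repository. Note that it only checks for a non-empty value, so an unedited placeholder would still pass.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Harbor): report which of the named
# variables are missing or empty in a KEY=value env file.
check_env_file() {
  local env_file="$1"
  shift
  local missing=0
  local var
  if [ ! -f "$env_file" ]; then
    echo "missing file: $env_file" >&2
    return 1
  fi
  for var in "$@"; do
    # Accept "VAR=value" or "export VAR=value" with a non-empty value.
    if ! grep -Eq "^[[:space:]]*(export[[:space:]]+)?${var}=.+" "$env_file"; then
      echo "missing or empty: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the common block, this could be called from the repository root as `check_env_file harbor/.env OPENAI_API_KEY OPENAI_API_BASE`; it returns a non-zero status if either variable is absent.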
Add only the variables needed for the agent config you run:

| Config | Additional harbor/.env setting |
|---|---|
| `gpt-5p5-high_codex.sh` | None beyond the common block |
| `opus-4p8-high_claude-code.sh` | `ANTHROPIC_API_KEY=<your-anthropic-api-key>` |
| `sonnet-4p6-high_claude-code.sh` | `ANTHROPIC_API_KEY=<your-anthropic-api-key>` |
| `gemini-3p5-flash-high_opencode.sh` | `GEMINI_API_KEY=<your-gemini-api-key>` |
| `kimi-k2p6_kimi-cli.sh` | `OPENAI_API_KEY=<key-for-openai-compatible-endpoint>` and either `OPENAI_API_BASE=<endpoint-url>` or `OPENAI_BASE_URL=<endpoint-url>` |

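The table above can also be expressed as a small lookup, which may be convenient in a preflight script. This is only a sketch: the function name `extra_env_for_config` is hypothetical, and the mapping is copied from the table rather than read from the run configs themselves. Alternatives are joined with `|`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the extra harbor/.env variable names that the
# table lists for a given run config script, one per line.
extra_env_for_config() {
  case "$(basename "$1")" in
    gpt-5p5-high_codex.sh)
      : ;;  # nothing beyond the common block
    opus-4p8-high_claude-code.sh|sonnet-4p6-high_claude-code.sh)
      echo "ANTHROPIC_API_KEY" ;;
    gemini-3p5-flash-high_opencode.sh)
      echo "GEMINI_API_KEY" ;;
    kimi-k2p6_kimi-cli.sh)
      # Either base-URL variable is acceptable according to the table.
      printf '%s\n' "OPENAI_API_KEY" "OPENAI_API_BASE|OPENAI_BASE_URL" ;;
    *)
      echo "unknown config: $1" >&2
      return 1 ;;
  esac
}
```

For example, `extra_env_for_config run_configs/multiturn/opus-4p8-high_claude-code.sh` prints `ANTHROPIC_API_KEY`.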
Run commands from the repository root.
Multi-turn example:

    bash run_configs/multiturn/gpt-5p5-high_codex.sh

Multi-turn run configs set the simulated user model to openai/gpt-5.5 via SIM_USER_MODEL.

To run the baseline single-turn example:

    bash run_configs/singleturn/gpt-5p5-high_codex.sh

The scripts write outputs under results/. To make a custom config, copy an existing script and update the agent, model, sampling count, or Harbor arguments.

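Copying an existing script as the starting point for a custom config might look like the sketch below. The helper name `new_run_config` and the destination filename in the usage note are hypothetical; the helper only copies the file and refuses to overwrite an existing one, leaving the actual edits (agent, model, sampling count, Harbor arguments) to you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy an existing run config to a new path so it can
# be edited, without overwriting anything that already exists.
new_run_config() {
  local src="$1"
  local dst="$2"
  if [ ! -f "$src" ]; then
    echo "no such config: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "refusing to overwrite: $dst" >&2
    return 1
  fi
  cp "$src" "$dst"
}
```

From the repository root, `new_run_config run_configs/multiturn/gpt-5p5-high_codex.sh run_configs/multiturn/my-agent.sh` would create the copy, which can then be edited and launched with `bash` like the other configs.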
This repository is released under the Apache License 2.0. See LICENSE.