
feat(temporal): opt-in continue-as-new for long-lived agent workflows#447

Open
danielmillerp wants to merge 1 commit into next from dm/temporal-continue-as-new

danielmillerp (Contributor) commented Jun 24, 2026

Why

Long-lived chat/session agents run as a single Temporal workflow that stays open indefinitely. Two things killed them:

  1. Event history grows until it hits Temporal's ~50k-event / 50MB limit, at which point Temporal terminates the workflow.
  2. The workflow execution timeout (24h SDK default) terminated the whole chain — a user returning to an old chat hit a dead workflow and their message silently vanished.

This adds an opt-in continue-as-new pattern to BaseWorkflow so a session can stay open forever, and defaults the execution timeout to infinite so a long-lived chat isn't capped at 24h.

Design

Opt-in by adoption, no flag, no patch gate. An agent gets recycling only by calling run_until_complete from its @workflow.run instead of a bare wait_condition(timeout=None). Agents that keep the old wait are untouched. (There's no workflow.patched() gate: we have no in-flight long-running workflows to preserve, and per Temporal guidance the right tool for evolving these "pure entity" workflows later is Worker Versioning + upgrade-on-continue-as-new, not accumulating patches.)

BaseWorkflow helpers (src/agentex/lib/core/temporal/workflows/workflow.py):

  • run_until_complete(*args, is_complete, can_recycle=True, timeout=None) — keeps the workflow open and recycles history when Temporal suggests it. timeout mirrors the old wait_condition(timeout=...) and acts as a durable idle bound; can_recycle lets an agent opt out of recycling when a prerequisite for it is missing.
  • should_continue_as_new() — recycle when workflow.info().is_continue_as_new_suggested() (Temporal owns the threshold).
  • drain_and_continue_as_new() — waits for all_handlers_finished (so an in-flight turn isn't lost) and re-checks completion before calling workflow.continue_as_new.
  • is_continued_run() — the hook agents use to gate state rehydration after a recycle.
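A minimal sketch of how these helpers could compose, written against a hypothetical offline stand-in (`FakeWorkflow`, `ContinueAsNew`) instead of Temporal's `workflow` module so it runs without a server. The stand-in's method names mirror temporalio's `workflow.info().is_continue_as_new_suggested()`, `workflow.all_handlers_finished()`, `workflow.wait_condition()` and `workflow.continue_as_new()`, but this is an illustration of the pattern, not the PR's implementation; here an idle `timeout` simply raises, as a bare `wait_condition(timeout=...)` would.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Callable, Optional


class ContinueAsNew(Exception):
    """Hypothetical stand-in for the error temporalio raises from continue_as_new()."""

    def __init__(self, next_args: tuple) -> None:
        super().__init__("continue-as-new")
        self.next_args = next_args


@dataclass
class FakeWorkflow:
    """Offline stand-in for Temporal's workflow module (assumption, not the real API)."""

    suggested: bool = False      # what info().is_continue_as_new_suggested() would return
    in_flight_handlers: int = 0  # signal/update handlers still running

    def is_continue_as_new_suggested(self) -> bool:
        return self.suggested

    def all_handlers_finished(self) -> bool:
        return self.in_flight_handlers == 0

    async def wait_condition(
        self, fn: Callable[[], bool], timeout: Optional[float] = None
    ) -> None:
        # Polls the predicate; real Temporal re-evaluates it deterministically.
        async def _poll() -> None:
            while not fn():
                await asyncio.sleep(0)

        await asyncio.wait_for(_poll(), timeout)

    def continue_as_new(self, *args: Any) -> None:
        raise ContinueAsNew(args)


def should_continue_as_new(wf: FakeWorkflow) -> bool:
    # Temporal owns the threshold; we only ask whether it suggests recycling.
    return wf.is_continue_as_new_suggested()


async def drain_and_continue_as_new(
    wf: FakeWorkflow, *args: Any, is_complete: Callable[[], bool]
) -> None:
    # Let an in-flight turn finish so it isn't lost across the recycle.
    await wf.wait_condition(wf.all_handlers_finished)
    # A handler may have completed the task while we drained.
    if not is_complete():
        wf.continue_as_new(*args)


async def run_until_complete(
    wf: FakeWorkflow,
    *args: Any,
    is_complete: Callable[[], bool],
    can_recycle: bool = True,
    timeout: Optional[float] = None,
) -> None:
    # Wake on completion, or on a recycle suggestion when recycling is allowed.
    await wf.wait_condition(
        lambda: is_complete() or (can_recycle and should_continue_as_new(wf)),
        timeout=timeout,
    )
    if not is_complete():
        await drain_and_continue_as_new(wf, *args, is_complete=is_complete)
```

The re-check after draining matters: a turn that was in flight when Temporal suggested recycling may itself finish the task, and in that case the sketch returns normally instead of starting a new run.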

Execution timeout (environment_variables.py + temporal_task_service.py + temporal_client.py): WORKFLOW_EXECUTION_TIMEOUT_SECONDS now defaults to None = no execution timeout (None/0/negative → execution_timeout=None). The execution timeout is chain-wide (continue-as-new does NOT reset it), so capping it would still kill a forever-chat. To bound idle workflows, use a durable timer (run_until_complete's timeout), not this ceiling.
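The None/0/negative rule can be sketched as a small pure function. The name `resolve_execution_timeout` is hypothetical, and the sketch assumes the variable arrives as a string holding a number of seconds; the PR's actual parsing lives in environment_variables.py and may differ.

```python
from datetime import timedelta
from typing import Optional


def resolve_execution_timeout(raw: Optional[str]) -> Optional[timedelta]:
    """Map WORKFLOW_EXECUTION_TIMEOUT_SECONDS to an execution_timeout value.

    Unset, empty, 0 or negative -> None (no execution timeout at all).
    Positive -> a timedelta that caps the whole continue-as-new chain.
    """
    if raw is None or not raw.strip():
        return None
    seconds = float(raw)
    if seconds <= 0:
        return None
    return timedelta(seconds=seconds)
```

The result would be passed as `execution_timeout=` to temporalio's `Client.start_workflow`, where `None` means the workflow has no execution timeout.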

Scope (deliberately narrow)

This PR is just the pattern + the timeout default. Restoring state after a recycle is framework-specific (rebuild from adk.messages, an adk.state snapshot, or a framework's own memory like a LangGraph checkpointer / Pydantic AI history) and is intentionally left to follow-up PRs, one per integration. The only example touched is 000_hello_acp, which keeps no cross-turn state and so proves the pattern with zero rehydration.
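Why rehydration is needed at all: each run in a continue-as-new chain gets a fresh workflow instance, so in-memory attributes do not survive a recycle. A toy sketch, assuming a hypothetical in-process `MESSAGE_STORE` in place of a durable source such as adk.messages or adk.state (whose real APIs are not shown here), and assuming `is_continued_run()` keys off a `continued_run_id` that is `None` on the first run:

```python
from typing import Dict, List, Optional

# Hypothetical durable store standing in for adk.messages / adk.state.
MESSAGE_STORE: Dict[str, List[str]] = {}


class ChatWorkflow:
    """Toy workflow: every run in the chain constructs a new instance."""

    def __init__(self, continued_run_id: Optional[str]) -> None:
        # Mirrors the idea of workflow.info().continued_run_id (None on the first run).
        self._continued_run_id = continued_run_id
        self.history: List[str] = []  # in-memory state, lost on continue-as-new

    def is_continued_run(self) -> bool:
        return self._continued_run_id is not None

    def run(self, task_id: str) -> None:
        if self.is_continued_run():
            # Rebuild state from the durable store instead of trusting memory.
            self.history = list(MESSAGE_STORE.get(task_id, []))

    def on_message(self, task_id: str, text: str) -> None:
        self.history.append(text)
        MESSAGE_STORE.setdefault(task_id, []).append(text)
```

An agent with no cross-turn state, like 000_hello_acp, can skip the rehydration branch entirely.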

Verification

  • Unit tests for the decision helpers (should_continue_as_new, is_continued_run) — tests/lib/core/temporal/test_base_workflow_continue_as_new.py.
  • py_compile + ruff + pyright clean.
  • Follow-up: replay/integration test of drain_and_continue_as_new against a Temporal test server.
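A sketch of what unit tests for the two pure decision helpers could look like. It patches a hypothetical `workflow` stand-in (a `SimpleNamespace`) rather than temporalio's module, and it assumes `is_continued_run()` reads `workflow.info().continued_run_id`; the real test file may be structured differently.

```python
import unittest
from types import SimpleNamespace
from typing import Optional
from unittest import mock

# Hypothetical offline stand-in for `temporalio.workflow`; a real test would
# patch the SDK's workflow.info() instead.
workflow = SimpleNamespace(info=lambda: None)


def should_continue_as_new() -> bool:
    return workflow.info().is_continue_as_new_suggested()


def is_continued_run() -> bool:
    return workflow.info().continued_run_id is not None


def fake_info(suggested: bool, continued_run_id: Optional[str]) -> SimpleNamespace:
    # Only the two Info members the helpers read.
    return SimpleNamespace(
        is_continue_as_new_suggested=lambda: suggested,
        continued_run_id=continued_run_id,
    )


class DecisionHelperTests(unittest.TestCase):
    def test_recycles_only_when_temporal_suggests_it(self) -> None:
        for suggested in (True, False):
            with mock.patch.object(workflow, "info", return_value=fake_info(suggested, None)):
                self.assertEqual(should_continue_as_new(), suggested)

    def test_continued_run_detected_from_continued_run_id(self) -> None:
        with mock.patch.object(workflow, "info", return_value=fake_info(False, "prev-run-id")):
            self.assertTrue(is_continued_run())
        with mock.patch.object(workflow, "info", return_value=fake_info(False, None)):
            self.assertFalse(is_continued_run())
```

Because both helpers are thin reads of workflow info, they can be tested without a Temporal server; the drain path is what needs the replay/integration test.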

🤖 Generated with Claude Code

Greptile Summary

This PR adds opt-in Temporal continue-as-new support for long-lived agent workflows. The main changes are:

  • New BaseWorkflow helpers for recycle decisions, handler draining, continued-run detection, and wait-loop adoption.
  • Default workflow execution timeout changed to unlimited, with positive env values still supported.
  • The 000_hello_acp Temporal example now opts into the new wait helper.
  • Unit tests cover the pure continue-as-new decision helpers.

Confidence Score: 4/5

The workflow infrastructure change is localized and tested, but the example opt-in path has a visible duplicate initialization-message regression when recycling.

The main helpers and timeout behavior are covered by focused tests, and the remaining issue is confined to the example workflow’s continued-run initialization path rather than the core helper implementation.

examples/tutorials/10_async/10_temporal/000_hello_acp/project/workflow.py

T-Rex Logs

What T-Rex did

  • Reproduced the skip-continued-prologue scenario with a focused Python harness that imports the real hello ACP workflow class and mocks the Temporal and ADK boundaries. An initial run (which calls run_until_complete) followed by a continued run with continued_run_id produced two duplicate acknowledgement messages.
  • Validated the execution_timeout handling across several before/after cases: once dependencies were installed, both the before and after validation runs exited with code 0. The artifacts capture the command, cwd, commit, scenario output, and exit code.
  • Validated workflow recycling by comparing pre- and post-run logs: the run passed (RESULT PASS), the timeout was forwarded, no continue_as_new was issued in some cases, and draining behaved correctly, using the provided recycle script artifacts.


T-Rex ran code and verified these findings.

Comments Outside Diff (2)

  1. examples/tutorials/10_async/10_temporal/000_hello_acp/project/workflow.py, lines 56-63

    P1 Gate create acknowledgement

    This workflow now opts into continue-as-new, so @workflow.run executes from the top on every recycled run. Because the creation acknowledgement is emitted unconditionally before run_until_complete, each recycle writes another “You should only see this message once” message to the task history and UI. Gate this one-time prologue with is_continued_run() so continued runs do not repeat it.

    Artifacts

    Repro: focused workflow harness that executes the real on_task_create twice across original and continued runs

    Stack trace captured during the T-Rex run


  2. examples/tutorials/10_async/10_temporal/000_hello_acp/project/workflow.py, lines 57-63

    P1 Skip continued prologue

    run_until_complete(params, ...) forwards the same params into every continued run, so @workflow.run starts over here after each recycle. Because this acknowledgement is emitted before the wait helper and is not gated by self.is_continued_run(), every continue-as-new appends another “You should only see this message once” message to the task history. This creates visible duplicate initialization messages for any long-lived chat that recycles.

    Artifacts

    Repro: generated workflow continue-as-new harness

    Repro: harness output showing duplicate acknowledgement messages


Reviews (12): Last reviewed commit: "feat(temporal): opt-in continue-as-new f..."
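The fix the review comments ask for amounts to gating the one-time prologue on is_continued_run(). A simplified sketch, assuming a hypothetical `HelloWorkflowSketch` class whose `send_message` appends to a list in place of writing to the task history; the real 000_hello_acp workflow, its message text, and its ADK calls are not reproduced here.

```python
import asyncio
from typing import List, Optional


class HelloWorkflowSketch:
    """Hypothetical, simplified stand-in for the 000_hello_acp workflow."""

    def __init__(self, continued_run_id: Optional[str], sent: List[str]) -> None:
        self._continued_run_id = continued_run_id  # like workflow.info().continued_run_id
        self._sent = sent  # stands in for messages written to the task history

    def is_continued_run(self) -> bool:
        return self._continued_run_id is not None

    async def send_message(self, text: str) -> None:
        self._sent.append(text)

    async def on_task_create(self, params: dict) -> None:
        # One-time prologue: only the first run in the chain acknowledges creation.
        if not self.is_continued_run():
            await self.send_message("You should only see this message once")
        # ...then hand off to run_until_complete(params, is_complete=...) as before.
```

Because run_until_complete forwards the same params into every continued run, the gate has to key off the run's own lineage rather than anything in params.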

danielmillerp changed the base branch from main to next on June 24, 2026 20:20
danielmillerp force-pushed the dm/temporal-continue-as-new branch from 5d63a08 to 4170651 on June 24, 2026 20:22
Comment thread: src/agentex/lib/core/temporal/workflows/workflow.py (outdated)
danielmillerp force-pushed the dm/temporal-continue-as-new branch from 4170651 to 891ef6d on June 24, 2026 21:07
Comment thread: src/agentex/lib/core/temporal/workflows/workflow.py (outdated)
Comment thread: examples/tutorials/10_async/10_temporal/150_codex/project/workflow.py (outdated)
danielmillerp force-pushed the dm/temporal-continue-as-new branch from 1fb74c4 to 22e7358 on June 26, 2026 17:55
Comment thread: src/agentex/lib/core/temporal/workflows/workflow.py (outdated)
danielmillerp force-pushed the dm/temporal-continue-as-new branch from 22e7358 to ad68bd8 on June 26, 2026 18:11
Comment thread: src/agentex/lib/core/temporal/workflows/workflow.py (outdated)
danielmillerp force-pushed the dm/temporal-continue-as-new branch from ad68bd8 to 65ab89a on June 26, 2026 18:27
Comment thread: src/agentex/lib/core/temporal/workflows/workflow.py (outdated)
danielmillerp force-pushed the dm/temporal-continue-as-new branch 2 times, most recently from 43f62c2 to 9d71bb7 on June 26, 2026 18:59
Comment thread: src/agentex/lib/core/temporal/workflows/workflow.py (outdated)
danielmillerp force-pushed the dm/temporal-continue-as-new branch 2 times, most recently from 467b202 to 80ab955 on June 29, 2026 22:35
Comment thread: src/agentex/lib/core/temporal/workflows/workflow.py
danielmillerp force-pushed the dm/temporal-continue-as-new branch 2 times, most recently from 1ea35bd to 1a6403d on June 30, 2026 08:40
Long-lived chat/session agents run as a single Temporal workflow that stays open
indefinitely. Two things killed them: event history grows past Temporal's
~50k-event / 50MB limit, and the 24h execution-timeout default terminated the
whole chain. This adds an opt-in continue-as-new pattern on BaseWorkflow and
defaults the execution timeout to infinite.

Opt-in by adoption: an agent gets recycling only by calling run_until_complete
from its @workflow.run instead of a bare wait_condition(timeout=None). No flag,
no patch gate (no in-flight long-running workflows to preserve; Worker Versioning
+ upgrade-on-continue-as-new is the path for evolving these later).

BaseWorkflow helpers:
- run_until_complete(*args, is_complete, can_recycle=True, timeout=None): keep the
  workflow open; recycle history when Temporal suggests it.
- should_continue_as_new(): recycle when workflow.info().is_continue_as_new_suggested().
- drain_and_continue_as_new(): wait for all_handlers_finished and re-check
  completion before workflow.continue_as_new.

Execution timeout: WORKFLOW_EXECUTION_TIMEOUT_SECONDS now defaults to None (no
execution timeout; None/0/negative -> execution_timeout=None). It is chain-wide
(continue-as-new does not reset it), so leaving it unset is required for a
forever-chat. Idle bounding is a durable timer (run_until_complete's timeout).

State restoration after a recycle is framework-specific and left to follow-up
PRs, one per integration. 000_hello_acp adopts the pattern (no cross-turn state,
so no rehydration needed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
danielmillerp force-pushed the dm/temporal-continue-as-new branch from 1a6403d to 1069028 on June 30, 2026 09:13