feat(temporal): opt-in continue-as-new for long-lived agent workflows #447
Open
danielmillerp wants to merge 1 commit into
Long-lived chat/session agents run as a single Temporal workflow that stays open indefinitely. Two things killed them: event history grows past Temporal's ~50k-event / 50MB limit, and the 24h execution-timeout default terminated the whole chain. This adds an opt-in continue-as-new pattern on BaseWorkflow and defaults the execution timeout to infinite.

Opt-in by adoption: an agent gets recycling only by calling run_until_complete from its @workflow.run instead of a bare wait_condition(timeout=None). No flag, no patch gate (no in-flight long-running workflows to preserve; Worker Versioning + upgrade-on-continue-as-new is the path for evolving these later).

BaseWorkflow helpers:
- run_until_complete(*args, is_complete, can_recycle=True, timeout=None): keep the workflow open; recycle history when Temporal suggests it.
- should_continue_as_new(): recycle when workflow.info().is_continue_as_new_suggested().
- drain_and_continue_as_new(): drain all_handlers_finished and re-check completion before workflow.continue_as_new.

Execution timeout: WORKFLOW_EXECUTION_TIMEOUT_SECONDS now defaults to None (no execution timeout; None/0/negative -> execution_timeout=None). It is chain-wide (continue-as-new does not reset it), so leaving it unset is required for a forever-chat. Idle bounding is a durable timer (run_until_complete's timeout).

State restoration after a recycle is framework-specific and left to follow-up PRs, one per integration. 000_hello_acp adopts the pattern (no cross-turn state, so no rehydration needed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Why
Long-lived chat/session agents run as a single Temporal workflow that stays open indefinitely. Two things killed them:
- Event history grows past Temporal's ~50k-event / 50MB limit.
- The 24h execution-timeout default terminated the whole chain.
This adds an opt-in continue-as-new pattern to BaseWorkflow so a session can stay open forever, and defaults the execution timeout to infinite so a long-lived chat isn't capped at 24h.
Design
Opt-in by adoption, no flag, no patch gate. An agent gets recycling only by calling run_until_complete from its @workflow.run instead of a bare wait_condition(timeout=None). Agents that keep the old wait are untouched. (There's no workflow.patched() gate: we have no in-flight long-running workflows to preserve, and per Temporal guidance the right tool for evolving these "pure entity" workflows later is Worker Versioning + upgrade-on-continue-as-new, not accumulating patches.)
BaseWorkflow helpers (src/agentex/lib/core/temporal/workflows/workflow.py):
- run_until_complete(*args, is_complete, can_recycle=True, timeout=None): keeps the workflow open; recycles history when Temporal suggests it. timeout mirrors the old wait_condition(timeout=...) (durable idle bound); can_recycle lets an agent opt out when a recycle prerequisite is missing.
- should_continue_as_new(): recycles when workflow.info().is_continue_as_new_suggested() (Temporal owns the threshold).
- drain_and_continue_as_new(): waits for all_handlers_finished (so an in-flight turn isn't lost) and re-checks completion before workflow.continue_as_new.
- is_continued_run(): the hook agents use to gate state rehydration after a recycle.
Execution timeout (environment_variables.py + temporal_task_service.py + temporal_client.py): WORKFLOW_EXECUTION_TIMEOUT_SECONDS now defaults to None = no execution timeout (None/0/negative → execution_timeout=None). The execution timeout is chain-wide (continue-as-new does NOT reset it), so capping it would still kill a forever-chat. To bound idle workflows, use a durable timer (run_until_complete's timeout), not this ceiling.
Scope (deliberately concise)
This PR is just the pattern + the timeout default. Restoring state after a recycle is framework-specific (rebuild from adk.messages, an adk.state snapshot, or a framework's own memory like a LangGraph checkpointer / Pydantic AI history) and is intentionally left to follow-up PRs, one per integration. The only example touched is 000_hello_acp, which keeps no cross-turn state and so proves the pattern with zero rehydration.
Verification
- should_continue_as_new, is_continued_run: tests/lib/core/temporal/test_base_workflow_continue_as_new.py.
- py_compile + ruff + pyright clean.
- drain_and_continue_as_new against a Temporal test server.
🤖 Generated with Claude Code
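For readers who want to see the shape of the pattern without a Temporal server, here is a minimal toy model of the description above. It is a sketch under stated assumptions, not the agentex implementation: FakeWorkflow, ContinueAsNew, run_chain, resolve_execution_timeout, and the handle_turn callback are hypothetical names, the "suggested" signal is replaced by a simple event-count threshold, and the timeout argument of run_until_complete is left out.

```python
import asyncio
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable, Optional


def resolve_execution_timeout(seconds: Optional[int]) -> Optional[timedelta]:
    """Hypothetical helper: None, 0, and negatives all mean "no execution timeout"."""
    if seconds is None or seconds <= 0:
        return None
    return timedelta(seconds=seconds)


class ContinueAsNew(Exception):
    """Stand-in for ending this run and starting a fresh one with empty history."""


@dataclass
class FakeWorkflow:
    """Toy model of the helpers; not the agentex BaseWorkflow."""

    suggest_threshold: int  # stand-in for Temporal's own "suggested" signal
    history_events: int = 0

    def should_continue_as_new(self) -> bool:
        return self.history_events >= self.suggest_threshold

    async def drain_and_continue_as_new(self, is_complete: Callable[[], bool]) -> None:
        # The real helper waits for in-flight handlers here; this toy has none,
        # so it only yields control once.
        await asyncio.sleep(0)
        # Re-check completion: a finished session should end, not recycle.
        if not is_complete():
            raise ContinueAsNew()

    async def run_until_complete(
        self,
        is_complete: Callable[[], bool],
        handle_turn: Callable[[], None],
        can_recycle: bool = True,
    ) -> None:
        while not is_complete():
            if can_recycle and self.should_continue_as_new():
                await self.drain_and_continue_as_new(is_complete)
                continue
            handle_turn()
            self.history_events += 1  # every turn grows this run's history
            await asyncio.sleep(0)


async def run_chain(total_turns: int, threshold: int) -> int:
    """Drive runs until the session completes; return the number of runs used."""
    turns = 0
    runs = 0

    def is_complete() -> bool:
        return turns >= total_turns

    def handle_turn() -> None:
        nonlocal turns
        turns += 1

    while True:
        runs += 1
        workflow = FakeWorkflow(suggest_threshold=threshold)  # fresh history per run
        try:
            await workflow.run_until_complete(is_complete, handle_turn)
            return runs
        except ContinueAsNew:
            continue
```

In this toy, asyncio.run(run_chain(12, 5)) returns 3: two runs recycle after five turns each and the third run finishes the session. The history counter resets on every recycle, whereas an execution timeout would keep counting across the whole chain, which is why the description leaves it unset.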
Greptile Summary
This PR adds opt-in Temporal continue-as-new support for long-lived agent workflows. The main changes are:
- BaseWorkflow helpers for recycle decisions, handler draining, continued-run detection, and wait-loop adoption.
- 000_hello_acp Temporal example now opts into the new wait helper.
Confidence Score: 4/5
The workflow infrastructure change is localized and tested, but the example opt-in path has a visible duplicate initialization-message regression when recycling.
The main helpers and timeout behavior are covered by focused tests, and the remaining issue is confined to the example workflow’s continued-run initialization path rather than the core helper implementation.
examples/tutorials/10_async/10_temporal/000_hello_acp/project/workflow.py
Comments Outside Diff (2)
examples/tutorials/10_async/10_temporal/000_hello_acp/project/workflow.py, line 56-63
This workflow now opts into continue-as-new, so @workflow.run executes from the top on every recycled run. Because the creation acknowledgement is emitted unconditionally before run_until_complete, each recycle writes another “You should only see this message once” message to the task history and UI. Gate this one-time prologue with is_continued_run() so continued runs do not repeat it.
Artifacts
Repro: focused workflow harness that executes the real on_task_create twice across original and continued runs
Stack trace captured during the T-Rex run
examples/tutorials/10_async/10_temporal/000_hello_acp/project/workflow.py, line 57-63
run_until_complete(params, ...) forwards the same params into every continued run, so @workflow.run starts over here after each recycle. Because this acknowledgement is emitted before the wait helper and is not gated by self.is_continued_run(), every continue-as-new appends another “You should only see this message once” message to the task history. This creates visible duplicate initialization messages for any long-lived chat that recycles.
Artifacts
Repro: generated workflow continue-as-new harness
Repro: harness output showing duplicate acknowledgement messages
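The gating both comments ask for can be illustrated with a small stand-alone model. This is a hedged sketch, not the example's real workflow: HelloWorkflow, its constructor arguments, and the messages list are hypothetical stand-ins, and is_continued_run() is modelled as a plain flag rather than anything read from Temporal.

```python
from typing import List


class HelloWorkflow:
    """Toy stand-in for the example workflow; not the agentex code."""

    def __init__(self, continued: bool, messages: List[str]) -> None:
        self._continued = continued  # assumption: True on runs started by a recycle
        self._messages = messages    # stand-in for the task's message history

    def is_continued_run(self) -> bool:
        return self._continued

    def run(self) -> None:
        # One-time prologue: only the first run in the chain acknowledges creation.
        if not self.is_continued_run():
            self._messages.append("You should only see this message once")
        # The real workflow would now enter its wait helper.


def simulate_chain(recycles: int) -> List[str]:
    """Run the original run plus `recycles` continued runs; return the messages."""
    messages: List[str] = []
    HelloWorkflow(continued=False, messages=messages).run()
    for _ in range(recycles):
        HelloWorkflow(continued=True, messages=messages).run()
    return messages
```

With the guard in place, simulate_chain(3) yields a single acknowledgement; removing the guard would yield four, which matches the duplication the repro describes.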
Reviews (12): Last reviewed commit: "feat(temporal): opt-in continue-as-new f..."