Skip to content

feat(vfs): add lazy vfs + remove dynamic fields for prompt caching hits#5138

Open
Sg312 wants to merge 5 commits into
stagingfrom
dev
Open

feat(vfs): add lazy vfs + remove dynamic fields for prompt caching hits#5138
Sg312 wants to merge 5 commits into
stagingfrom
dev

Conversation

@Sg312

@Sg312 Sg312 commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds lazy vfs and vfs deltas for prompt caching
Companion: https://github.com/simstudioai/mothership/pull/330

Type of Change

  • New feature

Testing

Manual

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented Jun 19, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Jun 19, 2026 10:03pm

Request Review

@greptile-apps

greptile-apps Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces two related improvements to the copilot VFS layer: lazy materialization of expensive per-workflow/KB artifacts (state.json, lint.json, executions.json, deployment.json, documents.json, connectors.json) so that read and glob operations no longer pay for O(workflows) graph-loads on every tool call, and deterministic serialization of WORKSPACE.md (stable sorting + omitting volatile fields like lastRunAt and rowCount) so the prompt-cache prefix stays byte-identical across turns unless the workspace structure actually changes.

  • Lazy VFS: Heavy artifacts are now registered as closures and resolved on-demand — read resolves one path, scoped grep resolves only paths within the scope, and glob resolves none (keys-only view). Per-instance memos (normalizedCache, deploymentCache) ensure that state.json and lint.json for the same workflow share a single DB load within one tool call.
  • Deterministic workspace context: All collections in buildWorkspaceMd are sorted with a pinned 'en' locale comparator; volatile fields (lastRunAt, rowCount) are stripped from the rendered output, with tests asserting byte-identity regardless of input order.

Confidence Score: 4/5

Safe to merge. The lazy VFS is well-scoped (fresh instance per tool call eliminates cross-request sharing concerns), the deterministic WORKSPACE.md serialization is correct, and the async method signatures are properly awaited in all callers.

The two notable gaps are the use of denormalized signals (lastRunAt, docCount) as gates for lazy artifact registration — stale values silently omit execution/document data — and the loss of entity-specific IDs in the centralized error log. Neither causes incorrect data to be shown; they could cause data to be absent or harder to debug. The core lazy-loading logic, memoization, and determinism changes are all correct.

apps/sim/lib/copilot/vfs/workspace-vfs.ts — the denormalized-gate and error-logging points are both here.

Important Files Changed

Filename Overview
apps/sim/lib/copilot/vfs/workspace-vfs.ts Core lazy-VFS implementation — adds lazy map, normalizedCache/deploymentCache memos, resolveLazyPath/resolveLazyWithinScope/keyView helpers, makes grep and read async. Logic is sound since getOrMaterializeVFS creates a fresh instance per tool call. Minor: executions.json/documents.json/connectors.json availability is now gated on denormalized signals (lastRunAt, docCount, connectorTypes) that could be stale in edge cases.
apps/sim/lib/copilot/chat/workspace-context.ts Adds stable sorting (stableCompare/byNameThenId) to all collections in buildWorkspaceMd, strips lastRunAt and rowCount from rendered output, and drops the N per-table COUNT(*) queries from generateWorkspaceContext. Changes are correct and well-commented.
apps/sim/lib/copilot/chat/workspace-context.test.ts Adds three determinism tests: byte-identity regardless of input row order, no last run timestamp in workflow output, and no row count in table output. Tests are well-constructed and correctly cover the new sorting and omission behavior.
apps/sim/lib/copilot/tools/handlers/vfs.ts Adds await to the now-async vfs.grep() and vfs.read() calls. Minimal and correct change.
apps/sim/lib/copilot/vfs/operations.ts Exports pathWithinGrepScope so resolveLazyWithinScope can use the same scope filter as ops.grep, keeping materialized and scanned sets identical.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant H as VFS Handler
    participant V as WorkspaceVFS
    participant L as lazy Map
    participant F as files Map

    Note over V: materialize() — always fresh per tool call
    V->>F: set eager artifacts (metadata, WORKSPACE.md, folder markers)
    V->>L: registerLazy(state.json, lint.json, executions.json, ...)

    H->>V: read("workflows/foo/state.json")
    V->>V: resolveLazyPath(path)
    V->>L: get(path) → loader
    V->>L: delete(path)
    V->>F: await loader() → set(path, content)
    V-->>H: ReadResult

    H->>V: "glob("workflows/**")"
    V->>V: keyView(false) — eager + lazy keys, no content load
    V-->>H: string[] (paths only)

    H->>V: grep("pattern", "workflows/foo")
    V->>V: resolveLazyWithinScope("workflows/foo")
    loop for each lazy path in scope
        V->>L: delete(path)
        V->>F: await loader() → set(path, content)
    end
    V->>F: activeFiles() — all resolved files
    V-->>H: GrepMatch[]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant H as VFS Handler
    participant V as WorkspaceVFS
    participant L as lazy Map
    participant F as files Map

    Note over V: materialize() — always fresh per tool call
    V->>F: set eager artifacts (metadata, WORKSPACE.md, folder markers)
    V->>L: registerLazy(state.json, lint.json, executions.json, ...)

    H->>V: read("workflows/foo/state.json")
    V->>V: resolveLazyPath(path)
    V->>L: get(path) → loader
    V->>L: delete(path)
    V->>F: await loader() → set(path, content)
    V-->>H: ReadResult

    H->>V: "glob("workflows/**")"
    V->>V: keyView(false) — eager + lazy keys, no content load
    V-->>H: string[] (paths only)

    H->>V: grep("pattern", "workflows/foo")
    V->>V: resolveLazyWithinScope("workflows/foo")
    loop for each lazy path in scope
        V->>L: delete(path)
        V->>F: await loader() → set(path, content)
    end
    V->>F: activeFiles() — all resolved files
    V-->>H: GrepMatch[]
Loading

Comments Outside Diff (1)

  1. apps/sim/lib/copilot/vfs/workspace-vfs.ts, line 820-822 (link)

    P2 Lazy gate depends on denormalized signal

    executions.json is only registered when wf.lastRunAt is non-null, and documents.json for a KB is only registered when kb.docCount > 0. If either signal is stale or null due to a migration, a manual DB update, or a brief inconsistency, the artifact won't be advertised even though rows exist in the underlying tables. The consequence is a silent omission: the agent won't see execution history or KB documents for that resource until the denormalized field is corrected. The old code queried unconditionally, so it was immune to this class of stale-count bug.

Reviews (1): Last reviewed commit: "feat(vfs): add lazy vfs + remove dynamic..." | Re-trigger Greptile

Comment thread apps/sim/lib/copilot/vfs/workspace-vfs.ts
Build the workspace inventory from the primary db (fixes replica-lag staleness)
and emit it as a typed VfsSnapshotV1 `vfs` payload alongside the markdown, so the
mothership can diff it into append-only baseline/delta messages. Generate the TS
contract mirror from the Go-owned JSON schema (sync-vfs-snapshot-contract) and
sort connector types so diffs stay byte-stable.
The branch buildPayload implementations hand-list the params they pass to
buildCopilotRequestPayload and forwarded workspaceContext but dropped vfs, so the
typed snapshot never reached the Go request (req.Vfs was always nil and the
append-only delta path never engaged). Forward vfs in both the workflow and
workspace branches, and add a regression guard asserting the branch threads it
through (the bug slipped past tests because post.test mocked the payload builder
and payload.test called it directly, bypassing the branch).
@github-actions github-actions Bot added the requires-mothership-merge Has a companion PR on the mothership/copilot side — merge in lockstep label Jun 19, 2026
@github-actions

github-actions Bot commented Jun 19, 2026

Copy link
Copy Markdown

⚠️ Cross-repo companion check

One or more companion PRs aren't merged into staging yet. Merging this without them will leave copilot and sim out of sync — merge them in lockstep.

  • simstudioai/mothership#330OPEN, not merged (targets staging) — feat(vfs): add lazy vfs + remove dynamic fields for prompt caching hits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

requires-mothership-merge Has a companion PR on the mothership/copilot side — merge in lockstep

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant