perf(native): fix full-build regression in the P6 dataflow vertex pass by carlos-alm · Pull Request #1700 · optave/ops-codegraph-tool

carlos-alm · 2026-06-23T17:29:32Z

Problem

The pre-publish benchmark gate failed on the native full build regressing 3012 → 4504 ms (+50%) vs the 3.13.0 baseline (threshold +25%). The regression is native-only — WASM full build improved in the same run (14214 → 13107 ms) — and the corpus is smaller (629 vs 695 files), so it's a genuine per-file slowdown, not corpus growth or runner noise.

Root cause

Profiling a freshly-built main native addon on codegraph's own source showed the cost is dominated by the P6 dataflow vertex pass (runDataflowVertexPass → buildDataflowVerticesFromMap), which runs untimed so it never surfaced in result.phases. Two inefficiencies, both pre-existing but amplified as the dynamic-call resolution phases (0–6) added more tracked dataflow:

Redundant nodes-table queries — makeNodeResolver ran 1–2 uncached queries per call and was invoked repeatedly for the same function names within each file (per param, return, assignment, argFlow, summary, capture). On a full build that's tens of thousands of redundant queries.
Serial re-parse — the Rust orchestrator writes dataflow edges but not vertices, so P6 re-derives a DataflowResult for every dataflow file by calling extractDataflowAnalysis once per file — hundreds of synchronous parses on the event loop.

Fix

Memoise the node resolver per (relPath, funcName). The nodes table is never mutated during P6 (only dataflow* tables are written), so the lookup is stable for the resolver's lifetime.
extract_dataflow_analysis_batch — a new addon export that reads + parses every requested file across the rayon thread pool in one NAPI call (each parse_source builds its own tree-sitter Parser, so it's embarrassingly parallel). runDataflowVertexPass feature-detects it and falls back to the per-file path on older addons.

Vertex/edge building stays entirely in TS (shared with the WASM engine) — the batch only changes how the DataflowResults are obtained, not how vertices are built from them, so there is no second implementation to keep in parity.

Verification

Result-preserving / parity-safe: a same-process batch-vs-per-file A/B on codegraph's corpus produces byte-identical dataflow output — dataflow_vertices (12200), dataflow_summary (5839), and every dataflow edge kind (arg_in/def_use/return_out/…) match, as does the inter-procedural edge semantic fingerprint.
All 136 dataflow tests pass.
Perf (idle-machine 4-way interleaved A/B, median of 6 full builds):

arm median saved

no fix 3215 ms —

memoise only 2751 ms 464 ms

batch only 2879 ms 336 ms

both 2487 ms 728 ms (−22.6%)

Applying −22.6% to the CI figure (4504 ms) → ~3486 ms, under the 3765 ms (+25%) threshold. The query-elimination should amplify on CI's slower I/O. The pre-publish gate is the final verifier.

The native full-build's P6 dataflow vertex pass (runDataflowVertexPass → buildDataflowVerticesFromMap) dominates wall-clock on a full build but ran untimed, so it never surfaced in result.phases. Profiling the v3.15.0 native addon on codegraph's own source (631 files, matching the publish-gate corpus) showed the per-file vertex loop alone at ~840ms. Root cause: makeNodeResolver runs one or two `nodes`-table queries per call and is invoked repeatedly for the SAME function names within each file — once per param, return, assignment, argFlow, summary row, and capture. With ~6700 argFlow candidates plus params/returns across 416 dataflow files, that is tens of thousands of redundant queries. The inefficiency was tolerable at the 3.13.0 data volume but was amplified as the dynamic-call resolution phases (0–6) added more tracked dataflow, contributing to the native full-build regression caught by the pre-publish benchmark gate (3012 → 4504 ms). Fix: memoise the resolver per (relPath, funcName). The nodes table is never mutated during the P6 vertex pass (only dataflow* tables are written), so the lookup is stable for the resolver's lifetime and the cache is result-preserving. Local A/B on an idle machine: full build 3430 → 3044 ms; vertex loop ~840 → ~390 ms. All 136 dataflow tests pass; vertex/edge counts unchanged. Impact: 1 functions changed, 7 affected

The native orchestrator's P6 vertex pass re-derives a DataflowResult for every dataflow-bearing file on a full build (the Rust orchestrator writes dataflow edges but not vertices). It did this by calling extractDataflowAnalysis once per file — hundreds of serial parses on the JS event loop, the single largest component of the native full-build benchmark regression after the resolver memoisation in the previous commit. Add extract_dataflow_analysis_batch to the addon: it reads and parses every requested file across the rayon thread pool in one NAPI call (each parse_source builds its own tree-sitter Parser, so the work is embarrassingly parallel) and returns results positionally. runDataflowVertexPass feature-detects the export and calls it once, mapping results back onto its file list exactly as before; older published addons that predate the export fall back to the per-file path. Vertex building stays entirely in TS (shared with the WASM engine), so there is no second implementation to keep in parity — the batch only changes how the DataflowResults are obtained, not how vertices/edges are built from them. Verified result-preserving: a same-process A/B (batch vs per-file) produces byte-identical dataflow output on codegraph's own corpus — dataflow_vertices, dataflow_summary, and every dataflow edge kind (arg_in/def_use/return_out/…) match, as does the inter-procedural edge semantic fingerprint. All 136 dataflow tests pass. Local 4-way A/B (idle machine, median of 6): full build 3215 → 2487 ms with memoisation + batch combined (−22.6%); batch alone −336 ms. Impact: 2 functions changed, 4 affected

greptile-apps · 2026-06-23T17:33:16Z

Greptile Summary

This PR fixes a +50% native full-build regression in the P6 dataflow vertex pass by eliminating two dominant costs: redundant nodes-table queries per resolved function name, and serial per-file dataflow parsing on the JS event loop.

Memoize makeNodeResolver: a Map cache keyed by funcName is closed over per resolver instance; the undefined/null distinction correctly distinguishes cache miss from "not found", collapsing repeated queries for the same name into one per distinct function per file.
extract_dataflow_analysis_batch (Rust): a new #[napi] export that reads every requested file from disk and runs extract_dataflow_standalone across the rayon thread pool in one synchronous call, returning results positionally (None on I/O failure or unsupported language). The TS side feature-detects the export and falls back to the existing per-file path on older addons, keeping both paths result-identical.
Fallback alignment: the per-file path now wraps readFileSafe in a try/catch and explicitly routes empty files to wasmStubs, matching the batch path's null-means-WASM contract.

Confidence Score: 5/5

Safe to merge — both optimisations are narrowly scoped, the batch path is feature-detected with a clean fallback, and corpus-level A/B testing confirmed byte-identical output.

The changes are localised: memoisation only affects query count, not query results; the Rust batch function is a pure map over the existing single-file extractor; and the fallback path now correctly mirrors batch semantics for edge cases. No mutations to existing data contracts, and the type system enforces the optional-export contract.

No files require special attention. native-orchestrator.ts has the most logic surface area but both branches are clearly separated and the alignment fix for the fallback path removes the prior ambiguity.

Important Files Changed

Filename	Overview
crates/codegraph-core/src/lib.rs	Adds `extract_dataflow_analysis_batch`: reads files from disk and runs `extract_dataflow_standalone` in parallel via rayon. Correct positional output (same length as input), `read_to_string` returns None on I/O failure matching the None-for-null contract. `lang_id: None` matches how the TS per-file path calls the existing API.
src/domain/graph/builder/stages/native-orchestrator.ts	Refactors `runDataflowVertexPass` to attempt a single batch NAPI call, falling back to per-file extraction when the export is absent or throws. Both code paths now consistently route unreadable/empty files to `wasmStubs`, resolving the previous divergence comment.
src/features/dataflow.ts	Adds a `Map` cache inside `makeNodeResolver`. The null/undefined distinction is correctly handled: `cache.get()` returns `undefined` on miss and `null` on "not found", so the `!== undefined` guard avoids re-querying without incorrectly short-circuiting on cached nulls.
src/types.ts	Adds the optional `extractDataflowAnalysisBatch?` method to `NativeAddon`, correctly typed as optional to support older addon versions without the export.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[runDataflowVertexPass] --> B{extractDataflowAnalysisBatch\navailable?}
    B -- yes --> C[Call batch NAPI\none round-trip]
    C --> D{batch threw?}
    D -- yes --> E[batchResults = null\nfall through]
    D -- no --> F[batchResults array]
    B -- no --> E

    E --> G[Per-file loop\nreadFileSafe + extractDataflowAnalysis]
    F --> H[Per-file loop\nindex into batchResults]

    G --> I{result?}
    H --> I

    I -- null/None --> J[wasmStubs.set]
    I -- DataflowResult --> K[patchDataflowResult\nnativeDataflow.set]

    K --> L[buildDataflowVerticesFromMap\nwith memoised makeNodeResolver]
    J --> M[WASM handles\nedges + vertices]

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[runDataflowVertexPass] --> B{extractDataflowAnalysisBatch\navailable?}
    B -- yes --> C[Call batch NAPI\none round-trip]
    C --> D{batch threw?}
    D -- yes --> E[batchResults = null\nfall through]
    D -- no --> F[batchResults array]
    B -- no --> E

    E --> G[Per-file loop\nreadFileSafe + extractDataflowAnalysis]
    F --> H[Per-file loop\nindex into batchResults]

    G --> I{result?}
    H --> I

    I -- null/None --> J[wasmStubs.set]
    I -- DataflowResult --> K[patchDataflowResult\nnativeDataflow.set]

    K --> L[buildDataflowVerticesFromMap\nwith memoised makeNodeResolver]
    J --> M[WASM handles\nedges + vertices]

_{Reviews (3): Last reviewed commit: "fix(native): align fallback-path unreada..." | Re-trigger Greptile}

github-actions · 2026-06-23T17:37:29Z

Codegraph Impact Analysis

3 functions changed → 11 callers affected across 6 files

runDataflowVertexPass in src/domain/graph/builder/stages/native-orchestrator.ts:313 (4 transitive callers)
makeNodeResolver in src/features/dataflow.ts:740 (7 transitive callers)
NativeAddon.extractDataflowAnalysisBatch in src/types.ts:2296 (0 transitive callers)

…batch path In the per-file fallback branch of runDataflowVertexPass, an unreadable file would throw uncaught from readFileSafe and an empty file would be silently skipped — neither was added to wasmStubs. The batch path correctly routes both cases to WASM via a null positional result. Wrap readFileSafe in a try/catch and replace the bare `continue` for empty files so both conditions add the file to wasmStubs, matching the batch-path contract. Impact: 1 functions changed, 4 affected

carlos-alm · 2026-06-23T19:39:39Z

Fixed in fbff44a. The fallback branch in runDataflowVertexPass now wraps readFileSafe in a try/catch and routes unreadable files to wasmStubs (same as the batch path). Empty files (!source) are also routed to wasmStubs instead of being silently skipped. Both edge cases now behave identically between the batch and per-file fallback paths.

carlos-alm · 2026-06-23T19:39:50Z

@greptileai

carlos-alm added 2 commits June 22, 2026 23:53

carlos-alm merged commit ad6cd86 into main Jun 23, 2026
29 checks passed

carlos-alm deleted the perf/native-fullbuild-3.15 branch June 23, 2026 21:14

github-actions Bot locked and limited conversation to collaborators Jun 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(native): fix full-build regression in the P6 dataflow vertex pass#1700

perf(native): fix full-build regression in the P6 dataflow vertex pass#1700
carlos-alm merged 3 commits into
mainfrom
perf/native-fullbuild-3.15

carlos-alm commented Jun 23, 2026

Uh oh!

greptile-apps Bot commented Jun 23, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 23, 2026

Uh oh!

carlos-alm commented Jun 23, 2026

Uh oh!

carlos-alm commented Jun 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

arm	median	saved
no fix	3215 ms	—
memoise only	2751 ms	464 ms
batch only	2879 ms	336 ms
both	2487 ms	728 ms (−22.6%)

Uh oh!

Conversation

carlos-alm commented Jun 23, 2026

Problem

Root cause

Fix

Verification

Uh oh!

greptile-apps Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

github-actions Bot commented Jun 23, 2026

Codegraph Impact Analysis

Uh oh!

carlos-alm commented Jun 23, 2026

Uh oh!

carlos-alm commented Jun 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps Bot commented Jun 23, 2026 •

edited

Loading