Skip to content

perf(native): fix full-build regression in the P6 dataflow vertex pass#1700

Merged
carlos-alm merged 3 commits into
mainfrom
perf/native-fullbuild-3.15
Jun 23, 2026
Merged

perf(native): fix full-build regression in the P6 dataflow vertex pass#1700
carlos-alm merged 3 commits into
mainfrom
perf/native-fullbuild-3.15

Conversation

@carlos-alm

Copy link
Copy Markdown
Contributor

Problem

The pre-publish benchmark gate failed on the native full build regressing 3012 → 4504 ms (+50%) vs the 3.13.0 baseline (threshold +25%). The regression is native-only — WASM full build improved in the same run (14214 → 13107 ms) — and the corpus is smaller (629 vs 695 files), so it's a genuine per-file slowdown, not corpus growth or runner noise.

Root cause

Profiling a freshly-built main native addon on codegraph's own source showed the cost is dominated by the P6 dataflow vertex pass (runDataflowVertexPassbuildDataflowVerticesFromMap), which runs untimed so it never surfaced in result.phases. Two inefficiencies, both pre-existing but amplified as the dynamic-call resolution phases (0–6) added more tracked dataflow:

  1. Redundant nodes-table queriesmakeNodeResolver ran 1–2 uncached queries per call and was invoked repeatedly for the same function names within each file (per param, return, assignment, argFlow, summary, capture). On a full build that's tens of thousands of redundant queries.
  2. Serial re-parse — the Rust orchestrator writes dataflow edges but not vertices, so P6 re-derives a DataflowResult for every dataflow file by calling extractDataflowAnalysis once per file — hundreds of synchronous parses on the event loop.

Fix

  • Memoise the node resolver per (relPath, funcName). The nodes table is never mutated during P6 (only dataflow* tables are written), so the lookup is stable for the resolver's lifetime.
  • extract_dataflow_analysis_batch — a new addon export that reads + parses every requested file across the rayon thread pool in one NAPI call (each parse_source builds its own tree-sitter Parser, so it's embarrassingly parallel). runDataflowVertexPass feature-detects it and falls back to the per-file path on older addons.

Vertex/edge building stays entirely in TS (shared with the WASM engine) — the batch only changes how the DataflowResults are obtained, not how vertices are built from them, so there is no second implementation to keep in parity.

Verification

  • Result-preserving / parity-safe: a same-process batch-vs-per-file A/B on codegraph's corpus produces byte-identical dataflow output — dataflow_vertices (12200), dataflow_summary (5839), and every dataflow edge kind (arg_in/def_use/return_out/…) match, as does the inter-procedural edge semantic fingerprint.

  • All 136 dataflow tests pass.

  • Perf (idle-machine 4-way interleaved A/B, median of 6 full builds):

    arm median saved
    no fix 3215 ms
    memoise only 2751 ms 464 ms
    batch only 2879 ms 336 ms
    both 2487 ms 728 ms (−22.6%)

    Applying −22.6% to the CI figure (4504 ms) → ~3486 ms, under the 3765 ms (+25%) threshold. The query-elimination should amplify on CI's slower I/O. The pre-publish gate is the final verifier.

The native full-build's P6 dataflow vertex pass (runDataflowVertexPass →
buildDataflowVerticesFromMap) dominates wall-clock on a full build but ran
untimed, so it never surfaced in result.phases. Profiling the v3.15.0 native
addon on codegraph's own source (631 files, matching the publish-gate corpus)
showed the per-file vertex loop alone at ~840ms.

Root cause: makeNodeResolver runs one or two `nodes`-table queries per call and
is invoked repeatedly for the SAME function names within each file — once per
param, return, assignment, argFlow, summary row, and capture. With ~6700 argFlow
candidates plus params/returns across 416 dataflow files, that is tens of
thousands of redundant queries. The inefficiency was tolerable at the 3.13.0
data volume but was amplified as the dynamic-call resolution phases (0–6) added
more tracked dataflow, contributing to the native full-build regression caught
by the pre-publish benchmark gate (3012 → 4504 ms).

Fix: memoise the resolver per (relPath, funcName). The nodes table is never
mutated during the P6 vertex pass (only dataflow* tables are written), so the
lookup is stable for the resolver's lifetime and the cache is result-preserving.
Local A/B on an idle machine: full build 3430 → 3044 ms; vertex loop ~840 → ~390
ms. All 136 dataflow tests pass; vertex/edge counts unchanged.

Impact: 1 functions changed, 7 affected
The native orchestrator's P6 vertex pass re-derives a DataflowResult for every
dataflow-bearing file on a full build (the Rust orchestrator writes dataflow
edges but not vertices). It did this by calling extractDataflowAnalysis once per
file — hundreds of serial parses on the JS event loop, the single largest
component of the native full-build benchmark regression after the resolver
memoisation in the previous commit.

Add extract_dataflow_analysis_batch to the addon: it reads and parses every
requested file across the rayon thread pool in one NAPI call (each parse_source
builds its own tree-sitter Parser, so the work is embarrassingly parallel) and
returns results positionally. runDataflowVertexPass feature-detects the export
and calls it once, mapping results back onto its file list exactly as before;
older published addons that predate the export fall back to the per-file path.

Vertex building stays entirely in TS (shared with the WASM engine), so there is
no second implementation to keep in parity — the batch only changes how the
DataflowResults are obtained, not how vertices/edges are built from them.

Verified result-preserving: a same-process A/B (batch vs per-file) produces
byte-identical dataflow output on codegraph's own corpus — dataflow_vertices,
dataflow_summary, and every dataflow edge kind (arg_in/def_use/return_out/…)
match, as does the inter-procedural edge semantic fingerprint. All 136 dataflow
tests pass. Local 4-way A/B (idle machine, median of 6): full build 3215 →
2487 ms with memoisation + batch combined (−22.6%); batch alone −336 ms.

Impact: 2 functions changed, 4 affected
@greptile-apps

greptile-apps Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a +50% native full-build regression in the P6 dataflow vertex pass by eliminating two dominant costs: redundant nodes-table queries per resolved function name, and serial per-file dataflow parsing on the JS event loop.

  • Memoize makeNodeResolver: a Map cache keyed by funcName is closed over per resolver instance; the undefined/null distinction correctly distinguishes cache miss from "not found", collapsing repeated queries for the same name into one per distinct function per file.
  • extract_dataflow_analysis_batch (Rust): a new #[napi] export that reads every requested file from disk and runs extract_dataflow_standalone across the rayon thread pool in one synchronous call, returning results positionally (None on I/O failure or unsupported language). The TS side feature-detects the export and falls back to the existing per-file path on older addons, keeping both paths result-identical.
  • Fallback alignment: the per-file path now wraps readFileSafe in a try/catch and explicitly routes empty files to wasmStubs, matching the batch path's null-means-WASM contract.

Confidence Score: 5/5

Safe to merge — both optimisations are narrowly scoped, the batch path is feature-detected with a clean fallback, and corpus-level A/B testing confirmed byte-identical output.

The changes are localised: memoisation only affects query count, not query results; the Rust batch function is a pure map over the existing single-file extractor; and the fallback path now correctly mirrors batch semantics for edge cases. No mutations to existing data contracts, and the type system enforces the optional-export contract.

No files require special attention. native-orchestrator.ts has the most logic surface area but both branches are clearly separated and the alignment fix for the fallback path removes the prior ambiguity.

Important Files Changed

Filename Overview
crates/codegraph-core/src/lib.rs Adds extract_dataflow_analysis_batch: reads files from disk and runs extract_dataflow_standalone in parallel via rayon. Correct positional output (same length as input), read_to_string returns None on I/O failure matching the None-for-null contract. lang_id: None matches how the TS per-file path calls the existing API.
src/domain/graph/builder/stages/native-orchestrator.ts Refactors runDataflowVertexPass to attempt a single batch NAPI call, falling back to per-file extraction when the export is absent or throws. Both code paths now consistently route unreadable/empty files to wasmStubs, resolving the previous divergence comment.
src/features/dataflow.ts Adds a Map cache inside makeNodeResolver. The null/undefined distinction is correctly handled: cache.get() returns undefined on miss and null on "not found", so the !== undefined guard avoids re-querying without incorrectly short-circuiting on cached nulls.
src/types.ts Adds the optional extractDataflowAnalysisBatch? method to NativeAddon, correctly typed as optional to support older addon versions without the export.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[runDataflowVertexPass] --> B{extractDataflowAnalysisBatch\navailable?}
    B -- yes --> C[Call batch NAPI\none round-trip]
    C --> D{batch threw?}
    D -- yes --> E[batchResults = null\nfall through]
    D -- no --> F[batchResults array]
    B -- no --> E

    E --> G[Per-file loop\nreadFileSafe + extractDataflowAnalysis]
    F --> H[Per-file loop\nindex into batchResults]

    G --> I{result?}
    H --> I

    I -- null/None --> J[wasmStubs.set]
    I -- DataflowResult --> K[patchDataflowResult\nnativeDataflow.set]

    K --> L[buildDataflowVerticesFromMap\nwith memoised makeNodeResolver]
    J --> M[WASM handles\nedges + vertices]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[runDataflowVertexPass] --> B{extractDataflowAnalysisBatch\navailable?}
    B -- yes --> C[Call batch NAPI\none round-trip]
    C --> D{batch threw?}
    D -- yes --> E[batchResults = null\nfall through]
    D -- no --> F[batchResults array]
    B -- no --> E

    E --> G[Per-file loop\nreadFileSafe + extractDataflowAnalysis]
    F --> H[Per-file loop\nindex into batchResults]

    G --> I{result?}
    H --> I

    I -- null/None --> J[wasmStubs.set]
    I -- DataflowResult --> K[patchDataflowResult\nnativeDataflow.set]

    K --> L[buildDataflowVerticesFromMap\nwith memoised makeNodeResolver]
    J --> M[WASM handles\nedges + vertices]
Loading

Reviews (3): Last reviewed commit: "fix(native): align fallback-path unreada..." | Re-trigger Greptile

@github-actions

Copy link
Copy Markdown
Contributor

Codegraph Impact Analysis

3 functions changed11 callers affected across 6 files

  • runDataflowVertexPass in src/domain/graph/builder/stages/native-orchestrator.ts:313 (4 transitive callers)
  • makeNodeResolver in src/features/dataflow.ts:740 (7 transitive callers)
  • NativeAddon.extractDataflowAnalysisBatch in src/types.ts:2296 (0 transitive callers)

…batch path

In the per-file fallback branch of runDataflowVertexPass, an unreadable
file would throw uncaught from readFileSafe and an empty file would be
silently skipped — neither was added to wasmStubs. The batch path
correctly routes both cases to WASM via a null positional result.

Wrap readFileSafe in a try/catch and replace the bare `continue` for
empty files so both conditions add the file to wasmStubs, matching the
batch-path contract.

Impact: 1 functions changed, 4 affected
@carlos-alm

Copy link
Copy Markdown
Contributor Author

Fixed in fbff44a. The fallback branch in runDataflowVertexPass now wraps readFileSafe in a try/catch and routes unreadable files to wasmStubs (same as the batch path). Empty files (!source) are also routed to wasmStubs instead of being silently skipped. Both edge cases now behave identically between the batch and per-file fallback paths.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit ad6cd86 into main Jun 23, 2026
29 checks passed
@carlos-alm carlos-alm deleted the perf/native-fullbuild-3.15 branch June 23, 2026 21:14
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 23, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant