perf(native): fix full-build regression in the P6 dataflow vertex pass#1700
Conversation
The native full-build's P6 dataflow vertex pass (runDataflowVertexPass → buildDataflowVerticesFromMap) dominates wall-clock on a full build but ran untimed, so it never surfaced in result.phases. Profiling the v3.15.0 native addon on codegraph's own source (631 files, matching the publish-gate corpus) showed the per-file vertex loop alone at ~840ms. Root cause: makeNodeResolver runs one or two `nodes`-table queries per call and is invoked repeatedly for the SAME function names within each file — once per param, return, assignment, argFlow, summary row, and capture. With ~6700 argFlow candidates plus params/returns across 416 dataflow files, that is tens of thousands of redundant queries. The inefficiency was tolerable at the 3.13.0 data volume but was amplified as the dynamic-call resolution phases (0–6) added more tracked dataflow, contributing to the native full-build regression caught by the pre-publish benchmark gate (3012 → 4504 ms). Fix: memoise the resolver per (relPath, funcName). The nodes table is never mutated during the P6 vertex pass (only dataflow* tables are written), so the lookup is stable for the resolver's lifetime and the cache is result-preserving. Local A/B on an idle machine: full build 3430 → 3044 ms; vertex loop ~840 → ~390 ms. All 136 dataflow tests pass; vertex/edge counts unchanged. Impact: 1 functions changed, 7 affected
The native orchestrator's P6 vertex pass re-derives a DataflowResult for every dataflow-bearing file on a full build (the Rust orchestrator writes dataflow edges but not vertices). It did this by calling extractDataflowAnalysis once per file — hundreds of serial parses on the JS event loop, the single largest component of the native full-build benchmark regression after the resolver memoisation in the previous commit. Add extract_dataflow_analysis_batch to the addon: it reads and parses every requested file across the rayon thread pool in one NAPI call (each parse_source builds its own tree-sitter Parser, so the work is embarrassingly parallel) and returns results positionally. runDataflowVertexPass feature-detects the export and calls it once, mapping results back onto its file list exactly as before; older published addons that predate the export fall back to the per-file path. Vertex building stays entirely in TS (shared with the WASM engine), so there is no second implementation to keep in parity — the batch only changes how the DataflowResults are obtained, not how vertices/edges are built from them. Verified result-preserving: a same-process A/B (batch vs per-file) produces byte-identical dataflow output on codegraph's own corpus — dataflow_vertices, dataflow_summary, and every dataflow edge kind (arg_in/def_use/return_out/…) match, as does the inter-procedural edge semantic fingerprint. All 136 dataflow tests pass. Local 4-way A/B (idle machine, median of 6): full build 3215 → 2487 ms with memoisation + batch combined (−22.6%); batch alone −336 ms. Impact: 2 functions changed, 4 affected
Greptile SummaryThis PR fixes a +50% native full-build regression in the P6 dataflow vertex pass by eliminating two dominant costs: redundant
Confidence Score: 5/5Safe to merge — both optimisations are narrowly scoped, the batch path is feature-detected with a clean fallback, and corpus-level A/B testing confirmed byte-identical output. The changes are localised: memoisation only affects query count, not query results; the Rust batch function is a pure map over the existing single-file extractor; and the fallback path now correctly mirrors batch semantics for edge cases. No mutations to existing data contracts, and the type system enforces the optional-export contract. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[runDataflowVertexPass] --> B{extractDataflowAnalysisBatch\navailable?}
B -- yes --> C[Call batch NAPI\none round-trip]
C --> D{batch threw?}
D -- yes --> E[batchResults = null\nfall through]
D -- no --> F[batchResults array]
B -- no --> E
E --> G[Per-file loop\nreadFileSafe + extractDataflowAnalysis]
F --> H[Per-file loop\nindex into batchResults]
G --> I{result?}
H --> I
I -- null/None --> J[wasmStubs.set]
I -- DataflowResult --> K[patchDataflowResult\nnativeDataflow.set]
K --> L[buildDataflowVerticesFromMap\nwith memoised makeNodeResolver]
J --> M[WASM handles\nedges + vertices]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[runDataflowVertexPass] --> B{extractDataflowAnalysisBatch\navailable?}
B -- yes --> C[Call batch NAPI\none round-trip]
C --> D{batch threw?}
D -- yes --> E[batchResults = null\nfall through]
D -- no --> F[batchResults array]
B -- no --> E
E --> G[Per-file loop\nreadFileSafe + extractDataflowAnalysis]
F --> H[Per-file loop\nindex into batchResults]
G --> I{result?}
H --> I
I -- null/None --> J[wasmStubs.set]
I -- DataflowResult --> K[patchDataflowResult\nnativeDataflow.set]
K --> L[buildDataflowVerticesFromMap\nwith memoised makeNodeResolver]
J --> M[WASM handles\nedges + vertices]
Reviews (3): Last reviewed commit: "fix(native): align fallback-path unreada..." | Re-trigger Greptile |
Codegraph Impact Analysis3 functions changed → 11 callers affected across 6 files
|
…batch path In the per-file fallback branch of runDataflowVertexPass, an unreadable file would throw uncaught from readFileSafe and an empty file would be silently skipped — neither was added to wasmStubs. The batch path correctly routes both cases to WASM via a null positional result. Wrap readFileSafe in a try/catch and replace the bare `continue` for empty files so both conditions add the file to wasmStubs, matching the batch-path contract. Impact: 1 functions changed, 4 affected
|
Fixed in fbff44a. The fallback branch in |
Problem
The pre-publish benchmark gate failed on the native full build regressing 3012 → 4504 ms (+50%) vs the 3.13.0 baseline (threshold +25%). The regression is native-only — WASM full build improved in the same run (14214 → 13107 ms) — and the corpus is smaller (629 vs 695 files), so it's a genuine per-file slowdown, not corpus growth or runner noise.
Root cause
Profiling a freshly-built main native addon on codegraph's own source showed the cost is dominated by the P6 dataflow vertex pass (
runDataflowVertexPass→buildDataflowVerticesFromMap), which runs untimed so it never surfaced inresult.phases. Two inefficiencies, both pre-existing but amplified as the dynamic-call resolution phases (0–6) added more tracked dataflow:nodes-table queries —makeNodeResolverran 1–2 uncached queries per call and was invoked repeatedly for the same function names within each file (per param, return, assignment, argFlow, summary, capture). On a full build that's tens of thousands of redundant queries.DataflowResultfor every dataflow file by callingextractDataflowAnalysisonce per file — hundreds of synchronous parses on the event loop.Fix
(relPath, funcName). Thenodestable is never mutated during P6 (onlydataflow*tables are written), so the lookup is stable for the resolver's lifetime.extract_dataflow_analysis_batch— a new addon export that reads + parses every requested file across the rayon thread pool in one NAPI call (eachparse_sourcebuilds its own tree-sitterParser, so it's embarrassingly parallel).runDataflowVertexPassfeature-detects it and falls back to the per-file path on older addons.Vertex/edge building stays entirely in TS (shared with the WASM engine) — the batch only changes how the
DataflowResults are obtained, not how vertices are built from them, so there is no second implementation to keep in parity.Verification
Result-preserving / parity-safe: a same-process batch-vs-per-file A/B on codegraph's corpus produces byte-identical dataflow output —
dataflow_vertices(12200),dataflow_summary(5839), and every dataflow edge kind (arg_in/def_use/return_out/…) match, as does the inter-procedural edge semantic fingerprint.All 136 dataflow tests pass.
Perf (idle-machine 4-way interleaved A/B, median of 6 full builds):
Applying −22.6% to the CI figure (4504 ms) → ~3486 ms, under the 3765 ms (+25%) threshold. The query-elimination should amplify on CI's slower I/O. The pre-publish gate is the final verifier.