Reset LSPS5 `persistence_in_flight` counter on persist errors by tnull · Pull Request #4597 · lightningdevkit/rust-lightning

tnull · 2026-05-05T20:14:27Z

`LSPS5ServiceHandler::persist` incremented `persistence_in_flight` at the top as a single-runner gate, but only decremented it on the success path: each interior `?` on a `kv_store` future propagated the error out of the function while leaving the counter at >= 1. After one transient I/O failure (disk full, brief unavailability of a remote `KVStore`, EPERM, etc.) every subsequent `persist()` call hit the `fetch_add > 0` short-circuit and silently returned `Ok(false)`.

The in-memory `needs_persist` flags then continued to grow without ever reaching disk, so webhook state, removals, and notification cooldowns were lost on the next process restart (including the spec-mandated webhook retention/pruning state), without any error surfaced to the operator. The counter is monotonic, so recovery required a process restart.

Adopt the LSPS1 / LSPS2 pattern: split the body into an inner `do_persist` and an outer `persist` that unconditionally clears the counter via `store(0)` after the call returns, regardless of outcome. A failed write now still propagates `Err`, but the next `persist()` attempt actually retries the write instead of no-op'ing.
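The failure mode described above can be reproduced with a small synchronous model. This is only a sketch: `LeakyHandler` and the `write` closure are hypothetical stand-ins rather than the real `LSPS5ServiceHandler` or `KVStore` API, and the async machinery is left out so that only the counter logic remains.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the handler; only the gate counter is modeled.
struct LeakyHandler {
    persistence_in_flight: AtomicUsize,
}

impl LeakyHandler {
    /// Mirrors the buggy shape: the counter is decremented only on success.
    fn persist(&self, write: impl Fn() -> Result<(), ()>) -> Result<bool, ()> {
        if self.persistence_in_flight.fetch_add(1, Ordering::AcqRel) > 0 {
            // Another persist appears to be running, so this call is skipped.
            return Ok(false);
        }
        // On Err, `?` returns early and the counter is left at >= 1.
        write()?;
        self.persistence_in_flight.fetch_sub(1, Ordering::AcqRel);
        Ok(true)
    }
}
```

In this model, one failed write followed by a write that would succeed yields `Err(())` and then `Ok(false)`: the second call never reaches the store.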

Co-Authored-By: HAL 9000

ldk-reviews-bot · 2026-05-05T20:14:30Z

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

ldk-claude-review-bot · 2026-05-05T20:19:40Z

I've reviewed the full diff, comparing the new LSPS5 persist/do_persist structure against the LSPS1 and LSPS2 reference implementations. Let me trace all paths through the counter logic.

LSPS5 `persist()` paths:

1. Early return (`fetch_add > 0`): counter correctly left for the running thread to manage.
2. `do_persist()` returns `Err`: explicit `store(0)` before returning, which is correct.
3. `do_persist()` returns `Ok`, `fetch_sub(1)` returns 1: counter is now 0, break, which is correct.
4. `do_persist()` returns `Ok`, `fetch_sub(1)` returns > 1: `store(1)`, loop again, which is correct.

All paths correctly reset the counter. The fix is sound.

No issues found.

The code correctly ensures persistence_in_flight is reset to 0 on all exit paths. The structural placement of the loop differs from LSPS1/LSPS2 (loop in persist with store(0) only on error vs. loop in do_persist with unconditional store(0) + debug_assert in persist), but the semantics are equivalent — no counter value can leak. The test properly validates the regression scenario via a FailableKVStore that selectively fails LSPS5 namespace writes.
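The four paths traced above can be sketched in a synchronous model. This is an illustration, not the merged code: `Handler` is a hypothetical type, and the `do_persist` closure stands in for the async inner method.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the handler; only the gate counter is modeled.
struct Handler {
    persistence_in_flight: AtomicUsize,
}

impl Handler {
    /// The outer method owns the counter; `do_persist` only performs writes.
    fn persist(&self, do_persist: impl Fn() -> Result<(), ()>) -> Result<bool, ()> {
        if self.persistence_in_flight.fetch_add(1, Ordering::AcqRel) > 0 {
            // Path 1: a runner is active and will observe our increment.
            return Ok(false);
        }
        loop {
            if let Err(e) = do_persist() {
                // Path 2: reset the gate so that a later call can retry.
                self.persistence_in_flight.store(0, Ordering::Release);
                return Err(e);
            }
            if self.persistence_in_flight.fetch_sub(1, Ordering::AcqRel) != 1 {
                // Path 4: requests arrived meanwhile; collapse them and rerun.
                self.persistence_in_flight.store(1, Ordering::Release);
                continue;
            }
            // Path 3: the counter is back at 0.
            break;
        }
        Ok(true)
    }
}
```

With this shape, a failed call leaves the counter at 0, so the next call performs the write instead of short-circuiting.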

TheBlueMatt · 2026-05-05T20:26:11Z


+		let res = self.do_persist().await;
+		debug_assert!(res.is_err() || self.persistence_in_flight.load(Ordering::Acquire) == 0);
+		self.persistence_in_flight.store(0, Ordering::Release);


No, this races with a second writer that is started at the same time. We should move the loop out of the inner method and into this method to control the flag entirely in this method.

tnull · 2026-05-06

Added a fixup.
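One interleaving that seems consistent with the race raised in this thread can be replayed by hand on a bare counter. The function below is a hypothetical illustration, assuming the inner method has already dropped the counter to 0 before the outer `store(0)` runs; it is not taken from the PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Replays one hypothetical interleaving step by step and returns how many
/// callers believe they hold the single-runner gate at the end.
fn replay_interleaving() -> usize {
    let in_flight = AtomicUsize::new(0);
    let mut active_runners = 0;

    // Caller A claims the gate and runs the inner method.
    if in_flight.fetch_add(1, Ordering::AcqRel) == 0 {
        active_runners += 1;
    }
    // A's inner method finishes and drops its claim (counter back to 0).
    in_flight.fetch_sub(1, Ordering::AcqRel);
    // Before A's outer reset runs, caller B claims the now-free gate.
    if in_flight.fetch_add(1, Ordering::AcqRel) == 0 {
        active_runners += 1;
    }
    // A's unconditional reset erases B's claim, and A is done.
    in_flight.store(0, Ordering::Release);
    active_runners -= 1;
    // Caller C sees a free gate although B is still writing.
    if in_flight.fetch_add(1, Ordering::AcqRel) == 0 {
        active_runners += 1;
    }
    active_runners
}
```

Under these assumptions two callers end up persisting concurrently, which is what keeping all counter handling in one method is meant to rule out.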

codecov · 2026-05-05T23:46:15Z

Codecov Report

❌ Patch coverage is 46.15385% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.13%. Comparing base (1a26867) to head (b3544de).
⚠️ Report is 61 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lightning-liquidity/src/lsps5/service.rs | 46.15% | 24 Missing and 4 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4597      +/-   ##
==========================================
- Coverage   86.84%   86.13%   -0.71%     
==========================================
  Files         161      157       -4     
  Lines      109260   108833     -427     
  Branches   109260   108833     -427     
==========================================
- Hits        94882    93743    -1139     
- Misses      11797    12477     +680     
- Partials     2581     2613      +32

| Flag | Coverage Δ | |
|---|---|---|
| fuzzing-fake-hashes | `?` | |
| fuzzing-real-hashes | `?` | |
| tests | `86.13% <46.15%> (-0.09%)` | ⬇️ |

TheBlueMatt

please squash

ldk-reviews-bot · 2026-05-07T21:35:57Z

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

tnull · 2026-05-08T09:15:32Z

> please squash

Squashed without further changes.

TheBlueMatt · 2026-06-11T20:51:03Z

Backported to 0.2 in #4683.

TheBlueMatt · 2026-06-17T20:06:49Z

LSPS5 was not implemented in 0.1 so no need to backport.

v0.2.3 - Jun 18, 2026 - "Through the Loupe"

API Updates
===========

 * `DefaultMessageRouter` will now always generate blinded message paths that provide no privacy (where our node is the introduction node) for nodes with public channels. This works around an issue which will appear for any nodes with LND peers that enable onion messaging - such peers will refuse to forward BOLT 12 messages from unknown third parties, which most BOLT 12 payers rely on today (lightningdevkit#4647).
 * Explicit `amount_msats` of 0 is rejected in BOLT 12 `Offer`s; `OfferBuilder` now maps 0-amounts to an amount of `None` (lightningdevkit#4324).

Bug Fixes
=========

 * `Features::supports_zero_conf` no longer clears the `ZeroConf` features and `Features::requires_zero_conf` now correctly reports required, rather than supported, status (lightningdevkit#4517).
 * If an MPP payment is claimed but `ChannelMonitorUpdate`s for some parts are still being completed asynchronously, further channel updates (e.g. forwarding another payment) are pending and the node restarts, the channel could have become stuck (lightningdevkit#4520).
 * The presence of unconfirmed transactions actually no longer causes `ElectrumSyncClient` to spuriously fail to sync (lightningdevkit#4590).
 * LSPS1, LSPS2, and LSPS5 persistence will no longer get stuck and refuse to persist again after a single failure from the KVStore (lightningdevkit#4597, lightningdevkit#4282).
 * Dropping the future returned by `OutputSweeper::regenerate_and_broadcast_spend_if_necessary` no longer results in future calls to the same method being spuriously ignored (lightningdevkit#4598).
 * Used async-receive offers are no longer refreshed on every timer tick once their refresh time is reached (lightningdevkit#4672).
 * `FilesystemStore::list_all_keys` will no longer fail if there are stale intermediate files lying around from a previous unclean shutdown (lightningdevkit#4618).
 * When forwarding an HTLC while in a blinded path with proportional fees over 200%, LDK will no longer spuriously allow a forward that pays us 1 msat too little in fees (lightningdevkit#4697).
 * Fixed a rare case where a channel could get stuck on reconnect when using both async `ChannelMonitorUpdate` persistence and async signing (lightningdevkit#4684).
 * If we had exactly zero balance in a zero-fee-commitment channel, the counterparty was able to splice all of their balance out, violating the reserve requirements they'd otherwise be forced to keep (lightningdevkit#4580).
 * Providing an `Event::HTLCIntercepted` to the `LSPS2ServiceHandler` twice no longer results in spuriously opening a channel early (lightningdevkit#4656).
 * `Event::PaymentSent::fee_paid_msat` is no longer `None` in cases where `ChannelManager::abandon_payment` was called before the payment ultimately completes anyway (lightningdevkit#4651).
 * `AnchorDescriptor::previous_utxo` now provides the correct `script_pubkey` for non-zero-commitment-fee anchor channels (lightningdevkit#4669).
 * Syncing a `ChainMonitor` using the `Confirm` trait will no longer write some full `ChannelMonitor`s to disk several times per block (lightningdevkit#4544).
 * `OMDomainResolver` now correctly accounts for failed queries when rate limiting, ensuring we continue to respond to queries after failures (lightningdevkit#4591).
 * Calling `ChannelManager::send_payment_with_route` without a `route_params` and with an invalid `Route` will no longer panic (lightningdevkit#4707).
 * `LSPS2ServiceHandler::channel_open_failed` now correctly fails intercepted HTLCs rather than allowing them to fail just before expiry (lightningdevkit#4677).
 * `StaticInvoice::is_offer_expired` was corrected to check offer, rather than static invoice, expiry (lightningdevkit#4594).
 * `lightning-custom-message`'s handling of `peer_connected` events now ensures that sub-handlers will see a `peer_disconnected` event if a different sub-handler refused the connection by `Err`ing `peer_connected` (lightningdevkit#4595).
 * Replay protection for LSPS5 signatures now detects replays which are only different in the encoded signature's case (lightningdevkit#4701).
 * When `lightning-liquidity` is configured in the background processor, there is no longer a stream of `Persisting LiquidityManager...` log spam (lightningdevkit#4246).
 * Incomplete MPP keysend payments will no longer see their HTLCs held until expiry (lightningdevkit#4558).
 * `InvoiceRequestBuilder` will no longer accept a `quantity` of `0` for a BOLT 12 `Offer`, allowing any quantity up to a bound (lightningdevkit#4667).
 * `lightning-custom-message` handlers that return `Ok(None)` when asked to deserialize a message in their defined range no longer cause panics (lightningdevkit#4709).
 * Several spurious debug assertions were fixed (lightningdevkit#4537, lightningdevkit#4618, lightningdevkit#4026)

Security
========

0.2.3 fixes several underestimates of the anchor reserves required to ensure we can reliably close channels, several denial-of-service vulnerabilities and a sanitization issue.

 * `Bolt11Invoice::recover_payee_pub_key` no longer panics if called on an invoice which set an explicit public key, rather than relying on public key recovery. Note that this method is called from `PaymentParameters::from_bolt11_invoice` (lightningdevkit#4717).
 * Maliciously-crafted unpayable invoices which have overflowing feerates will no longer cause an `unwrap` failure panic (lightningdevkit#4716).
 * Parsing an `LSPSDateTime` which is before 1970 no longer panics. This is reachable when parsing messages from counterparties (lightningdevkit#4715).
 * `possiblyrandom` did not properly generate random data except when it was explicitly configured to. By default this means LDK is vulnerable to various HashDoS attacks (lightningdevkit#4719).
 * `OMNameResolver` will no longer panic when looking up payment instructions which include unicode characters at the start of a TXT record (lightningdevkit#4718).
 * When using the `anchor_channel_reserves` module to calculate reserves required to pay for fees when closing anchor channels, zero-fee-commitment channels were not considered. This could allow a counterparty to open many channels, leaving us unable to properly force-close (lightningdevkit#4592).
 * The `anchor_channel_reserves` module overestimated the value of `Utxo`s in the wallet by ignoring the `TxIn` cost to spend them (lightningdevkit#4670).
 * `PrintableString` did not properly sanitize unicode format characters, allowing an attacker to corrupt the rendering of logs or UI (lightningdevkit#4593, lightningdevkit#4605).
 * RGS data is now limited in how large of a graph it is able to cause a client to store in memory. Note that RGS data is still considered a DoS vector in general and you should only use semi-trusted RGS data (lightningdevkit#4713).
 * Counterparty-provided strings in failure messages are no longer logged in full, reducing the ability of such a counterparty to spam our logs (lightningdevkit#4714).
 * Reading a corrupted `ChannelManager` or `ProbabilisticScorer` can no longer cause us to allocate large amounts of memory (lightningdevkit#4712).

Thanks to Project Loupe for reporting most of the issues fixed in this release.

Conflicts resolved in:

 * lightning/src/chain/channelmonitor.rs
 * lightning/src/events/mod.rs
 * lightning/src/ln/channelmanager.rs
 * lightning/src/ln/mod.rs
 * lightning/src/ln/offers_tests.rs
 * lightning/src/ln/onion_utils.rs

tnull added the backport 0.1 and backport 0.2 labels May 5, 2026

ldk-reviews-bot requested a review from joostjager May 5, 2026 20:25

TheBlueMatt reviewed May 5, 2026

TheBlueMatt removed the request for review from joostjager May 6, 2026 01:05

tnull requested a review from TheBlueMatt May 6, 2026 09:27

TheBlueMatt approved these changes May 7, 2026

tnull force-pushed the 2026-05-lsps5-persist-counter-leak branch from a23feec to b3544de May 8, 2026 09:15

TheBlueMatt approved these changes May 18, 2026

TheBlueMatt merged commit 39434df into lightningdevkit:main May 18, 2026
23 of 24 checks passed

TheBlueMatt mentioned this pull request Jun 11, 2026

[0.2] Initial batch of 0.2.3 backports #4683

Merged

TheBlueMatt removed the backport 0.2 label Jun 11, 2026

TheBlueMatt removed the backport 0.1 label Jun 17, 2026

TheBlueMatt merged 1 commit into lightningdevkit:main from tnull:2026-05-lsps5-persist-counter-leak
