Bypass monitor sync requests when no partition key given in update_monitor_with_chain_data #4544
I've assigned @wpaulino as a reviewer!
The core functional change in … Let me verify there aren't any other files changed in this PR that I might be missing. The diff shows two files: …
Both have been reviewed. The functional fix is correct. No new bugs introduced. My prior review comments have been addressed (the no-op was fixed).

Review Summary

No new issues found beyond what was previously flagged. The no-op bug from the prior review has been resolved.

Cross-cutting concerns (not tied to specific lines)
a79c5de to 1a0f03f
1a0f03f to 2b181dc
TheBlueMatt left a comment:
oof, yea, that's not great. thanks. Trivial so gonna go ahead and land, will backport as well.
NOTE(phlip9):
* Significantly reduces # of channel monitor persists during _transaction-only sync_.
* Backported from LDK master. See PR lightningdevkit#4544 for details.
Backported to 0.1 in #4680.

Backported to 0.2 in #4683.
v0.1.10 - Jun 18, 2026 - "Loupe de Loupe"

API Updates
===========
* `DefaultMessageRouter` will now always generate blinded message paths that provide no privacy (where our node is the introduction node) for nodes with public channels. This works around an issue which will appear for any nodes with LND peers that enable onion messaging - such peers will refuse to forward BOLT 12 messages from unknown third parties, which most BOLT 12 payers rely on today (#4647).
* Explicit `amount_msats` of 0 is rejected in BOLT 12 `Offer`s; `OfferBuilder` now maps 0-amounts to an amount of `None` (#4324).

Bug Fixes
=========
* Async `ChannelMonitorUpdate` persistence operations which complete, but are not marked as complete in a persisted `ChannelManager` prior to restart, followed immediately by a block connection and then another restart, could result in some channel operations hanging, leading to force-closures (#4377).
* If an MPP payment is claimed but `ChannelMonitorUpdate`s for some parts are still being completed asynchronously, further channel updates (e.g. forwarding another payment) are pending and the node restarts, the channel could have become stuck (#4520).
* The presence of unconfirmed transactions actually no longer causes `ElectrumSyncClient` to spuriously fail to sync (#4590).
* `FilesystemStore::list_all_keys` will no longer fail if there are stale intermediate files lying around from a previous unclean shutdown (#4618).
* When forwarding an HTLC while in a blinded path with proportional fees over 200%, LDK will no longer spuriously allow a forward that pays us 1 msat too little in fees (#4697).
* Fixed a rare case where a channel could get stuck on reconnect when using both async `ChannelMonitorUpdate` persistence and async signing (#4684).
* `Event::PaymentSent::fee_paid_msat` is no longer `None` in cases where `ChannelManager::abandon_payment` was called before the payment ultimately completes anyway (#4651).
* Syncing a `ChainMonitor` using the `Confirm` trait will no longer write some full `ChannelMonitor`s to disk several times per block (#4544).
* `OMDomainResolver` now correctly accounts for failed queries when rate limiting, ensuring we continue to respond to queries after failures (#4591).
* Calling `ChannelManager::send_payment_with_route` without a `route_params` and with an invalid `Route` will no longer panic (#4707).
* `lightning-custom-message`'s handling of `peer_connected` events now ensures that sub-handlers will see a `peer_disconnected` event if a different sub-handler refused the connection by `Err`ing `peer_connected` (#4595).
* Incomplete MPP keysend payments will no longer see their HTLCs held until expiry (#4558).
* `InvoiceRequestBuilder` will no longer accept a `quantity` of `0` for a BOLT 12 `Offer`, allowing any quantity up to a bound (#4667).
* `lightning-custom-message` handlers that return `Ok(None)` when asked to deserialize a message in their defined range no longer cause panics (#4709).
* Several spurious debug assertions were fixed (#4537, #4618).

Security
========
0.1.10 fixes a sanitization issue and several denial-of-service vulnerabilities.
* `Bolt11Invoice::recover_payee_pub_key` no longer panics if called on an invoice which set an explicit public key, rather than relying on public key recovery. This method is called from `payment_parameters_from_invoice` and `payment_parameters_from_variable_amount_invoice` (#4717).
* Maliciously-crafted unpayable invoices which have overflowing feerates will no longer cause an `unwrap` failure panic (#4716).
* `possiblyrandom` did not properly generate random data except when it was explicitly configured to. By default this means LDK is vulnerable to various HashDoS attacks (#4719).
* `OMNameResolver` will no longer panic when looking up payment instructions which include unicode characters at the start of a TXT record (#4718).
* `PrintableString` did not properly sanitize unicode format characters, allowing an attacker to corrupt the rendering of logs or UI (#4593, #4605).
* RGS data is now limited in how large of a graph it is able to cause a client to store in memory. Note that RGS data is still considered a DoS vector in general and you should only use semi-trusted RGS data (#4713).
* Counterparty-provided strings in failure messages are no longer logged in full, reducing the ability of such a counterparty to spam our logs (#4714).
* Reading a corrupted `ChannelManager` or `ProbabilisticScorer` can no longer cause us to allocate large amounts of memory (#4712).

Thanks to Project Loupe for reporting most of the issues fixed in this release.
v0.2.3 - Jun 18, 2026 - "Through the Loupe"

API Updates
===========
* `DefaultMessageRouter` will now always generate blinded message paths that provide no privacy (where our node is the introduction node) for nodes with public channels. This works around an issue which will appear for any nodes with LND peers that enable onion messaging - such peers will refuse to forward BOLT 12 messages from unknown third parties, which most BOLT 12 payers rely on today (lightningdevkit#4647).
* Explicit `amount_msats` of 0 is rejected in BOLT 12 `Offer`s; `OfferBuilder` now maps 0-amounts to an amount of `None` (lightningdevkit#4324).

Bug Fixes
=========
* `Features::supports_zero_conf` no longer clears the `ZeroConf` features and `Features::requires_zero_conf` now correctly reports required, rather than supported, status (lightningdevkit#4517).
* If an MPP payment is claimed but `ChannelMonitorUpdate`s for some parts are still being completed asynchronously, further channel updates (e.g. forwarding another payment) are pending and the node restarts, the channel could have become stuck (lightningdevkit#4520).
* The presence of unconfirmed transactions actually no longer causes `ElectrumSyncClient` to spuriously fail to sync (lightningdevkit#4590).
* LSPS1, LSPS2, and LSPS5 persistence will no longer get stuck and refuse to persist again after a single failure from the KVStore (lightningdevkit#4597, lightningdevkit#4282).
* Dropping the future returned by `OutputSweeper::regenerate_and_broadcast_spend_if_necessary` no longer results in future calls to the same method being spuriously ignored (lightningdevkit#4598).
* Used async-receive offers are no longer refreshed on every timer tick once their refresh time is reached (lightningdevkit#4672).
* `FilesystemStore::list_all_keys` will no longer fail if there are stale intermediate files lying around from a previous unclean shutdown (lightningdevkit#4618).
* When forwarding an HTLC while in a blinded path with proportional fees over 200%, LDK will no longer spuriously allow a forward that pays us 1 msat too little in fees (lightningdevkit#4697).
* Fixed a rare case where a channel could get stuck on reconnect when using both async `ChannelMonitorUpdate` persistence and async signing (lightningdevkit#4684).
* If we had exactly zero balance in a zero-fee-commitment channel, the counterparty was able to splice all of their balance out, violating the reserve requirements they'd otherwise be forced to keep (lightningdevkit#4580).
* Providing an `Event::HTLCIntercepted` to the `LSPS2ServiceHandler` twice no longer results in spuriously opening a channel early (lightningdevkit#4656).
* `Event::PaymentSent::fee_paid_msat` is no longer `None` in cases where `ChannelManager::abandon_payment` was called before the payment ultimately completes anyway (lightningdevkit#4651).
* `AnchorDescriptor::previous_utxo` now provides the correct `script_pubkey` for non-zero-commitment-fee anchor channels (lightningdevkit#4669).
* Syncing a `ChainMonitor` using the `Confirm` trait will no longer write some full `ChannelMonitor`s to disk several times per block (lightningdevkit#4544).
* `OMDomainResolver` now correctly accounts for failed queries when rate limiting, ensuring we continue to respond to queries after failures (lightningdevkit#4591).
* Calling `ChannelManager::send_payment_with_route` without a `route_params` and with an invalid `Route` will no longer panic (lightningdevkit#4707).
* `LSPS2ServiceHandler::channel_open_failed` now correctly fails intercepted HTLCs rather than allowing them to fail just before expiry (lightningdevkit#4677).
* `StaticInvoice::is_offer_expired` was corrected to check offer, rather than static invoice, expiry (lightningdevkit#4594).
* `lightning-custom-message`'s handling of `peer_connected` events now ensures that sub-handlers will see a `peer_disconnected` event if a different sub-handler refused the connection by `Err`ing `peer_connected` (lightningdevkit#4595).
* Replay protection for LSPS5 signatures now detects replays which are only different in the encoded signature's case (lightningdevkit#4701).
* When `lightning-liquidity` is configured in the background processor, there is no longer a stream of `Persisting LiquidityManager...` log spam (lightningdevkit#4246).
* Incomplete MPP keysend payments will no longer see their HTLCs held until expiry (lightningdevkit#4558).
* `InvoiceRequestBuilder` will no longer accept a `quantity` of `0` for a BOLT 12 `Offer`, allowing any quantity up to a bound (lightningdevkit#4667).
* `lightning-custom-message` handlers that return `Ok(None)` when asked to deserialize a message in their defined range no longer cause panics (lightningdevkit#4709).
* Several spurious debug assertions were fixed (lightningdevkit#4537, lightningdevkit#4618, lightningdevkit#4026).

Security
========
0.2.3 fixes several underestimates of the anchor reserves required to ensure we can reliably close channels, several denial-of-service vulnerabilities, and a sanitization issue.
* `Bolt11Invoice::recover_payee_pub_key` no longer panics if called on an invoice which set an explicit public key, rather than relying on public key recovery. Note that this method is called from `PaymentParameters::from_bolt11_invoice` (lightningdevkit#4717).
* Maliciously-crafted unpayable invoices which have overflowing feerates will no longer cause an `unwrap` failure panic (lightningdevkit#4716).
* Parsing an `LSPSDateTime` which is before 1970 no longer panics. This is reachable when parsing messages from counterparties (lightningdevkit#4715).
* `possiblyrandom` did not properly generate random data except when it was explicitly configured to. By default this means LDK is vulnerable to various HashDoS attacks (lightningdevkit#4719).
* `OMNameResolver` will no longer panic when looking up payment instructions which include unicode characters at the start of a TXT record (lightningdevkit#4718).
* When using the `anchor_channel_reserves` module to calculate reserves required to pay for fees when closing anchor channels, zero-fee-commitment channels were not considered. This could allow a counterparty to open many channels, leaving us unable to properly force-close (lightningdevkit#4592).
* The `anchor_channel_reserves` module overestimated the value of `Utxo`s in the wallet by ignoring the `TxIn` cost to spend them (lightningdevkit#4670).
* `PrintableString` did not properly sanitize unicode format characters, allowing an attacker to corrupt the rendering of logs or UI (lightningdevkit#4593, lightningdevkit#4605).
* RGS data is now limited in how large of a graph it is able to cause a client to store in memory. Note that RGS data is still considered a DoS vector in general and you should only use semi-trusted RGS data (lightningdevkit#4713).
* Counterparty-provided strings in failure messages are no longer logged in full, reducing the ability of such a counterparty to spam our logs (lightningdevkit#4714).
* Reading a corrupted `ChannelManager` or `ProbabilisticScorer` can no longer cause us to allocate large amounts of memory (lightningdevkit#4712).

Thanks to Project Loupe for reporting most of the issues fixed in this release.

Conflicts resolved in:
* lightning/src/chain/channelmonitor.rs
* lightning/src/events/mod.rs
* lightning/src/ln/channelmanager.rs
* lightning/src/ln/mod.rs
* lightning/src/ln/offers_tests.rs
* lightning/src/ln/onion_utils.rs
This proposed patch concerns `chain::ChainMonitor`.

Context
* In `ChainMonitor`, `process_chain_data` takes a `best_height: Option<u32>` parameter, which is used as a partition key in `update_monitor_with_chain_data`. Each channel monitor is partitioned by its `channel_id` mod `50`, and a sync request is sent to the persister only when `best_height` matches the channel's partition.
* `best_height` is meant to come from `best_block_updated`, which "must be called whenever a new chain tip becomes available" (doc comment).
* `best_block_updated` and `transactions_confirmed` are `Confirm` trait functions.
* `process_chain_data` is called each time `best_block_updated` is called.
* `process_chain_data` also supports this intent: "Calls which represent a new blockchain tip height should set `best_height`." (doc comment)

Issue
* In `update_monitor_with_chain_data`, where the partition logic lives, a partition key (`best_height`) of `None` defaults to partition key `0`, triggering unnecessary syncs for partition `0` of the channel monitors.
* When `transactions_confirmed` is called, `process_chain_data` iterates through all channels/monitors, triggering syncs for partition `0`.
* In `lightning-transaction-sync`, the user of `ChainMonitor`'s `Confirm` trait, each confirmed transaction triggers a call to `transactions_confirmed` when `common::sync_confirmed_transactions` is called.

We can use some numbers and a trace to be really concrete. Suppose we have 250 channels, 10 of which have pending transactions:
* `lightning-transaction-sync::esplora::sync()` with `chain_monitor` in args (`esplora.rs` line 85)
  * `else` branch, line 187: `self.get_confirmed_transactions(&sync_state)`
    * 295-302: Put all unique confirmed `txid`s into `confirmed_txs`
    * 339: Return `confirmed_txs`, which is a 250-long `Vec` (1 transaction per channel)
  * 202: `sync_state.sync_confirmed_transactions(<chain_monitor>, confirmed_txs)`
    * `common.rs` line 75: for each `ctx` in `confirmed_txs` (250), do `chain_monitor.transactions_confirmed(<tx info>)`
      * `chainmonitor.rs` line 1515: `self.process_chain_data(...)`
        * 484: for each channel (250), `self.update_monitor_with_chain_data(<associated monitor/channel/tx data>)` (line 487)
          * 568: Send a persist request if the channel monitor `has_pending_claims` OR if the channel falls into the partition (1/50 chance)

In the case of the application I'm working on, this means that sync is destroying buffers... which is causing problems. 😅
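To put numbers on the trace, here is a minimal Rust sketch of the pre-fix persist decision and the number of persist requests it produces in one transaction-only sync. This is a simplified model and not LDK's code. Channels are numbered `0..n` instead of using real `ChannelId`s, and the partition is `channel_num % 50` as the Context section describes. The first `num_pending` channels are assumed to have pending claims. `should_persist_before_fix` and `persists_for_tx_only_sync` are hypothetical names.

```rust
/// Partition count cited in the PR text (channel id mod 50).
const PARTITION_FACTOR: u32 = 50;

/// Pre-fix decision in a simplified, hypothetical model where channels are
/// numbered 0..n instead of using LDK's 32-byte `ChannelId`.
/// A missing `best_height` is treated as 0 (`unwrap_or_default`), so
/// partition 0 is selected on every call that passes `None`.
fn should_persist_before_fix(channel_num: u32, best_height: Option<u32>, has_pending_claims: bool) -> bool {
    let selected_partition = best_height.unwrap_or_default() % PARTITION_FACTOR;
    has_pending_claims || channel_num % PARTITION_FACTOR == selected_partition
}

/// Counts persist requests in one transaction-only sync: one
/// `transactions_confirmed`-style pass (best_height = None) per confirmed
/// transaction, and each pass visits every channel.
fn persists_for_tx_only_sync(num_channels: u32, num_pending: u32, num_confirmed_txs: u32) -> u32 {
    let mut persists = 0;
    for _tx in 0..num_confirmed_txs {
        for channel_num in 0..num_channels {
            // Assumption for illustration: the first `num_pending` channels
            // are the ones with pending claims.
            let has_pending_claims = channel_num < num_pending;
            if should_persist_before_fix(channel_num, None, has_pending_claims) {
                persists += 1;
            }
        }
    }
    persists
}
```

The model uses the PR's numbers: 250 channels, 10 with pending claims, and 250 confirmed transactions. With these, `persists_for_tx_only_sync(250, 10, 250)` returns 250 × (10 + 4) = 3500. The 4 are channels 50, 100, 150 and 200. They sit in partition `0` but have no pending claims in this model. Channel 0 is already counted among the pending ones.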
Solution
The problem is that syncs are being scheduled even when they shouldn't be, because a `best_height` of `None` causes the partition `0` channels to sync. So the solution is to skip this syncing step unless `best_height` is present.

Follow-up question
We also experience the `NxM` amplification issue with pending transactions, since these are persisted on every `transactions_confirmed` call as well. Is there a reason why these need to be persisted on every confirmed transaction, or would it be safe to condition that persist request on `best_height` as well?
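The post-fix rule from the Solution section can be set against the variant this question asks about using the same simplified model. That model uses hypothetical channel numbering and `channel_num % 50` partitions, with the first `num_pending` channels holding pending claims. `should_persist_after_fix`, `should_persist_if_claims_gated` and `persists_per_sync` are illustrative names. Treating the single call that carries a height as a `best_block_updated`-style call is also an assumption. The sketch only counts writes. It does not answer whether gating the pending-claims persist is safe.

```rust
/// Partition count cited in the PR text.
const PARTITION_FACTOR: u32 = 50;

/// Post-fix rule as described under Solution (simplified hypothetical model):
/// the partition persist only happens when a height is supplied, while
/// pending claims still force a persist on every call.
fn should_persist_after_fix(channel_num: u32, best_height: Option<u32>, has_pending_claims: bool) -> bool {
    let partition_due = match best_height {
        Some(h) => channel_num % PARTITION_FACTOR == h % PARTITION_FACTOR,
        None => false,
    };
    has_pending_claims || partition_due
}

/// Hypothetical variant raised by the question: also gate the pending-claims
/// persist on `best_height`. Its safety is the open question; this only
/// models how many persist requests it would issue.
fn should_persist_if_claims_gated(channel_num: u32, best_height: Option<u32>, has_pending_claims: bool) -> bool {
    best_height.is_some() && should_persist_after_fix(channel_num, best_height, has_pending_claims)
}

/// Counts persist requests for a sync made of `num_confirmed_txs` calls with
/// `best_height = None`, followed by one call carrying `tip_height`
/// (assumed here to stand in for a `best_block_updated`-style call).
fn persists_per_sync(
    decide: fn(u32, Option<u32>, bool) -> bool,
    num_channels: u32,
    num_pending: u32,
    num_confirmed_txs: u32,
    tip_height: u32,
) -> u32 {
    let mut heights: Vec<Option<u32>> = vec![None; num_confirmed_txs as usize];
    heights.push(Some(tip_height));
    let mut persists = 0;
    for height in heights {
        for channel_num in 0..num_channels {
            // Assumption: the first `num_pending` channels have pending claims.
            if decide(channel_num, height, channel_num < num_pending) {
                persists += 1;
            }
        }
    }
    persists
}
```

In this model the post-fix rule gives 2514 persist requests for the example sync (250 channels, 10 pending, 250 confirmed transactions, tip height 100). Of those, 250 × 10 = 2500 come from pending claims on the `None` calls, which is the `NxM` term. The remaining 14 come from the single call with a height. The gated variant gives only those 14. That is an upper bound on the savings, not evidence that skipping the writes is correct.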