
feat(vi/cvi): recover VirtualImage and ClusterVirtualImage from ImageLost when data returns#2564

Open
danilrwx wants to merge 2 commits into main from feat/imagelost-auto-recovery

@danilrwx danilrwx commented Jul 1, 2026

Description

Automatically recover VirtualImage and ClusterVirtualImage that entered the ImageLost phase (image is missing in DVCR) without re-importing the data.

The ImagePresenceHandler already moved a Ready image to ImageLost when it disappeared from DVCR. It is now symmetric and also handles the reverse transition:

  • while the image is ImageLost, the handler keeps polling DVCR (RequeueAfter), so the return of the data is noticed shortly after it happens;
  • once the image reappears in DVCR (for example, when the DVCR backing PVC is remounted after a temporary loss), the resource is restored to Ready. Only the phase and the Ready condition are flipped; the rest of the status (Target, Size, Format, …) was never cleared, so nothing else has to be rebuilt;
  • a VirtualImageLostRecovered / ClusterVirtualImageLostRecovered event is emitted on recovery.
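
The transitions listed above can be sketched as a small decision function. This is an illustration under stated assumptions, not the handler's actual code: the `Image` and `Result` types, the `reconcilePresence` name, and the 30-second `recheckInterval` are hypothetical, and the sketch assumes polling starts as soon as the resource enters ImageLost. Only the phase names, the RequeueAfter idea, and the event name come from the description.

```go
package main

import (
	"fmt"
	"time"
)

// Phase is a simplified stand-in for the VI/CVI status phase.
type Phase string

const (
	PhaseReady     Phase = "Ready"
	PhaseImageLost Phase = "ImageLost"
)

// recheckInterval is a hypothetical value; the PR does not state the real one.
const recheckInterval = 30 * time.Second

// Image holds only the fields this sketch needs.
type Image struct {
	Phase  Phase
	Ready  bool   // stands in for the Ready condition
	Target string // part of the status that neither transition touches
}

// Result loosely mirrors a reconcile result with RequeueAfter plus an event.
type Result struct {
	RequeueAfter time.Duration
	Event        string
}

// reconcilePresence applies the symmetric Ready <-> ImageLost transitions.
func reconcilePresence(img *Image, presentInDVCR bool) Result {
	switch {
	case img.Phase == PhaseReady && !presentInDVCR:
		// Loss detected: flip phase and condition, start polling (assumption).
		img.Phase, img.Ready = PhaseImageLost, false
		return Result{RequeueAfter: recheckInterval}
	case img.Phase == PhaseImageLost && presentInDVCR:
		// Data is back: flip phase and condition only, emit the recovery event.
		img.Phase, img.Ready = PhaseReady, true
		return Result{Event: "VirtualImageLostRecovered"}
	case img.Phase == PhaseImageLost:
		// Still missing: keep polling DVCR.
		return Result{RequeueAfter: recheckInterval}
	}
	// Healthy Ready image (or any other phase): no polling is scheduled.
	return Result{}
}

func main() {
	img := &Image{Phase: PhaseImageLost, Target: "example-target"}
	fmt.Println(reconcilePresence(img, false).RequeueAfter)
	res := reconcilePresence(img, true)
	fmt.Println(img.Phase, res.Event, img.Target)
}
```

Note that `Target` is never written by either branch, which matches the point that the rest of the status survives the outage untouched.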

Because recovery reuses the blobs already present in DVCR, no re-download happens and every source recovers, including Upload, whose data cannot be re-fetched.

The LifeCycleHandler early-returns on ImageLost as before (no import restart). Handler order (LifeCycleHandler → ImagePresenceHandler) is unchanged and the ImagePVCLost phase is not affected. Healthy Ready images are not polled: loss detection stays event-driven as before, and polling is added only while a resource is lost.
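
A minimal sketch of how the two handlers could cooperate in that order. Everything here is hypothetical scaffolding (the `image` struct, the `handler` function type, the `importsStarted` counter, and the "Pending"/"Provisioning" phases used to show an import starting); it only illustrates that an early return in the first handler leaves recovery entirely to the second one.

```go
package main

import "fmt"

// image is a simplified stand-in for a VI/CVI resource.
type image struct {
	phase          string
	importsStarted int // counts import (re)starts, for illustration only
}

// handler is a hypothetical signature; the real handlers take other arguments.
type handler func(img *image, presentInDVCR bool)

// lifeCycle stands in for LifeCycleHandler: it returns early on ImageLost,
// so a lost image never triggers a new import.
func lifeCycle(img *image, _ bool) {
	if img.phase == "ImageLost" {
		return
	}
	if img.phase == "Pending" {
		img.importsStarted++
		img.phase = "Provisioning"
	}
}

// imagePresence stands in for ImagePresenceHandler: it only flips between
// Ready and ImageLost and ignores every other phase, including ImagePVCLost.
func imagePresence(img *image, present bool) {
	switch {
	case img.phase == "Ready" && !present:
		img.phase = "ImageLost"
	case img.phase == "ImageLost" && present:
		img.phase = "Ready"
	}
}

// reconcile runs the handlers in the fixed order described in the PR.
func reconcile(img *image, present bool) {
	for _, h := range []handler{lifeCycle, imagePresence} {
		h(img, present)
	}
}

func main() {
	img := &image{phase: "ImageLost"}
	reconcile(img, true)
	fmt.Println(img.phase, img.importsStarted)
}
```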

Why do we need it, and what problem does it solve?

When the DVCR backing storage (its PVC) is temporarily lost and later comes back, all images served by DVCR are physically intact — only the registry was unavailable for a while. Today VI/CVI correctly move to ImageLost during the outage, but then stay stuck there even though the blobs returned, forcing users to delete and recreate the resources by hand.

This is especially painful for a mass DVCR outage affecting dozens of images at once. Re-importing is both unnecessary (the data is already there) and impossible for Upload sources. Flipping back to Ready when the data reappears restores the images automatically, with no re-download and no user action.

What is the expected result?

  1. Create a VI/CVI backed by DVCR, wait for Ready.
  2. Make DVCR lose the data (detach/remove the DVCR PVC, or delete the blob), wait for the resource to enter ImageLost.
  3. Bring the data back (remount the DVCR PVC / restore the blob).
  4. Within the recheck interval the resource returns to Ready on its own; a VirtualImageLostRecovered / ClusterVirtualImageLostRecovered event is recorded. This works for every source type, including Upload.
  5. While the data is still missing, the resource stays in ImageLost and DVCR keeps being rechecked periodically.
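
The scenario above can be replayed against an in-memory stand-in for DVCR. This is a sketch under assumptions: `fakeDVCR`, `resource`, `recheck`, and `runScenario` are hypothetical names, the registry is a plain map, and each `recheck` call represents one periodic check rather than a real timer.

```go
package main

import "fmt"

// fakeDVCR is an in-memory stand-in for the registry; not a real DVCR client.
type fakeDVCR struct{ blobs map[string]bool }

func (r *fakeDVCR) has(ref string) bool { return r.blobs[ref] }

// resource is a simplified VI/CVI with a phase and recorded events.
type resource struct {
	ref    string
	phase  string
	events []string
}

// recheck represents a single presence check against the registry.
func recheck(res *resource, reg *fakeDVCR) {
	present := reg.has(res.ref)
	switch {
	case res.phase == "Ready" && !present:
		res.phase = "ImageLost"
	case res.phase == "ImageLost" && present:
		res.phase = "Ready"
		res.events = append(res.events, "VirtualImageLostRecovered")
	}
}

// runScenario walks through the numbered steps and records the phase after each.
func runScenario() (timeline, events []string) {
	reg := &fakeDVCR{blobs: map[string]bool{"vi/demo": true}}
	res := &resource{ref: "vi/demo", phase: "Ready"}
	step := func(label string) {
		recheck(res, reg)
		timeline = append(timeline, label+": "+res.phase)
	}

	step("step 1")               // image exists, resource is Ready
	delete(reg.blobs, "vi/demo") // DVCR loses the data
	step("step 2")               // resource enters ImageLost
	step("step 5")               // data still missing, resource stays lost
	reg.blobs["vi/demo"] = true  // data comes back
	step("steps 3-4")            // resource returns to Ready on its own
	return timeline, res.events
}

func main() {
	timeline, events := runScenario()
	for _, line := range timeline {
		fmt.Println(line)
	}
	fmt.Println("events:", events)
}
```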

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

Changelog entries

section: core
type: feature
summary: Automatically restore VirtualImage and ClusterVirtualImage from the ImageLost phase once the image reappears in DVCR, without re-importing the data.

@danilrwx danilrwx marked this pull request as ready for review July 1, 2026 13:41
@danilrwx danilrwx added this to the v1.10.0 milestone Jul 1, 2026
@danilrwx danilrwx changed the title feat: auto-recover VirtualImage and ClusterVirtualImage from ImageLost feat(vi/cvi): auto-recover VirtualImage and ClusterVirtualImage from ImageLost Jul 1, 2026
Restart the import process when a Ready image is lost in DVCR, for
recoverable data sources (HTTP, ContainerImage, ObjectRef). Upload
images stay in ImageLost since their data cannot be re-fetched.

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
@danilrwx danilrwx force-pushed the feat/imagelost-auto-recovery branch from 338e9b5 to 9454c18 Compare July 1, 2026 14:00
@danilrwx danilrwx requested a review from loktev-d July 1, 2026 20:05
@danilrwx danilrwx changed the title feat(vi/cvi): auto-recover VirtualImage and ClusterVirtualImage from ImageLost feat(vi/cvi): recover VirtualImage and ClusterVirtualImage from ImageLost when data returns Jul 2, 2026
…n data returns

Instead of re-importing a lost image, poll DVCR while the image is in
ImageLost and restore it to Ready once the data reappears (for example,
when the DVCR PVC is remounted). No re-download, so upload-sourced images
recover too.

Replaces the previous restart-import recovery approach.

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>

chore: drop stray comments from image presence handlers

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
@danilrwx danilrwx force-pushed the feat/imagelost-auto-recovery branch from d98ea7f to 3d3798d Compare July 2, 2026 13:09