feat(vmpool): add VirtualMachinePool for group VM management #2572
Draft
fl64 wants to merge 25 commits into
Introduce the VirtualMachinePool API type (namespaced, group virtualization.deckhouse.io/v1alpha2) with the scale and status subresources, generated deepcopy/client/lister/informer code and the CRD manifest. Gate the resource behind the VirtualMachinePool module feature gate (EE/SE+, default off; locked off in CE). No controller behaviour yet — the type and gate are the scaffold for the pool controller. Part of the VirtualMachinePool implementation (ADR: architecture-decision-records dvp/2026-06-29-vmpool.md). Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Add the VirtualMachinePool controller skeleton behind the EE build tag
(//go:build EE) and the VirtualMachinePool feature gate: handler-chain
reconciler with an empty chain and a primary watch on the resource. It
is wired into the controller manager through build-tagged enterprise
shims (setup_enterprise_{ee,ce}.go); the CE build compiles a no-op.
No reconcile behaviour yet — replica maintenance, template propagation
and reusable disks land in the follow-up slices.
Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
… tag EE is the default shipped edition (werf.inc.yaml builds with -tags $MODULE_EDITION, default EE), but the unit-test task ran ginkgo without a build tag, so //go:build EE code was never exercised by the unit suite. Run ginkgo with --tags EE so enterprise code and its tests are covered. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Add an in-memory, thread-safe expectations tracker (EE) modelled on the Kubernetes ReplicaSet UIDTrackingControllerExpectations: creations are counted, deletions tracked by UID, with a TTL safety valve. The pool reconciler will use it to avoid double-creating anonymous replicas while the informer cache lags behind a Create/Delete. Covered by unit tests (race-clean). Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
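As a rough illustration of the idea in this commit, the sketch below counts pending creations, tracks pending deletions by UID, and lets a TTL unblock the reconciler if an event is never observed. All names here (`expectations`, `expectationsTTL`, `referenceTime`, the method set) are hypothetical and simplified to a single pool; the PR's actual tracker may be shaped differently.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// expectationsTTL is the safety valve: stale expectations stop blocking after it.
const expectationsTTL = 5 * time.Minute

// referenceTime is arbitrary; the example only uses offsets from it.
var referenceTime = time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)

// expectations tracks in-flight creates and deletes for a single pool (hypothetical type).
type expectations struct {
	mu      sync.Mutex
	creates int                 // creations issued but not yet seen by the informer
	deletes map[string]struct{} // UIDs whose deletion has not been observed yet
	setAt   time.Time           // last time an expectation was recorded
}

func newExpectations() *expectations {
	return &expectations{deletes: map[string]struct{}{}}
}

// ExpectCreations records n creations that were just issued.
func (e *expectations) ExpectCreations(n int, now time.Time) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.creates += n
	e.setAt = now
}

// ExpectDeletion records a deletion issued for the given UID.
func (e *expectations) ExpectDeletion(uid string, now time.Time) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.deletes[uid] = struct{}{}
	e.setAt = now
}

// CreationObserved is called by the member watcher on a create event.
func (e *expectations) CreationObserved() {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.creates > 0 {
		e.creates--
	}
}

// DeletionObserved is called by the member watcher on a delete event.
func (e *expectations) DeletionObserved(uid string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	delete(e.deletes, uid)
}

// Satisfied reports whether the reconciler may trust the cached member list.
func (e *expectations) Satisfied(now time.Time) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.creates == 0 && len(e.deletes) == 0 {
		return true
	}
	return now.Sub(e.setAt) > expectationsTTL
}

func main() {
	e := newExpectations()
	e.ExpectCreations(2, referenceTime)
	fmt.Println(e.Satisfied(referenceTime)) // false
	e.CreationObserved()
	e.CreationObserved()
	fmt.Println(e.Satisfied(referenceTime)) // true
}
```

Tracking deletions by UID rather than by count means a repeated delete event for the same object cannot satisfy two expectations.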
Implement the pool's core reconcile: list members by the managed pool-uid label + controllerRef, create missing replicas from the template (managed labels + controller ownerReference, GenerateName naming) and remove surplus ones, then publish status (replicas, readyReplicas, selector, Available/Progressing conditions). Every create/delete is guarded by the expectations tracker, and a member VirtualMachine watcher re-enqueues the owning pool and records observed creations/deletions — so a lagging informer cache cannot double-create anonymous replicas. Terminating members count toward a scale-down (invariant 2), so a replica already leaving is not over-replaced. Covered by unit tests (fake client, race-clean). The controller stays behind //go:build EE and the feature gate. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Add the required spec.scaleDownPolicy enum (NewestFirst / OldestFirst / Explicit) and honour it when the pool is scaled down anonymously via the scale subresource: NewestFirst removes the youngest replicas first, OldestFirst the oldest, and Explicit removes nothing anonymously (such pools shrink only by addressed removal). The scale-subresource guard that rejects anonymous shrink under Explicit is added next. Covered by unit tests. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
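The three policies can be pictured as a sort order over the members followed by a cut. The following is a minimal sketch, not the PR's code: the `member` type, the `pickVictims` helper, the Unix-seconds timestamp and the name tie-break are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Policy values as named in the commit message.
const (
	NewestFirst = "NewestFirst"
	OldestFirst = "OldestFirst"
	Explicit    = "Explicit"
)

// member is a hypothetical, reduced view of a replica.
type member struct {
	Name      string
	CreatedAt int64 // creation time as Unix seconds, simplified for the sketch
}

// pickVictims returns up to n members to remove on an anonymous scale-down.
// Under Explicit it returns nothing: such pools shrink only by addressed removal.
func pickVictims(members []member, n int, policy string) []member {
	if policy == Explicit || n <= 0 {
		return nil
	}
	sorted := append([]member(nil), members...) // do not reorder the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].CreatedAt == sorted[j].CreatedAt {
			return sorted[i].Name < sorted[j].Name // assumed tie-break
		}
		if policy == NewestFirst {
			return sorted[i].CreatedAt > sorted[j].CreatedAt
		}
		return sorted[i].CreatedAt < sorted[j].CreatedAt
	})
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	members := []member{{"vm-a", 100}, {"vm-b", 300}, {"vm-c", 200}}
	for _, p := range []string{NewestFirst, OldestFirst, Explicit} {
		fmt.Println(p, pickVictims(members, 2, p))
	}
}
```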
Add a validating webhook on the virtualmachinepools/scale subresource that rejects a replicas decrease when the pool's scaleDownPolicy is Explicit, pointing the user to scaleDownWith for addressed removal. Growth and no-op scale updates are always allowed. The webhook is registered only in EE builds and self-gates on the VirtualMachinePool feature gate; its ValidatingWebhookConfiguration entry is rendered only when the gate is enabled. Covered by unit tests. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
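The rule the webhook enforces reduces to a single comparison. The sketch below isolates that decision in a hypothetical `validateScale` function; the real webhook would extract the old and new replica counts from the admission request, which is omitted here, and the error text is illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

// errExplicitShrink is an illustrative message, not the PR's exact wording.
var errExplicitShrink = errors.New(
	"scaleDownPolicy is Explicit: replicas cannot be decreased through the scale subresource; use scaleDownWith to remove specific VirtualMachines")

// validateScale rejects a replicas decrease for Explicit pools.
// Growth and no-op updates are always allowed.
func validateScale(policy string, oldReplicas, newReplicas int32) error {
	if policy == "Explicit" && newReplicas < oldReplicas {
		return errExplicitShrink
	}
	return nil
}

func main() {
	fmt.Println(validateScale("Explicit", 3, 2))    // rejected
	fmt.Println(validateScale("Explicit", 3, 5))    // <nil>
	fmt.Println(validateScale("NewestFirst", 3, 1)) // <nil>
}
```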
Add the VirtualMachinePool meta object and the VirtualMachinePoolScaleDownWith body type (targets to remove) to the subresources.virtualization.deckhouse.io API group, with generated deepcopy/conversion/openapi. This is the type surface for the addressed scale-down handle; the aggregated-apiserver REST storage and wiring follow. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Register the virtualmachinepools resource and its scaleDownWith subresource in the existing aggregated apiserver (group subresources.virtualization.deckhouse.io). The handler validates that every target belongs to the pool, deletes them and atomically decrements spec.replicas on the main resource — bypassing the /scale guard, which is what lets Explicit pools shrink by address. The meta-object itself is not served (Get returns NotFound). Enterprise-only: the REST/storage live under //go:build EE and are wired into the apiserver group through a build-tagged hook; the CE build adds nothing. A write-capable client is threaded from the apiserver config. Covered by unit tests. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Let the aggregated apiserver's service account get/update VirtualMachinePool (the scaleDownWith handler decrements spec.replicas) and reach the pool subresources. Grant the Editor cluster role management of VirtualMachinePool, its scale subresource (kubectl scale / HPA) and the scaleDownWith handle for addressed removal. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Add the template-hash label (revision marker, not part of the member selector) stamped on every created replica, and report the rollout in status: desiredTemplateHash, updatedReplicas and the Synced condition (True once all live replicas are on the current virtualMachineTemplate). This makes the rollout observable at pool level. In-place patching of existing replicas on a template change follows. Covered by unit tests. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
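One way to picture the revision marker and the derived status fields is sketched below. The hashing scheme (SHA-256 over the JSON encoding, truncated), the `vmTemplate` fields and both helper names are assumptions for illustration; the commit does not say how the hash is computed. The sketch also treats a pool with no replicas as synced, which is a guess.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// vmTemplate is a hypothetical stand-in for spec.virtualMachineTemplate.
type vmTemplate struct {
	Cores  int               `json:"cores"`
	Memory string            `json:"memory"`
	Labels map[string]string `json:"labels,omitempty"`
}

// templateHash derives a short, stable revision marker from the template.
func templateHash(t vmTemplate) (string, error) {
	raw, err := json.Marshal(t) // map keys are emitted sorted, so the encoding is stable
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:8]), nil
}

// rolloutStatus counts replicas already on the desired revision and
// reports whether all of them are.
func rolloutStatus(desired string, replicaHashes []string) (updated int, synced bool) {
	for _, h := range replicaHashes {
		if h == desired {
			updated++
		}
	}
	return updated, updated == len(replicaHashes)
}

func main() {
	oldHash, _ := templateHash(vmTemplate{Cores: 2, Memory: "4Gi"})
	newHash, _ := templateHash(vmTemplate{Cores: 4, Memory: "4Gi"})
	updated, synced := rolloutStatus(newHash, []string{newHash, oldHash, newHash})
	fmt.Println(updated, synced) // 2 false
}
```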
Add a template handler that patches each live replica's spec to the current virtualMachineTemplate and marks it on the new revision once applied. Re-patching is avoided with a patched-template-hash annotation (not a spec diff, which the apiserver mutates by defaulting), and the template-hash label is advanced only when the replica is not awaiting a restart, so status.updatedReplicas / restartPendingReplicas and the Synced condition (RolloutInProgress vs RestartPendingApproval) reflect what has effectively landed. Hot/cold is decided by the VM layer. Covered by unit tests. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Replace time.Unix(1_700_000_000, 0) with time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC) in the pool tests — same deterministic clock, but self-explanatory. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Replace the inline dates with a single documented package-level referenceTime var per test package, and drop the clock/when aliases. A comment states the value is arbitrary — tests use only relative offsets and never read the wall clock — so the real-world date is irrelevant. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Add spec.virtualDiskTemplates: each entry describes a per-replica disk with a reclaim policy — Delete (default; the disk belongs to its VirtualMachine and is removed with it) or Retain (the disk belongs to the pool, outlives the replica and is reused on scale-up), plus keep (warm buffer) and ttl for Retain disks. This is the schema for reusable disks; the reconcile behaviour (creation, reuse selection, GC) follows. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Add an idempotent, self-healing disks handler: for every live member it ensures each Delete-policy virtualDiskTemplate disk exists (owned by the VirtualMachine, named <vm>-<template>, so it cascades away with the replica) and is referenced in the member's blockDeviceRefs. Also fix the template handler to merge block device refs when it patches a member's spec, so per-replica disk refs the pool attached are not wiped by a template change. Retain (reusable) disks come next. Covered by unit tests, including that a template patch keeps disk refs. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
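The merge fix described above can be sketched as: take the refs from the template, then carry over any ref the pool itself attached. The `blockDeviceRef` type, the `poolDisks` set used to recognise pool-attached disks, and both helper names are hypothetical; how the PR actually identifies pool-attached refs is not stated.

```go
package main

import "fmt"

// blockDeviceRef is a reduced, hypothetical form of a blockDeviceRefs entry.
type blockDeviceRef struct {
	Kind string
	Name string
}

// deleteDiskName builds the <vm>-<template> name of a Delete-policy disk.
func deleteDiskName(vmName, templateName string) string {
	return vmName + "-" + templateName
}

// mergeBlockDeviceRefs returns the template's refs followed by the refs in
// current that point at pool-attached disks, so a template patch does not
// wipe per-replica disks. Refs in current that are neither in the template
// nor pool-attached are dropped.
func mergeBlockDeviceRefs(fromTemplate, current []blockDeviceRef, poolDisks map[string]bool) []blockDeviceRef {
	merged := append([]blockDeviceRef(nil), fromTemplate...)
	seen := make(map[blockDeviceRef]bool, len(merged))
	for _, r := range merged {
		seen[r] = true
	}
	for _, r := range current {
		if r.Kind == "VirtualDisk" && poolDisks[r.Name] && !seen[r] {
			merged = append(merged, r)
			seen[r] = true
		}
	}
	return merged
}

func main() {
	disk := deleteDiskName("vm-abc", "data")
	fromTemplate := []blockDeviceRef{{"VirtualImage", "ubuntu"}}
	current := []blockDeviceRef{{"VirtualImage", "old-image"}, {"VirtualDisk", disk}}
	fmt.Println(mergeBlockDeviceRefs(fromTemplate, current, map[string]bool{disk: true}))
}
```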
Extend the disks handler to Retain-policy templates: a member reuses a free pool-owned disk of the template (Ready and referenced by no live member) or, if none is free, gets a newly created pool-owned disk (named <pool>-<template>-<rand>) that outlives the replica. A per-pass guard prevents handing the same free disk to two members in one reconcile; the authoritative in-use signal is the members' blockDeviceRefs, not the platform InUse condition. Covered by unit tests (create, reuse-free, skip-busy). Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
The disks handler now ages free Retain disks: it stamps a free-since annotation when a disk leaves every member's blockDeviceRefs (the authoritative free signal — the platform InUse condition is unreliable, it flips on Stop) and clears it on reuse. Disks outside the warm buffer (keep newest) and older than the ttl are deleted with a resourceVersion precondition. free-since is persisted on the disk so the ttl survives controller restarts (in-memory timing would reset every restart and leak disks). Covered by unit tests. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
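The ageing rule combines two filters: a disk is deleted only if it falls outside the warm buffer and has been free for longer than the ttl. A minimal sketch follows, assuming "newest" means most recently freed; `freeDisk`, `expiredDisks`, `exampleDisks`, `exampleTTL` and `referenceTime` are hypothetical names, and the resourceVersion precondition on the delete call is not modelled.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// exampleTTL is an illustrative ttl value for the fixture below.
const exampleTTL = 2 * time.Hour

// referenceTime is arbitrary; only offsets from it matter.
var referenceTime = time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)

// freeDisk is a hypothetical view of a free Retain disk.
type freeDisk struct {
	Name      string
	FreeSince time.Time // parsed from the free-since annotation
}

// expiredDisks returns the names of free disks that are outside the warm
// buffer (the keep most recently freed disks) and free for longer than ttl.
func expiredDisks(free []freeDisk, keep int, ttl time.Duration, now time.Time) []string {
	sorted := append([]freeDisk(nil), free...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].FreeSince.After(sorted[j].FreeSince)
	})
	var expired []string
	for i, d := range sorted {
		if i < keep {
			continue // inside the warm buffer
		}
		if now.Sub(d.FreeSince) > ttl {
			expired = append(expired, d.Name)
		}
	}
	return expired
}

// exampleDisks builds a small fixture relative to now.
func exampleDisks(now time.Time) []freeDisk {
	return []freeDisk{
		{"disk-a", now.Add(-3 * time.Hour)},
		{"disk-b", now.Add(-1 * time.Hour)},
		{"disk-c", now.Add(-5 * time.Hour)},
		{"disk-d", now.Add(-30 * time.Minute)},
	}
}

func main() {
	fmt.Println(expiredDisks(exampleDisks(referenceTime), 1, exampleTTL, referenceTime)) // [disk-a disk-c]
}
```

Because the decision depends only on the persisted free-since timestamp and the current time, it gives the same answer before and after a controller restart.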
Add the fallback for reuse-disk collisions: if two live members reference the same pool-owned disk (a cross-pass race after a controller restart), detach it from all but the keeper (the member with BlockDevicesReady, or the lexicographically smallest name) so the others get a fresh disk on the next reconcile — the in-pass guard already prevents the common case. Also add edge-case tests: a Stopped member is counted and neither replaced nor duplicated (invariant 4); nil replicas mean zero; a non-Ready free disk is not reused; free-since is cleared on reuse; disks are not managed for a Terminating member. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
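The keeper rule from this commit can be sketched as follows. The `member` type and `pickKeeper` helper are hypothetical, and the sketch assumes that when several holders report BlockDevicesReady, the lexicographically smallest of them is kept; the commit message does not cover that case.

```go
package main

import (
	"fmt"
	"sort"
)

// member is a hypothetical, reduced view of a replica holding the disk.
type member struct {
	Name              string
	BlockDevicesReady bool
}

// pickKeeper chooses which holder keeps a doubly-referenced disk: a member
// with BlockDevicesReady if there is one, otherwise the lexicographically
// smallest name. Every other holder is returned in detach.
func pickKeeper(holders []member) (keeper string, detach []string) {
	if len(holders) == 0 {
		return "", nil
	}
	sorted := append([]member(nil), holders...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	keeperIdx := 0
	for i, m := range sorted {
		if m.BlockDevicesReady {
			keeperIdx = i
			break
		}
	}
	for i, m := range sorted {
		if i != keeperIdx {
			detach = append(detach, m.Name)
		}
	}
	return sorted[keeperIdx].Name, detach
}

func main() {
	holders := []member{{"vm-c", false}, {"vm-a", false}, {"vm-b", true}}
	keeper, detach := pickKeeper(holders)
	fmt.Println(keeper, detach) // vm-b [vm-a vm-c]
}
```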
The virtualization-controller service account could not list/watch VirtualMachinePool, so the pool controller failed to start its watch and never reconciled. Add virtualmachinepools (+ status, + finalizers) to the controller ClusterRole. Found by in-cluster testing. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
The virtualization-api binary was built without -tags $MODULE_EDITION, so the EE-only aggregated-apiserver registration (compiled under //go:build EE) was dropped and the virtualmachinepools/scaleDownWith subresource returned 404. Build the apiserver with the edition tag like the controller, so the enterprise subresource is served in EE builds. Found by in-cluster testing. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Reuse-disk selection required Ready, so a freshly created disk (still WaitForFirstConsumer / provisioning) was never considered free, and a new one was created on every reconcile until the first one bound, producing a burst of surplus disks. Reuse any free pool-owned disk instead, preferring a Ready one but otherwise attaching a still-provisioning one (attaching is what makes a WaitForFirstConsumer disk bind), and create a new disk only when none is free. Failed/terminating disks are skipped. Found by in-cluster testing. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
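The selection order after this fix can be sketched like this: skip busy, failed and terminating disks, return a Ready disk if one is free, and otherwise fall back to a disk that is still provisioning. The `poolDisk` type, the phase strings and `pickReusableDisk` are hypothetical. The `inUse` map doubles as the per-pass guard mentioned in the earlier Retain commit, so one disk is not handed to two members in the same reconcile; it must be non-nil.

```go
package main

import "fmt"

// Phase names are illustrative, not the platform's actual values.
const (
	phaseReady        = "Ready"
	phaseProvisioning = "Provisioning"
	phaseFailed       = "Failed"
)

// poolDisk is a hypothetical, reduced view of a pool-owned disk.
type poolDisk struct {
	Name        string
	Phase       string
	Terminating bool
}

// pickReusableDisk returns a free disk to attach and marks it in inUse.
// It prefers a Ready disk and falls back to one that is still provisioning.
// ok is false when no disk is free and a new one has to be created.
func pickReusableDisk(disks []poolDisk, inUse map[string]bool) (name string, ok bool) {
	fallback := ""
	for _, d := range disks {
		if inUse[d.Name] || d.Terminating || d.Phase == phaseFailed {
			continue
		}
		if d.Phase == phaseReady {
			inUse[d.Name] = true
			return d.Name, true
		}
		if fallback == "" {
			fallback = d.Name
		}
	}
	if fallback == "" {
		return "", false
	}
	inUse[fallback] = true
	return fallback, true
}

func main() {
	disks := []poolDisk{
		{Name: "d1", Phase: phaseProvisioning},
		{Name: "d2", Phase: phaseReady},
		{Name: "d3", Phase: phaseFailed},
	}
	inUse := map[string]bool{}
	for i := 0; i < 3; i++ {
		fmt.Println(pickReusableDisk(disks, inUse))
	}
}
```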
…data The template metadata embedded metav1.ObjectMeta, which controller-gen renders as an opaque object, so setting template.metadata.labels was rejected by strict decoding. Use a curated metadata struct with labels and annotations so the CRD schema exposes them. Found by in-cluster testing. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Emit ReplicaSet-style events on the VirtualMachinePool so scaling is visible in kubectl describe / kubectl get events: SuccessfulCreate / FailedCreate on replica creation and SuccessfulDelete / FailedDelete on removal. FailedCreate surfaces admission errors (e.g. an invalid template) directly on the pool instead of only in controller logs. Messages follow the user-facing text conventions (English, full resource names, no internals). Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Assert that SuccessfulCreate is emitted for each created replica, and that a failed creation emits FailedCreate and rolls back the expectation (tested with an interceptor client that rejects Create), so the pool is not wedged. Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Description
DVP has no primitive to manage a group of identical virtual machines whose count changes over time. Every "I need N identical VMs and the number varies" scenario (CI runner fleets, VDI desktop pools) is solved with orchestration outside the platform: users write their own controllers or scripts that create and delete VirtualMachines, watch their number, recreate lost ones and clean up after them. This duplicates logic and is error-prone around races and node failures.

This PR introduces VirtualMachinePool (paid editions only, EE/SE+): a namespaced resource that declaratively keeps a requested number of identical VMs and integrates with kubectl scale, HPA and KEDA through the standard scale subresource. Its template is an ordinary VirtualMachineSpec, so a replica is no different from a manually created VM.

One implementation note: the controller ships only in paid editions (compiled under the EE build tag), while the CRD/API is installed in every edition; the feature gate stays locked off in CE, so the resource simply does nothing there.

Why do we need it, and what problem does it solve?
Two mass scenarios suffer most: CI/CD runners (GitLab Runner autoscaling expects a backend that can "give me N more" and reclaim idle ones) and VDI pools (warm desktops that self-heal on node failure). Without a group primitive, DVP cannot serve these natively and each team reinvents the orchestration, usually with bugs in race and failure handling.
VirtualMachinePool gives users a native, declarative backend for autoscaling fleets of VMs without writing their own replica controller.

What is the expected result?
With the VirtualMachinePool feature gate enabled (EE/SE+):
- VirtualMachinePool with spec.replicas: N and a spec.virtualMachineTemplate: the controller converges the number of VirtualMachines to N.
- kubectl scale virtualmachinepool/<name> --replicas=M (or HPA/KEDA) scales the pool to M.
- A Stopped replica is kept, not duplicated.
- kubectl get virtualmachinepool and .status report replicas / readyReplicas and the Available / Progressing conditions.

Checklist
Changelog entries