feat: implement nuget registry worker [CM-1276] #4265
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Pull request overview
This PR adds first-class NuGet package enrichment to the packages_worker service, including a new Temporal worker entrypoint, NuGet API client + normalization pipeline, DAL upsert helpers, and a schema change to store NuGet total download counts.
Changes:
- Added a NuGet worker (schedule, workflow, activities, CLI triggers) to ingest NuGet package metadata into `packages-db`.
- Implemented NuGet registry client + normalization and wired it into a batch enrichment loop with maintainers, versions, and download snapshot handling.
- Added `packages.total_downloads` via a migration and exposed NuGet DAL helpers via the data-access-layer index.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| services/libs/data-access-layer/src/osspckgs/nuget.ts | New DAL helpers for listing NuGet packages to sync, upserting package/version metadata, and recording download snapshots. |
| services/libs/data-access-layer/src/index.ts | Exports NuGet DAL module. |
| services/apps/packages_worker/src/workflows/index.ts | Exposes NuGet ingestion workflow from the worker’s workflow index. |
| services/apps/packages_worker/src/scripts/triggerOsvSync.ts | Adds a manual script to trigger the OSV sync workflow. |
| services/apps/packages_worker/src/scripts/triggerNuGetSync.ts | Adds a manual script to trigger NuGet ingestion. |
| services/apps/packages_worker/src/nuget/workflows.ts | NuGet Temporal workflow that continues-as-new while work remains. |
| services/apps/packages_worker/src/nuget/types.ts | NuGet config + API/normalization types and helpers. |
| services/apps/packages_worker/src/nuget/schedule.ts | Registers a daily Temporal Schedule for NuGet ingestion. |
| services/apps/packages_worker/src/nuget/runNuGetEnrichmentLoop.ts | Implements the NuGet batch processing/enrichment loop. |
| services/apps/packages_worker/src/nuget/normalize.ts | Normalizes NuGet search/registration responses into internal package/version shapes. |
| services/apps/packages_worker/src/nuget/client.ts | Resolves NuGet service index endpoints and fetches search/registration data. |
| services/apps/packages_worker/src/nuget/activities.ts | Temporal activity wrapper for batch processing. |
| services/apps/packages_worker/src/config.ts | Adds NuGet worker configuration (batch size, concurrency, delays, UA, critical-only). |
| services/apps/packages_worker/src/bin/nuget-worker.ts | Worker entrypoint to init service, register schedule, and start. |
| services/apps/packages_worker/src/activities.ts | Exports processNuGetBatch activity. |
| services/apps/packages_worker/package.json | Adds start/dev scripts for the new nuget-worker. |
| scripts/services/nuget-worker.yaml | Docker compose service definitions for NuGet worker (prod/dev). |
| scripts/builders/packages.env | Adds nuget-worker to packaged services list. |
| backend/src/osspckgs/migrations/V1782345600__nuget_total_downloads.sql | Adds packages.total_downloads column for NuGet download counts. |
    if (registrationResult.kind === 'RATE_LIMIT') {
      log.warn({ purl: pkg.purl }, 'Rate limited by NuGet registry — will retry next pass')
      return 'error'
    }

    } catch (err) {
      const message = err instanceof Error ? err.message : String(err)
      log.error({ purl: pkg.purl, error: message }, 'Unexpected error processing package')
      counts.error++
    }

    headers: {
      'Accept-Encoding': 'gzip',
    },
    timeout: 15000,

    const resp = await axios.get<NuGetRegistrationPage>(pageId, {
      headers: { 'Accept-Encoding': 'gzip' },
      timeout: 15000,
    })

    export function normalizeNuGetPackage(
      packageId: string,
      searchResult: NuGetSearchItem | null,
      registration: NuGetRegistrationIndex,
    ): NormalizedNuGetPackage {

Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 3036aaa.
    const delta = totalDownloads - prev
    const dailyChanged = await insertDailyDownloads(qx, String(packageId), [
      { day: today, downloads: delta },
    ])

Same-day download delta dropped
Medium Severity
When recordNuGetDownloadSnapshot sees a higher NuGet total than the stored total_downloads, it inserts the difference into downloads_daily for today via insertDailyDownloads, which uses ON CONFLICT (package_id, date) DO NOTHING. A second enrichment the same UTC day still bumps packages.total_downloads, but the extra delta for that date is skipped, so daily and 30-day download rollups stay too low.
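
The gap described above can be reproduced with an in-memory stand-in for `downloads_daily`. This is a minimal sketch, not the PR's DAL code: `DailyStore`, `insertIfAbsent`, `addToDay`, and `simulate` are hypothetical names, the package ID and date are made-up values, and the additive write is only one possible remedy (in SQL, an `ON CONFLICT ... DO UPDATE` that adds the new delta), not necessarily the fix the author chose.

```typescript
// Hypothetical in-memory model of downloads_daily, keyed by "packageId|date".
type DailyStore = Map<string, number>

const PACKAGE_ID = '42' // made-up package id
const TODAY = '2025-01-01' // made-up UTC day

const key = (packageId: string, day: string): string => `${packageId}|${day}`

// Mirrors ON CONFLICT (package_id, date) DO NOTHING: a second write for the same day is ignored.
function insertIfAbsent(store: DailyStore, packageId: string, day: string, delta: number): boolean {
  const k = key(packageId, day)
  if (store.has(k)) return false
  store.set(k, delta)
  return true
}

// Additive alternative: accumulate the delta into the existing row for that day.
function addToDay(store: DailyStore, packageId: string, day: string, delta: number): void {
  const k = key(packageId, day)
  store.set(k, (store.get(k) ?? 0) + delta)
}

// Two enrichment passes on the same day: the observed total goes 1000 -> 1040 -> 1100.
function simulate(write: (store: DailyStore, delta: number) => void): number {
  const store: DailyStore = new Map()
  let storedTotal = 1000
  for (const observedTotal of [1040, 1100]) {
    const delta = observedTotal - storedTotal
    if (delta > 0) write(store, delta)
    storedTotal = observedTotal // the stored total is bumped on every pass
  }
  return store.get(key(PACKAGE_ID, TODAY)) ?? 0
}

const withDoNothing = simulate((s, d) => {
  insertIfAbsent(s, PACKAGE_ID, TODAY, d)
})
const withAccumulate = simulate((s, d) => addToDay(s, PACKAGE_ID, TODAY, d))

console.log({ withDoNothing, withAccumulate })
```

With the do-nothing write, the day records only the first pass's delta (40) even though the total rose by 100; the additive write records the full 100.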
    export function getNuGetConfig() {
      return {
        batchSize: parseInt(process.env.NUGET_FETCHER_BATCH_SIZE ?? '1000', 10),
        concurrency: parseInt(process.env.NUGET_FETCHER_CONCURRENCY ?? '20', 10),
        groupDelayMs: parseInt(process.env.NUGET_FETCHER_GROUP_DELAY_MS ?? '0', 10),
        isCritical: (process.env.NUGET_FETCHER_IS_CRITICAL ?? 'false') === 'true',
      }
    }

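
`parseInt` returns `NaN` for a non-numeric string, so a mistyped environment variable would reach batch size or concurrency unchecked. The following is a minimal sketch of a guarded parser, not code from the PR: `readIntEnv` and `getNuGetConfigSafe` are hypothetical names, and the minimum bounds (at least 1 for batch size and concurrency, 0 allowed for the delay) are assumptions.

```typescript
// Hypothetical helper: parse an integer setting, falling back when it is missing,
// non-numeric, or below the allowed minimum.
function readIntEnv(raw: string | undefined, fallback: number, min = 0): number {
  if (raw === undefined || raw.trim() === '') return fallback
  const parsed = Number.parseInt(raw, 10)
  return Number.isNaN(parsed) || parsed < min ? fallback : parsed
}

// Takes the environment as a parameter so it can be exercised without touching process.env.
function getNuGetConfigSafe(env: Record<string, string | undefined>) {
  return {
    batchSize: readIntEnv(env.NUGET_FETCHER_BATCH_SIZE, 1000, 1),
    concurrency: readIntEnv(env.NUGET_FETCHER_CONCURRENCY, 20, 1),
    groupDelayMs: readIntEnv(env.NUGET_FETCHER_GROUP_DELAY_MS, 0, 0),
    isCritical: (env.NUGET_FETCHER_IS_CRITICAL ?? 'false') === 'true',
  }
}

const sample = getNuGetConfigSafe({
  NUGET_FETCHER_BATCH_SIZE: 'abc', // invalid, falls back to 1000
  NUGET_FETCHER_CONCURRENCY: '5',
})
console.log(sample)
```

Whether to fall back silently or fail fast on a bad value is a design choice; failing at startup may be preferable for a scheduled worker.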

    const resp = await axios.get<NuGetRegistrationIndex>(
      `${registrationBaseUrl}${lowerId}/index.json`,
      {
        headers: { 'Accept-Encoding': 'gzip' },
        timeout: 15000,
      },
    )

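
The request above builds the registration index URL from the registration base URL and the lowercased package ID. A small sketch of that step in isolation follows; `buildRegistrationIndexUrl` is a hypothetical helper name, the trailing-slash handling is an assumption about how the base URL might be supplied, and the base URL in the usage line is only an example value.

```typescript
// Hypothetical helper: build the registration index URL for a package ID.
// The package ID is lowercased before it is placed in the path.
function buildRegistrationIndexUrl(registrationBaseUrl: string, packageId: string): string {
  // Tolerate a base URL with or without a trailing slash (assumption).
  const base = registrationBaseUrl.endsWith('/') ? registrationBaseUrl : `${registrationBaseUrl}/`
  const lowerId = packageId.toLowerCase()
  return `${base}${encodeURIComponent(lowerId)}/index.json`
}

const exampleUrl = buildRegistrationIndexUrl(
  'https://api.nuget.org/v3/registration5-gz-semver2', // example value only
  'Newtonsoft.Json',
)
console.log(exampleUrl)
```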

    const thirtyDaysAgo = new Date(today)
    thirtyDaysAgo.setDate(thirtyDaysAgo.getDate() - 29)
    const startDate = thirtyDaysAgo.toISOString().split('T')[0]

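
`setDate`/`getDate` operate in the process's local time zone while `toISOString` reports UTC, so the excerpt above can land on a neighbouring day when the worker does not run in UTC. The sketch below does the same 30-day inclusive window arithmetic entirely in UTC. It assumes `today` is a `YYYY-MM-DD` string, which the excerpt does not show, and `windowStart` is a hypothetical name.

```typescript
// Hypothetical helper: first day of an inclusive N-day window ending on `today`.
// Assumes `today` is a YYYY-MM-DD date string interpreted as UTC.
function windowStart(today: string, days = 30): string {
  const d = new Date(`${today}T00:00:00Z`)
  // Subtract (days - 1) so that the window includes `today` itself.
  d.setUTCDate(d.getUTCDate() - (days - 1))
  return d.toISOString().slice(0, 10)
}

console.log(windowStart('2025-03-30'))
console.log(windowStart('2025-01-15'))
```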

    import { TEMPORAL_CONFIG, getTemporalClient } from '@crowd/temporal'

    import { osvSync } from '../osv/workflows'

    async function main(): Promise<void> {
      const raw = process.argv[2]
      const ecosystems = raw ? raw.split(',').map((e) => e.trim()) : ['cargo']

This pull request introduces support for NuGet package enrichment in the `packages_worker` service. It adds a new NuGet worker with all necessary scripts, configuration, and logic to fetch, normalize, and store NuGet package metadata, including download counts and maintainers. Additionally, it updates the database schema to support tracking total downloads for packages.

NuGet worker integration and orchestration:
- `nuget-worker` service, including Docker Compose configuration (`scripts/services/nuget-worker.yaml`), and updated the build and service scripts to include the NuGet worker (`scripts/builders/packages.env`, `services/apps/packages_worker/package.json`).
- `processNuGetBatch` activity in the worker's activities export (`services/apps/packages_worker/src/activities.ts`).

NuGet enrichment logic:
- `services/apps/packages_worker/src/nuget/runNuGetEnrichmentLoop.ts`, `services/apps/packages_worker/src/nuget/activities.ts`
- `services/apps/packages_worker/src/nuget/client.ts`
- `services/apps/packages_worker/src/nuget/normalize.ts`

Database schema changes:
- `total_downloads` column to the `packages` table to track NuGet download counts (`backend/src/osspckgs/migrations/V1782345600__nuget_total_downloads.sql`).

Configuration:
- `services/apps/packages_worker/src/config.ts`

Note
Medium Risk
Writes at scale to shared `packages`/`versions` tables and depends on nuget.org rate limits, though behavior mirrors other ecosystem workers and uses transactional upserts with deadlock retries.

Overview
Adds a NuGet registry worker to `packages_worker` that enriches existing `nuget` package rows from nuget.org on a daily Temporal schedule (`ingestNuGetPackages` with `continueAsNew` batches).

Each batch loads stale NuGet packages from the DB, fetches search + registration metadata concurrently, normalizes versions/licenses/repo/maintainers, and upserts into `packages`, `versions`, maintainer links, and audit logs. 404 and 429 responses are handled with `nuget_not_found`/`nuget_error` ingestion outcomes for retries.

Download tracking adds `packages.total_downloads` (migration), derives daily deltas when totals increase, and refreshes last-30-day download rollups via existing DAL helpers.

Deployment wiring includes `nuget-worker` in the packages build list, Docker Compose service, npm `start`/`dev`/`trigger-nuget` scripts, and a new `osspckgs/nuget` data-access module. A manual `triggerOsvSync` script is also added alongside the NuGet trigger.

Reviewed by Cursor Bugbot for commit 3036aaa. Bugbot is set up for automated code reviews on this repo.
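
The handling of 404 and 429 responses described in the overview can be sketched as a pair of pure functions. Only the `RATE_LIMIT` result kind appears in the diff excerpts, and the outcome strings `nuget_not_found` and `nuget_error` come from the overview; the other kinds (`OK`, `NOT_FOUND`, `ERROR`), the helper names `classifyStatus` and `toIngestionOutcome`, and the exact pairing of kinds to outcomes are assumptions made for illustration.

```typescript
// Result kinds: RATE_LIMIT is from the diff excerpt; the others are hypothetical.
type FetchKind = 'OK' | 'NOT_FOUND' | 'RATE_LIMIT' | 'ERROR'

// Hypothetical helper: map an HTTP status code to a result kind.
function classifyStatus(status: number): FetchKind {
  if (status >= 200 && status < 300) return 'OK'
  if (status === 404) return 'NOT_FOUND'
  if (status === 429) return 'RATE_LIMIT'
  return 'ERROR'
}

// Hypothetical helper: map a result kind to the ingestion outcome recorded for the package.
// null means nothing failure-related is recorded (assumption).
function toIngestionOutcome(kind: FetchKind): 'nuget_not_found' | 'nuget_error' | null {
  if (kind === 'OK') return null
  if (kind === 'NOT_FOUND') return 'nuget_not_found'
  // Rate limits and other failures are treated alike here so a later pass retries them (assumption).
  return 'nuget_error'
}

for (const status of [200, 404, 429, 503]) {
  const kind = classifyStatus(status)
  console.log(status, kind, toIngestionOutcome(kind))
}
```

Keeping the classification separate from the HTTP client makes the retry policy testable without network access.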