feat: implement nuget registry worker [CM-1276]#4265

Merged: mbani01 merged 5 commits into main from feat/nuget-worker on Jun 26, 2026
@mbani01 mbani01 commented Jun 25, 2026

This pull request introduces support for NuGet package enrichment in the packages_worker service. It adds a new NuGet worker with all necessary scripts, configuration, and logic to fetch, normalize, and store NuGet package metadata, including download counts and maintainers. Additionally, it updates the database schema to support tracking total downloads for packages.

NuGet worker integration and orchestration:

  • Added a new nuget-worker service, including Docker Compose configuration (scripts/services/nuget-worker.yaml), and updated the build and service scripts to include the NuGet worker (scripts/builders/packages.env, services/apps/packages_worker/package.json). [1] [2] [3]
  • Registered the processNuGetBatch activity in the worker's activities export (services/apps/packages_worker/src/activities.ts).

NuGet enrichment logic:

  • Implemented the NuGet batch enrichment loop, including fetching package data from NuGet, normalizing it, and upserting it into the database, with logic for handling maintainers, downloads, and error cases (services/apps/packages_worker/src/nuget/runNuGetEnrichmentLoop.ts, services/apps/packages_worker/src/nuget/activities.ts). [1] [2]
  • Added NuGet API client for resolving endpoints, fetching search and registration data, and handling errors (services/apps/packages_worker/src/nuget/client.ts).
  • Added normalization logic to convert NuGet registry data into the internal package format (services/apps/packages_worker/src/nuget/normalize.ts).
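
As a rough illustration of the endpoint resolution step, the sketch below picks the search and registration URLs out of a NuGet V3 service index and builds a registration index URL. The helper name `resolveEndpoint`, the trimmed `sampleIndex`, and the preference order of resource types are assumptions for illustration; they are not taken from `client.ts`.

```typescript
// Shape of the NuGet V3 service index document (resources list only).
interface ServiceIndexResource {
  '@id': string
  '@type': string
}

interface ServiceIndex {
  version: string
  resources: ServiceIndexResource[]
}

// Hypothetical helper: return the URL of the first resource whose @type
// matches one of the candidates, tried in order of preference.
function resolveEndpoint(index: ServiceIndex, candidateTypes: string[]): string {
  for (const type of candidateTypes) {
    const hit = index.resources.find((r) => r['@type'] === type)
    if (hit) return hit['@id']
  }
  throw new Error(`No service index resource for: ${candidateTypes.join(', ')}`)
}

// Trimmed sample of a service index, for illustration only.
const sampleIndex: ServiceIndex = {
  version: '3.0.0',
  resources: [
    { '@id': 'https://azuresearch-usnc.nuget.org/query', '@type': 'SearchQueryService' },
    { '@id': 'https://api.nuget.org/v3/registration5-semver1/', '@type': 'RegistrationsBaseUrl' },
    {
      '@id': 'https://api.nuget.org/v3/registration5-gz-semver2/',
      '@type': 'RegistrationsBaseUrl/3.6.0',
    },
  ],
}

const searchUrl = resolveEndpoint(sampleIndex, ['SearchQueryService'])
const registrationBaseUrl = resolveEndpoint(sampleIndex, [
  'RegistrationsBaseUrl/3.6.0',
  'RegistrationsBaseUrl',
])

// Registration index paths use the lowercased package ID.
const registrationIndexUrl = `${registrationBaseUrl}${'Newtonsoft.Json'.toLowerCase()}/index.json`
```

Resolving from the service index rather than hard-coding hosts keeps the client working if nuget.org moves a resource, at the cost of one extra request that can be cached.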

Database schema changes:

  • Added a total_downloads column to the packages table to track NuGet download counts (backend/src/osspckgs/migrations/V1782345600__nuget_total_downloads.sql).

Configuration:

  • Added NuGet-specific configuration options for batch size, concurrency, delays, and user agent (services/apps/packages_worker/src/config.ts).

Note

Medium Risk
Writes at scale to shared packages/versions tables and depends on nuget.org rate limits, though behavior mirrors other ecosystem workers and uses transactional upserts with deadlock retries.

Overview
Adds a NuGet registry worker to packages_worker that enriches existing nuget package rows from nuget.org on a daily Temporal schedule (ingestNuGetPackages with continueAsNew batches).

Each batch loads stale NuGet packages from the DB, fetches search + registration metadata concurrently, normalizes versions/licenses/repo/maintainers, and upserts into packages, versions, maintainer links, and audit logs. 404 and 429 responses are handled with nuget_not_found / nuget_error ingestion outcomes for retries.
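
The batch step described above can be sketched as a bounded-concurrency map that tallies one outcome per package. This is a minimal sketch, not the code in `runNuGetEnrichmentLoop.ts`: `mapWithConcurrency`, `fakeEnrich`, and the `Outcome` labels are hypothetical names, and the status-to-outcome mapping is an assumption based on the 404/429 handling mentioned in the overview.

```typescript
type Outcome = 'ok' | 'not_found' | 'error'

// Run `worker` over `items` with at most `limit` calls in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length)
  let next = 0

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++
      results[i] = await worker(items[i])
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length))
  await Promise.all(Array.from({ length: laneCount }, () => lane()))
  return results
}

// Hypothetical stand-in for the registry fetch: maps an HTTP status to an outcome.
async function fakeEnrich(status: number): Promise<Outcome> {
  if (status === 404) return 'not_found'
  if (status === 429) return 'error' // rate limited: left for a later pass
  return 'ok'
}

async function tally(statuses: number[]): Promise<Record<Outcome, number>> {
  const counts: Record<Outcome, number> = { ok: 0, not_found: 0, error: 0 }
  const outcomes = await mapWithConcurrency(statuses, 2, fakeEnrich)
  for (const outcome of outcomes) counts[outcome]++
  return counts
}
```

Returning an outcome instead of throwing keeps one failing package from aborting the rest of the batch.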

Download tracking adds packages.total_downloads (migration), derives daily deltas when totals increase, and refreshes last-30-day download rollups via existing DAL helpers.
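
A minimal sketch of the delta derivation, assuming that a missing previous total (first sighting) and a non-increasing total both mean "write nothing". The function name `deriveDailyDelta` and the `null` convention are hypothetical; only the "delta when totals increase" rule comes from the description above.

```typescript
// Returns the downloads to record for today, or null when nothing should be written.
function deriveDailyDelta(previousTotal: number | null, currentTotal: number): number | null {
  // Assumption: with no stored baseline there is nothing to subtract from.
  if (previousTotal === null) return null
  // Unchanged or lower totals produce no daily row.
  if (currentTotal <= previousTotal) return null
  return currentTotal - previousTotal
}
```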

Deployment wiring includes nuget-worker in the packages build list, Docker Compose service, npm start/dev/trigger-nuget scripts, and a new osspckgs/nuget data-access module. A manual triggerOsvSync script is also added alongside the NuGet trigger.

Reviewed by Cursor Bugbot for commit 3036aaa. Bugbot is set up for automated code reviews on this repo.

Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
@mbani01 mbani01 self-assigned this Jun 25, 2026
Copilot AI review requested due to automatic review settings June 25, 2026 12:33
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Copilot AI left a comment

Pull request overview

This PR adds first-class NuGet package enrichment to the packages_worker service, including a new Temporal worker entrypoint, NuGet API client + normalization pipeline, DAL upsert helpers, and a schema change to store NuGet total download counts.

Changes:

  • Added a NuGet worker (schedule, workflow, activities, CLI triggers) to ingest NuGet package metadata into packages-db.
  • Implemented NuGet registry client + normalization and wired it into a batch enrichment loop with maintainers, versions, and download snapshot handling.
  • Added packages.total_downloads via a migration and exposed NuGet DAL helpers via the data-access-layer index.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| services/libs/data-access-layer/src/osspckgs/nuget.ts | New DAL helpers for listing NuGet packages to sync, upserting package/version metadata, and recording download snapshots. |
| services/libs/data-access-layer/src/index.ts | Exports NuGet DAL module. |
| services/apps/packages_worker/src/workflows/index.ts | Exposes NuGet ingestion workflow from the worker’s workflow index. |
| services/apps/packages_worker/src/scripts/triggerOsvSync.ts | Adds a manual script to trigger the OSV sync workflow. |
| services/apps/packages_worker/src/scripts/triggerNuGetSync.ts | Adds a manual script to trigger NuGet ingestion. |
| services/apps/packages_worker/src/nuget/workflows.ts | NuGet Temporal workflow that continues-as-new while work remains. |
| services/apps/packages_worker/src/nuget/types.ts | NuGet config + API/normalization types and helpers. |
| services/apps/packages_worker/src/nuget/schedule.ts | Registers a daily Temporal Schedule for NuGet ingestion. |
| services/apps/packages_worker/src/nuget/runNuGetEnrichmentLoop.ts | Implements the NuGet batch processing/enrichment loop. |
| services/apps/packages_worker/src/nuget/normalize.ts | Normalizes NuGet search/registration responses into internal package/version shapes. |
| services/apps/packages_worker/src/nuget/client.ts | Resolves NuGet service index endpoints and fetches search/registration data. |
| services/apps/packages_worker/src/nuget/activities.ts | Temporal activity wrapper for batch processing. |
| services/apps/packages_worker/src/config.ts | Adds NuGet worker configuration (batch size, concurrency, delays, UA, critical-only). |
| services/apps/packages_worker/src/bin/nuget-worker.ts | Worker entrypoint to init service, register schedule, and start. |
| services/apps/packages_worker/src/activities.ts | Exports processNuGetBatch activity. |
| services/apps/packages_worker/package.json | Adds start/dev scripts for the new nuget-worker. |
| scripts/services/nuget-worker.yaml | Docker compose service definitions for NuGet worker (prod/dev). |
| scripts/builders/packages.env | Adds nuget-worker to packaged services list. |
| backend/src/osspckgs/migrations/V1782345600__nuget_total_downloads.sql | Adds packages.total_downloads column for NuGet download counts. |

Comment thread services/libs/data-access-layer/src/osspckgs/nuget.ts
Comment thread services/libs/data-access-layer/src/osspckgs/nuget.ts
Comment thread services/libs/data-access-layer/src/osspckgs/nuget.ts
Comment thread services/libs/data-access-layer/src/osspckgs/nuget.ts
mbani01 added 2 commits June 25, 2026 14:31
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Copilot AI review requested due to automatic review settings June 25, 2026 13:41

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 7 comments.

Comment on lines +88 to +91
if (registrationResult.kind === 'RATE_LIMIT') {
  log.warn({ purl: pkg.purl }, 'Rate limited by NuGet registry — will retry next pass')
  return 'error'
}
Comment on lines +224 to +228
} catch (err) {
  const message = err instanceof Error ? err.message : String(err)
  log.error({ purl: pkg.purl, error: message }, 'Unexpected error processing package')
  counts.error++
}
Comment thread services/apps/packages_worker/src/config.ts
Comment on lines +82 to +85
headers: {
  'Accept-Encoding': 'gzip',
},
timeout: 15000,
Comment on lines +101 to +104
const resp = await axios.get<NuGetRegistrationPage>(pageId, {
  headers: { 'Accept-Encoding': 'gzip' },
  timeout: 15000,
})
Comment thread services/apps/packages_worker/src/nuget/client.ts
Comment on lines +45 to +49
export function normalizeNuGetPackage(
  packageId: string,
  searchResult: NuGetSearchItem | null,
  registration: NuGetRegistrationIndex,
): NormalizedNuGetPackage {
mbani01 and others added 2 commits June 26, 2026 10:37
Signed-off-by: Mouad BANI <mouad-mb@outlook.com>
Copilot AI review requested due to automatic review settings June 26, 2026 10:28
@mbani01 mbani01 marked this pull request as ready for review June 26, 2026 10:28
@mbani01 mbani01 merged commit f07fc1b into main Jun 26, 2026
14 checks passed
@mbani01 mbani01 deleted the feat/nuget-worker branch June 26, 2026 10:29

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

const delta = totalDownloads - prev
const dailyChanged = await insertDailyDownloads(qx, String(packageId), [
  { day: today, downloads: delta },
])

Same-day download delta dropped

Medium Severity

When recordNuGetDownloadSnapshot sees a higher NuGet total than the stored total_downloads, it inserts the difference into downloads_daily for today via insertDailyDownloads, which uses ON CONFLICT (package_id, date) DO NOTHING. A second enrichment the same UTC day still bumps packages.total_downloads, but the extra delta for that date is skipped, so daily and 30-day download rollups stay too low.
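
The effect described above can be reproduced with an in-memory stand-in for `downloads_daily`. This is a sketch under stated assumptions: `insertDoNothing` mimics the `DO NOTHING` conflict rule, while `insertAccumulate` mimics one possible alternative, an additive upsert along the lines of `ON CONFLICT (package_id, date) DO UPDATE SET downloads = downloads_daily.downloads + EXCLUDED.downloads`. The helper names, the sample totals, and the additive upsert itself are illustrative and are not the fix adopted in this PR.

```typescript
// In-memory stand-in for downloads_daily, keyed by "packageId:day".
type DailyTable = Map<string, number>
type InsertFn = (table: DailyTable, key: string, delta: number) => void

// Mimics ON CONFLICT (package_id, date) DO NOTHING: the first write of the day wins.
const insertDoNothing: InsertFn = (table, key, delta) => {
  if (!table.has(key)) table.set(key, delta)
}

// Mimics an additive upsert: later deltas for the same day are summed.
const insertAccumulate: InsertFn = (table, key, delta) => {
  table.set(key, (table.get(key) ?? 0) + delta)
}

// Replay a sequence of observed totals for one package within one UTC day.
function replay(totals: number[], insert: InsertFn): number {
  const table: DailyTable = new Map()
  const key = '42:2026-06-26' // hypothetical package ID and day
  let stored = totals[0]
  for (const total of totals.slice(1)) {
    if (total > stored) {
      insert(table, key, total - stored)
      stored = total // the stored total is bumped on every enrichment
    }
  }
  return table.get(key) ?? 0
}

const skipped = replay([1000, 1040, 1055], insertDoNothing) // second delta is dropped
const summed = replay([1000, 1040, 1055], insertAccumulate) // both deltas are kept
```

With two enrichments on the same day, the `DO NOTHING` variant records 40 downloads while the stored total has moved by 55, which is the gap the comment points at.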

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 6 comments.

Comment on lines +73 to +80
export function getNuGetConfig() {
  return {
    batchSize: parseInt(process.env.NUGET_FETCHER_BATCH_SIZE ?? '1000', 10),
    concurrency: parseInt(process.env.NUGET_FETCHER_CONCURRENCY ?? '20', 10),
    groupDelayMs: parseInt(process.env.NUGET_FETCHER_GROUP_DELAY_MS ?? '0', 10),
    isCritical: (process.env.NUGET_FETCHER_IS_CRITICAL ?? 'false') === 'true',
  }
}
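
`parseInt` returns `NaN` for a non-numeric string, so a mistyped environment variable would reach the batch size or concurrency unchecked. The sketch below shows one way to guard against that; the text of the review comment is not shown in this thread, so this is only a possible hardening, not the reviewer's suggestion. `readInt` and `getNuGetConfigStrict` are hypothetical names, and treating an empty string as "unset" is an assumption that differs from the `??` fallback above.

```typescript
// Parse an integer setting, falling back when unset and rejecting invalid values.
function readInt(raw: string | undefined, fallback: number, min: number): number {
  if (raw === undefined || raw.trim() === '') return fallback
  const parsed = Number.parseInt(raw, 10)
  if (!Number.isFinite(parsed) || parsed < min) {
    throw new Error(`Expected an integer >= ${min}, got "${raw}"`)
  }
  return parsed
}

// Takes the environment as a parameter so it can be exercised without process.env.
function getNuGetConfigStrict(env: Record<string, string | undefined>) {
  return {
    batchSize: readInt(env.NUGET_FETCHER_BATCH_SIZE, 1000, 1),
    concurrency: readInt(env.NUGET_FETCHER_CONCURRENCY, 20, 1),
    groupDelayMs: readInt(env.NUGET_FETCHER_GROUP_DELAY_MS, 0, 0),
    isCritical: (env.NUGET_FETCHER_IS_CRITICAL ?? 'false') === 'true',
  }
}
```

In the worker this would be called as `getNuGetConfigStrict(process.env)`, so a bad value fails at startup instead of producing an empty or unbounded batch.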
Comment on lines +80 to +83
headers: {
  'Accept-Encoding': 'gzip',
},
timeout: 15000,
Comment on lines +104 to +107
const resp = await axios.get<NuGetRegistrationPage>(pageId, {
  headers: { 'Accept-Encoding': 'gzip' },
  timeout: 15000,
})
Comment on lines +122 to +128
const resp = await axios.get<NuGetRegistrationIndex>(
  `${registrationBaseUrl}${lowerId}/index.json`,
  {
    headers: { 'Accept-Encoding': 'gzip' },
    timeout: 15000,
  },
)
Comment on lines +318 to +321
const thirtyDaysAgo = new Date(today)
thirtyDaysAgo.setDate(thirtyDaysAgo.getDate() - 29)
const startDate = thirtyDaysAgo.toISOString().split('T')[0]
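
Subtracting 29 days gives an inclusive 30-day window ending today. Note that `setDate`/`getDate` work in the process's local timezone while `toISOString` formats in UTC, so the two can disagree when the worker does not run in UTC. A minimal sketch of the same window computed entirely in UTC follows; the helper name `last30DaysStart` is hypothetical, and how `today` is built in the DAL is not visible in this excerpt.

```typescript
// First day (YYYY-MM-DD, UTC) of the inclusive 30-day window ending on `today`.
function last30DaysStart(today: Date): string {
  // Truncate to UTC midnight so the time of day cannot shift the result.
  const start = new Date(
    Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate()),
  )
  start.setUTCDate(start.getUTCDate() - 29)
  return start.toISOString().split('T')[0]
}
```

For example, a window ending on 2026-06-26 starts on 2026-05-28, which covers 4 days in May plus 26 in June.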

Comment on lines +1 to +8
import { TEMPORAL_CONFIG, getTemporalClient } from '@crowd/temporal'

import { osvSync } from '../osv/workflows'

async function main(): Promise<void> {
  const raw = process.argv[2]
  const ecosystems = raw ? raw.split(',').map((e) => e.trim()) : ['cargo']

3 participants