feat: apply PyPI heuristics to project-URL classification (#800) #1066
Draft
arun2dot0 wants to merge 1 commit into
Move URL-label -> external-reference-type mapping into a dedicated data-only module and add PyPI-style label prefix matching. Adds an optional `url` argument to `url_label_to_ert` (unused for now) for the upcoming host-based classification.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Arun Selvamani <arunselvamani@gmail.com>
Up to standards ✅🟢 Issues
| Metric | Results |
|---|---|
| Complexity | ✅ 0 (≤ 20 complexity) |
| Duplication | 0 |
Description
This is a draft / work-in-progress opened early (as requested in the issue) to get feedback on the approach before completing it.

This extends how project URLs are classified into CycloneDX external-reference types, adopting PyPI's documented heuristics (https://docs.pypi.org/project_metadata/#icons). Today classification uses only the URL label against an exact-match dict in `cyclonedx_py/_internal/utils/cdx.py`. This PR moves toward classifying by both label (exact + prefix) and URL host (domain/subdomain), so emitted external references follow the de-facto standard.

Design (agreed in the issue thread)

- … *semantics) → host suffix → host subdomain-prefix → `OTHER`. The label is the author's explicit intent; the host fills gaps. E.g. a `Funding` label on a `github.com` URL stays `OTHER`; it does not become `VCS`.
- … `cyclonedx_py/_internal/utils/url_classifiers.py` (four declarative tables). The matcher in `cdx.py` is pure logic and never changes when rules are added; extending classification is a one-line data edit.
- … `OTHER` (no CycloneDX funding type; `OTHER` is more honest than forcing `WEBSITE`).
- … `CHAT`; Reddit/YouTube/Twitter-X/Mastodon/Bluesky → `SOCIAL`.
- … `BUILD_SYSTEM`.
- `google.com` left unmapped (too ambiguous) → falls through to `OTHER`.

Status / checklist

- … `url_label_to_ert(label, url=None)` gains an optional `url` arg (unused until Task 2, keeps back-compat). New table-driven unit tests; `flake8`/`isort`/`mypy` clean.
- … `poetry.py`, `pep621.py`, `packaging.py`) and reconcile snapshots.

Feedback on the mapping table and the `OTHER`-for-funding choice is especially welcome before I finish Tasks 2–3.

Resolves or fixes issue: #800
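
For reviewers, the label-first precedence described under Design could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Ert` enum is a stand-in for CycloneDX's external-reference types, `classify` is a hypothetical name (not the PR's `url_label_to_ert`), and every table entry is invented for the example rather than taken from `url_classifiers.py`. Prefix matching is approximated with a plain `startswith`, which may differ from PyPI's exact rules.

```python
from enum import Enum
from typing import Optional
from urllib.parse import urlsplit


class Ert(Enum):
    """Hypothetical stand-in for a subset of CycloneDX external-reference types."""
    VCS = 'vcs'
    CHAT = 'chat'
    SOCIAL = 'social'
    DOCUMENTATION = 'documentation'
    OTHER = 'other'


# Four declarative tables. All entries are illustrative assumptions,
# not the data shipped in this PR.
LABEL_EXACT = {'funding': Ert.OTHER, 'source': Ert.VCS}
LABEL_PREFIX = {'doc': Ert.DOCUMENTATION}
HOST_SUFFIX = {'github.com': Ert.VCS, 'discord.com': Ert.CHAT, 'reddit.com': Ert.SOCIAL}
HOST_SUBDOMAIN_PREFIX = {'docs': Ert.DOCUMENTATION}


def classify(label: str, url: Optional[str] = None) -> Ert:
    """Classify by label first, then by host, falling back to OTHER."""
    key = label.strip().lower()

    # 1. Exact label match: the author's explicit intent wins.
    if key in LABEL_EXACT:
        return LABEL_EXACT[key]

    # 2. Label prefix match.
    for prefix, ert in LABEL_PREFIX.items():
        if key.startswith(prefix):
            return ert

    # 3./4. Host-based rules only apply when a URL was supplied.
    host = (urlsplit(url).hostname or '') if url else ''

    # 3. Host suffix: the domain itself or any subdomain of it.
    for suffix, ert in HOST_SUFFIX.items():
        if host == suffix or host.endswith('.' + suffix):
            return ert

    # 4. Host subdomain prefix, e.g. "docs.<anything>".
    for prefix, ert in HOST_SUBDOMAIN_PREFIX.items():
        if host.startswith(prefix + '.'):
            return ert

    return Ert.OTHER
```

With these example tables, `classify('Funding', 'https://github.com/sponsors/x')` returns `Ert.OTHER` because the exact label match is consulted before the host, while an unmapped label on the same host falls through to the host-suffix rule. Adding a rule means adding one entry to a table; `classify` itself stays unchanged.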
AI Tool Disclosure
Claude Code
Claude Opus 4.8
Find a good-first-issue, then design and implement #800: apply PyPI's project-URL classification heuristics (label + host based) to CycloneDX external-reference type detection. Constraints I directed: full port of PyPI heuristics mapped to the nearest CycloneDX type; label-first precedence; keep the mapping expandable and kept in a data module separate from the matching logic. TDD with table-driven unit tests.
Affirmation