Backport 2206 to release/0.4.z #2297

Open
jcrossley3 wants to merge 6 commits into guacsec:release/0.4.z from jcrossley3:backport-2206-to-release/0.4.z

@jcrossley3 (Contributor) commented Mar 18, 2026

To address https://redhat.atlassian.net/browse/TC-3091

Summary by Sourcery

Extend SBOM ingestion to handle additional CycloneDX component types and centralize component creation and validation while adjusting reference handling.

New Features:

  • Support ingestion of CycloneDX components representing files, machine learning models, and cryptographic assets as first-class nodes.
  • Introduce a typed ComponentType enum for CycloneDX component classification with JSON/serde integration.
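
A minimal sketch of the typed classification described above, assuming hand-written `Display`/`FromStr` impls as a std-only stand-in for the serde and strum derives the PR uses. The `Route` enum and `route` function are hypothetical names for illustration; the grouping of types into packages, files, models, and cryptographic assets follows the reviewer's guide.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ComponentType {
    Application, Framework, Library, Container, Platform, OperatingSystem,
    Device, DeviceDriver, Firmware, File, MachineLearningModel, Data,
    CryptographicAsset,
}

impl ComponentType {
    /// Every known variant, used for parsing and round-trip checks.
    const ALL: [ComponentType; 13] = [
        Self::Application, Self::Framework, Self::Library, Self::Container,
        Self::Platform, Self::OperatingSystem, Self::Device, Self::DeviceDriver,
        Self::Firmware, Self::File, Self::MachineLearningModel, Self::Data,
        Self::CryptographicAsset,
    ];

    /// Kebab-case name as it appears in a CycloneDX document.
    fn as_str(self) -> &'static str {
        match self {
            Self::Application => "application",
            Self::Framework => "framework",
            Self::Library => "library",
            Self::Container => "container",
            Self::Platform => "platform",
            Self::OperatingSystem => "operating-system",
            Self::Device => "device",
            Self::DeviceDriver => "device-driver",
            Self::Firmware => "firmware",
            Self::File => "file",
            Self::MachineLearningModel => "machine-learning-model",
            Self::Data => "data",
            Self::CryptographicAsset => "cryptographic-asset",
        }
    }
}

impl fmt::Display for ComponentType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for ComponentType {
    type Err = String;

    /// Case-insensitive parse; unknown strings become an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Self::ALL
            .iter()
            .copied()
            .find(|t| t.as_str().eq_ignore_ascii_case(s))
            .ok_or_else(|| format!("unknown component type: {s}"))
    }
}

/// Hypothetical routing target for a parsed component.
#[derive(Debug, PartialEq)]
enum Route { Package, File, Model, Crypto, Unsupported }

/// Decide which creator a component type would be handed to.
fn route(ty: ComponentType) -> Route {
    use ComponentType::*;
    match ty {
        Application | Framework | Library | Container | OperatingSystem => Route::Package,
        File => Route::File,
        MachineLearningModel => Route::Model,
        CryptographicAsset => Route::Crypto,
        Platform | Device | DeviceDriver | Firmware | Data => Route::Unsupported,
    }
}
```

Listing every variant in `route` instead of using a wildcard arm means the compiler flags the match if a new component type is added later.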

Bug Fixes:

  • Ensure package and SPDX SBOM creation passes references as iterators of references, fixing ownership and ID handling for CPE and PURL references.
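
The borrowed-iterator signature can be sketched as follows. This is an illustration under simplifying assumptions, not the PR's code: the `Cpe` variant holds a `u128` instead of a `Uuid`, `Purl` holds a plain `String`, and the `cpe_refs`/`purl_refs` fields are a hypothetical reduction of what `PackageCreator` stores.

```rust
/// Simplified stand-in for the PR's `PackageReference`.
#[derive(Debug, Clone, PartialEq)]
enum PackageReference {
    Cpe(u128),
    Purl(String),
}

/// Hypothetical, reduced `PackageCreator` that only records references.
#[derive(Default)]
struct PackageCreator {
    cpe_refs: Vec<(String, u128)>,
    purl_refs: Vec<(String, String)>,
}

impl PackageCreator {
    /// Takes an iterator of borrowed references, so the caller keeps
    /// ownership of its `Vec<PackageReference>` and can reuse it.
    fn add<'a>(&mut self, node_id: &str, refs: impl Iterator<Item = &'a PackageReference>) {
        for r in refs {
            match r {
                // The CPE ID is dereferenced (copied) out of the borrow.
                PackageReference::Cpe(id) => self.cpe_refs.push((node_id.to_string(), *id)),
                // The PURL is cloned only at the point where it is stored.
                PackageReference::Purl(p) => self.purl_refs.push((node_id.to_string(), p.clone())),
            }
        }
    }
}
```

A caller passes `refs.iter()` and still owns `refs` afterwards, so the same reference list can be handed to more than one node without cloning the whole vector up front.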

Enhancements:

  • Refactor CycloneDX component creation into a self-contained ComponentCreator that manages components, relationships, post-processing, validation, and database writes.
  • Include new SBOM node sources (files, machine learning models, cryptographic assets) in relationship validation and creation order to avoid deadlocks and invalid references.
  • Replace noisy stdout debug output in a vulnerability test with structured logging output and correct a misnamed regression test expectation.

Build:

  • Update rustfmt configuration to use the Rust 2024 edition in addition to the 2024 style edition.

Tests:

  • Adjust analysis endpoint regression test expectations for component graph results and add tests for the new ComponentType enum conversions.

Specifically files, machine learning models, and cryptographic
assets. We've introduced placeholders for the latter two until
migrations for new tables are added in a subsequent commit.

This indirectly fixes guacsec#2205 because it prevents cryptographic assets
from being displayed as packages for CBOMs, but there's not yet a
service API that would allow you to fetch those assets. That'll come
in a separate commit.

(cherry picked from commit 06364ad)
(cherry picked from commit cb0107b)
Also made the use of the `refs` field a bit clearer and reduced
some cloning.

(cherry picked from commit 6c84171)
Also a bit of renaming and commentary

(cherry picked from commit 948f9aa)
Weirdly, I had to revert the tc2758 test expectation. Still not sure
why.

(cherry picked from commit 44fab35)
…btypes"

This reverts commit 470cd79.

(cherry picked from commit c3dc0fe)

@sourcery-ai bot (Contributor) commented Mar 18, 2026

Reviewer's Guide

Backports extended CycloneDX SBOM ingestion support for additional component types (files, machine-learning models, cryptographic assets), refactors component creation into an owning ComponentCreator that manages references per-node, updates package reference handling APIs, and adjusts tests and formatting for the 0.4.z branch.
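
One way to picture "references per-node" is a map from node ID to that node's CPE and PURL references. The sketch below is an assumption-laden illustration: `RefIndex` and `refs_for` are hypothetical names, and `PackageReference` is simplified (`u128` in place of `Uuid`, `String` in place of a parsed PURL).

```rust
use std::collections::HashMap;

/// Simplified stand-in for the PR's `PackageReference`.
#[derive(Debug, Clone, PartialEq)]
enum PackageReference {
    Cpe(u128),
    Purl(String),
}

/// Hypothetical holder for the per-node reference map.
#[derive(Default)]
struct RefIndex {
    refs: HashMap<String, Vec<PackageReference>>,
}

impl RefIndex {
    /// Record a CPE reference against one node.
    fn add_cpe(&mut self, node_id: &str, cpe_id: u128) {
        self.refs
            .entry(node_id.to_string())
            .or_default()
            .push(PackageReference::Cpe(cpe_id));
    }

    /// Record a PURL reference against one node.
    fn add_purl(&mut self, node_id: &str, purl: &str) {
        self.refs
            .entry(node_id.to_string())
            .or_default()
            .push(PackageReference::Purl(purl.to_string()));
    }

    /// References for a node, or an empty slice if it has none.
    fn refs_for(&self, node_id: &str) -> &[PackageReference] {
        self.refs.get(node_id).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Because each node's references live under its own key, ancestors and variants added through the same creator do not inherit references that belong to the parent component.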

Sequence diagram for CycloneDX SBOM ingestion with ComponentCreator

sequenceDiagram
    participant Caller
    participant Creator as CycloneDxCreator
    participant ComponentCreator
    participant Processors as ProcessorList
    participant DB as Database

    Caller->>Creator: create_components(db, processors)
    Creator->>ComponentCreator: new(sbom_id, components.len)

    loop for each component
        Creator->>ComponentCreator: add_component(component)
        alt component has cpe/purl
            ComponentCreator->>ComponentCreator: add_cpe/add_purl(node_id, parsed)
        end
        alt component has pedigree ancestors
            loop for each ancestor
                ComponentCreator->>ComponentCreator: add_component(ancestor)
                ComponentCreator->>ComponentCreator: add_relation(ancestor_id, AncestorOf, node_id)
            end
        end
        alt component has variants
            loop for each variant
                ComponentCreator->>ComponentCreator: add_component(variant)
                ComponentCreator->>ComponentCreator: add_relation(node_id, Variant, variant_id)
            end
        end
    end

    loop for each relation tuple
        Creator->>ComponentCreator: add_relation(left, rel, right)
    end

    Creator->>ComponentCreator: post_process(processors)
    ComponentCreator->>Processors: PostContext.run(processors)

    Creator->>ComponentCreator: validate()
    ComponentCreator->>ComponentCreator: build References from packages, files, models, crypto
    ComponentCreator->>ComponentCreator: relationships.validate(sources)

    Creator->>ComponentCreator: create(db)
    ComponentCreator->>DB: licenses.create
    ComponentCreator->>DB: purls.create
    ComponentCreator->>DB: cpes.create
    ComponentCreator->>DB: packages.create
    ComponentCreator->>DB: files.create
    ComponentCreator->>DB: models.create
    ComponentCreator->>DB: crypto.create
    ComponentCreator->>DB: relationships.create

    ComponentCreator-->>Creator: Result
    Creator-->>Caller: Result

Class diagram for updated SBOM component ingestion (CycloneDX)

classDiagram
    class Creator {
        +sbom_id Uuid
        +components Vec~Component~
        +relations Vec~(String, Relationship, String)~
        +create_components(db, processors) Result
    }

    class ComponentCreator {
        -cpes CpeCreator
        -purls PurlCreator
        -licenses LicenseCreator
        -packages PackageCreator
        -files FileCreator
        -models MachineLearningModelCreator
        -crypto CryptographicAssetCreator
        -relationships RelationshipCreator~CycloneDxProcessor~
        -refs HashMap~String, Vec~PackageReference~~
        +new(sbom_id, capacity) ComponentCreator
        +add_component(comp) Result
        +add_relation(left, rel, right) void
        +post_process(processors) void
        +validate() Result
        +create(db) Result
        -add_cpe(node_id, cpe) void
        -add_purl(node_id, purl) void
        -add_license(node_id, license) Uuid
    }

    class PackageCreator {
        -sbom_id Uuid
        +new(sbom_id) PackageCreator
        +with_capacity(sbom_id, capacity) PackageCreator
        +add(node_info, refs_iter, checksums_iter) void
    }

    class FileCreator {
        +new(sbom_id) FileCreator
        +with_capacity(sbom_id, capacity) FileCreator
        +add(node_id, name, checksums_iter) void
        +create(db) Result
    }

    class MachineLearningModelCreator {
        -nodes NodeCreator
        +new(sbom_id) MachineLearningModelCreator
        +with_capacity(sbom_id, capacity) MachineLearningModelCreator
        +add(node_id, name, checksums_iter) void
        +create(db) Result
    }

    class CryptographicAssetCreator {
        -nodes NodeCreator
        +new(sbom_id) CryptographicAssetCreator
        +with_capacity(sbom_id, capacity) CryptographicAssetCreator
        +add(node_id, name, checksums_iter) void
        +create(db) Result
    }

    class NodeCreator {
        +new(sbom_id) NodeCreator
        +with_capacity(sbom_id, capacity) NodeCreator
        +add(node_id, name, checksums_iter) void
        +create(db) Result
        +references() Iterator~&str~
    }

    class PackageReference {
        <<enumeration>>
        Cpe(Uuid)
        Purl(Purl)
    }

    class ComponentType {
        <<enumeration>>
        Application
        Framework
        Library
        Container
        Platform
        OperatingSystem
        Device
        DeviceDriver
        Firmware
        File
        MachineLearningModel
        Data
        CryptographicAsset
    }

    class RelationshipCreator_CycloneDxProcessor_ {
        +new(sbom_id, CycloneDxProcessor) RelationshipCreator_CycloneDxProcessor_
        +relate(left, rel, right) void
        +rels Vec~Relationship~
        +externals Vec~Relationship~
        +validate(sources) Result
        +create(db) Result
    }

    class PostContext {
        +cpes CpeCreator*
        +purls PurlCreator*
        +packages PackageCreator*
        +relationships Vec~Relationship~*
        +externals Vec~Relationship~*
        +run(processors) void
    }

    Creator --> ComponentCreator : uses
    ComponentCreator --> PackageCreator : owns
    ComponentCreator --> FileCreator : owns
    ComponentCreator --> MachineLearningModelCreator : owns
    ComponentCreator --> CryptographicAssetCreator : owns
    ComponentCreator --> RelationshipCreator_CycloneDxProcessor_ : owns
    ComponentCreator --> ComponentType : classifies
    ComponentCreator --> PackageReference : maps node to

    MachineLearningModelCreator --> NodeCreator : wraps
    CryptographicAssetCreator --> NodeCreator : wraps

    PackageCreator ..> PackageReference : uses

    PostContext ..> CpeCreator
    PostContext ..> PurlCreator
    PostContext ..> PackageCreator
    PostContext ..> RelationshipCreator_CycloneDxProcessor_

File-Level Changes

Change Details Files
Refactor CycloneDX SBOM component processing into an owning ComponentCreator that supports new component types and centralizes validation and DB persistence.
  • Replace separate CPE/PURL/license/package/relationship creators in Creator::create with a single ComponentCreator responsible for collecting components, relations, post-processing, validation, and database writes.
  • Change ComponentCreator to own CpeCreator, PurlCreator, LicenseCreator, PackageCreator, FileCreator, MachineLearningModelCreator, CryptographicAssetCreator, and RelationshipCreator, and initialize them from sbom_id and capacity.
  • Introduce a HashMap from node_id to Vec<PackageReference> so that CPE/PURL references are tracked per node rather than in a single flat vector.
  • Modify add_component to return Result and to branch on ComponentType (Application/Framework/Library/Container/OperatingSystem treated as packages; File, MachineLearningModel, CryptographicAsset delegated to their respective creators; unsupported types logged; invalid type reported as InvalidContent).
  • Replace recursive ComponentCreator construction when handling ancestors and variants with re-using the same ComponentCreator and using add_component plus add_relation.
  • Add ComponentCreator::post_process, ::validate, and async ::create to encapsulate processor running, relationship validation across all node sources, and ordered DB insertion including new node types.
modules/ingestor/src/graph/sbom/cyclonedx.rs
Extend common SBOM model with new cryptographic asset and machine learning model creators backed by NodeCreator and integrate them into relationship validation and persistence.
  • Add CryptographicAssetCreator and MachineLearningModelCreator wrappers around NodeCreator, each providing constructors, add, create, and implementing ReferenceSource to expose node references.
  • Export the new creators from the common SBOM module and include them as sources in relationship validation so they participate in reference checks.
  • Ensure DB creation order includes files, models, and cryptographic assets after packages but before relationships to maintain consistent insertion ordering.
modules/ingestor/src/graph/sbom/common/cryptographic_asset.rs
modules/ingestor/src/graph/sbom/common/machine_learning_model.rs
modules/ingestor/src/graph/sbom/common/mod.rs
modules/ingestor/src/graph/sbom/cyclonedx.rs
Introduce ComponentType enum matching CycloneDX 1.6 component types and use it to drive routing of components into appropriate creators.
  • Define ComponentType enum with serde and strum traits for kebab-case string conversion and case-insensitive parsing of known component types (application, framework, library, container, platform, operating-system, device, device-driver, firmware, file, machine-learning-model, data, cryptographic-asset).
  • Use ComponentType::from_str on CycloneDX Component.type_ to decide whether to create package, file, ML model, or cryptographic asset entries, or to log unsupported types and return an InvalidContent error on parse failure.
  • Add unit tests verifying round-trip serialization, display, JSON representation, case-insensitive parsing, and error handling for ComponentType.
modules/ingestor/src/graph/sbom/cyclonedx.rs
Tighten package reference handling so PackageCreator::add works with iterators of references, allowing shared reference ownership and fixing type mismatches.
  • Change PackageCreator::add to take refs as an Iterator<Item=&PackageReference> instead of owning IntoIterator, and adjust internal handling to dereference CPE IDs when building cpe_refs.
  • Update callers in CycloneDX and SPDX ingestion paths, and in SbomContext::add_component, to build Vec<PackageReference> where needed and pass refs.iter() into PackageCreator::add.
modules/ingestor/src/graph/sbom/common/package.rs
modules/ingestor/src/graph/sbom/cyclonedx.rs
modules/ingestor/src/graph/sbom/spdx.rs
modules/ingestor/src/graph/sbom/mod.rs
Adjust tests and logging to align with new behavior and configuration.
  • Fix test name typo from test_tc2578 to test_tc2758 and update expected descendant node_id/name in latest_filters test to reflect file-based PURL instead of original Maven package PURL.
  • Replace a println! debug statement in a vulnerability test with log::debug! to integrate with structured logging.
  • Set rustfmt edition to 2024 in rustfmt.toml to match style_edition and ensure consistent formatting.
modules/analysis/src/endpoints/tests/latest_filters.rs
modules/fundamental/tests/vuln/mod.rs
rustfmt.toml
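
The relationship validation across node sources described in the changes above can be sketched roughly as follows. This is a hedged illustration: the trait shape, the `validate` free function, and the error type are assumptions, and relations are reduced to `(left, right)` pairs, whereas the class diagram shows `(String, Relationship, String)` tuples.

```rust
use std::collections::HashSet;

/// Assumed minimal shape of a reference source: anything that can
/// list the node IDs it is going to create.
trait ReferenceSource {
    fn references(&self) -> Vec<&str>;
}

/// Reduced stand-in for the PR's `NodeCreator`.
#[derive(Default)]
struct NodeCreator {
    ids: Vec<String>,
}

impl NodeCreator {
    fn add(&mut self, node_id: impl Into<String>) {
        self.ids.push(node_id.into());
    }
}

impl ReferenceSource for NodeCreator {
    fn references(&self) -> Vec<&str> {
        self.ids.iter().map(String::as_str).collect()
    }
}

/// Thin wrapper, mirroring how the new creators wrap `NodeCreator`.
#[derive(Default)]
struct CryptographicAssetCreator {
    nodes: NodeCreator,
}

impl CryptographicAssetCreator {
    fn add(&mut self, node_id: String) {
        // `node_id` is already owned, so it is moved without a clone.
        self.nodes.add(node_id);
    }
}

impl ReferenceSource for CryptographicAssetCreator {
    fn references(&self) -> Vec<&str> {
        self.nodes.references()
    }
}

/// Every endpoint of every relation must be known to some source.
fn validate(
    relations: &[(String, String)],
    sources: &[&dyn ReferenceSource],
) -> Result<(), String> {
    let known: HashSet<&str> = sources.iter().flat_map(|s| s.references()).collect();
    for (left, right) in relations {
        for id in [left, right] {
            if !known.contains(id.as_str()) {
                return Err(format!("unknown node reference: {id}"));
            }
        }
    }
    Ok(())
}
```

In this sketch, leaving a creator out of `sources` makes every relation that points at one of its nodes fail validation, which is the reason the new file, model, and cryptographic asset creators need to be included.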

@sourcery-ai bot (Contributor) left a comment

Hey - I've found 1 issue, and left some high-level feedback:

  • In `ComponentCreator::add_component`, the `const EMPTY: Vec<PackageReference> = vec![];` pattern won’t work as `vec!` isn’t const; consider using something like `self.refs.get(&node_id).into_iter().flatten()` instead of an `EMPTY` fallback to avoid the need for a const Vec.
  • The new `ComponentType::from_str(&comp.type_)` handling returns an error for unknown or unexpected component types, which changes ingestion behavior; consider whether it’s preferable to log and skip unsupported types (or fall back to a default) rather than failing the entire SBOM on an unrecognized type.
  • In `MachineLearningModelCreator::add` and `CryptographicAssetCreator::add`, you can drop the `node_id.clone()` when calling `self.nodes.add` since `node_id` is already owned and immediately moved, avoiding an unnecessary clone.

## Individual Comments

### Comment 1
<location path="modules/ingestor/src/graph/sbom/cyclonedx.rs" line_range="415" />
<code_context>
+                match ty {
+                    // We treat all these types as "packages"
+                    Application | Framework | Library | Container | OperatingSystem => {
+                        const EMPTY: Vec<PackageReference> = vec![];
+                        self.packages.add(
+                            NodeInfoParam {
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `vec![]` in a `const` context will not compile; consider a different way to represent an empty reference set.

`vec!` cannot be used in a `const` because it’s not `const fn`, so this definition won’t compile. You can drop the `const` and fall back with something like `self.refs.get(&node_id).into_iter().flatten()` or `self.refs.get(&node_id).map(|v| v.iter()).unwrap_or_else(|| [].iter())`. Alternatively, define a static empty slice (`static EMPTY: [PackageReference; 0] = [];`) and use `EMPTY.iter()` as the fallback.
</issue_to_address>
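
The two fallbacks suggested in this comment can be sketched in isolation. The functions below are hypothetical and use `u32` as a stand-in for `PackageReference`. They hold regardless of whether the original `const` compiles on a given toolchain; an empty `vec![]` expands to `Vec::new()`, which is a `const fn` on current stable Rust, so that claim is worth checking against the compiler.

```rust
use std::collections::HashMap;

/// Fallback 1: turn the `Option<&Vec<_>>` into an iterator and flatten it.
/// A missing key yields an empty iterator with no placeholder value.
fn refs_for<'a>(
    refs: &'a HashMap<String, Vec<u32>>,
    node_id: &str,
) -> impl Iterator<Item = &'a u32> {
    refs.get(node_id).into_iter().flatten()
}

/// Fallback 2: a static empty array whose iterator is used when the
/// key is missing.
static EMPTY: [u32; 0] = [];

fn refs_or_empty<'a>(
    refs: &'a HashMap<String, Vec<u32>>,
    node_id: &str,
) -> std::slice::Iter<'a, u32> {
    refs.get(node_id)
        .map(|v| v.iter())
        .unwrap_or_else(|| EMPTY.iter())
}
```

The first form needs no extra item in scope; the second keeps a concrete `slice::Iter` return type, which can be convenient when the caller needs to name the iterator type.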

