Skip to content

add Claude skill to review code for strong type safety (RFD 643)#10125

Open
davepacheco wants to merge 4 commits intomainfrom
dap/type-safety-review
Open

add Claude skill to review code for strong type safety (RFD 643)#10125
davepacheco wants to merge 4 commits intomainfrom
dap/type-safety-review

Conversation

@davepacheco
Copy link
Collaborator

This skill enables Claude to review either existing code or a specific PR for the code's use of types to enable strong static type checking. It covers the patterns described in RFD 643 using examples from some of @jgallagher's recent PRs that clean up these kinds of problems.

This is my first real Claude skill and I'd welcome feedback. It's also Claude-written. I know from experience that can result in work that's deceptively time-consuming to review. Do feel free to stop and say "go back to the drawing board because this needs more serious thought before we review it more carefully".

There's one thing I'm struggling to resolve here, which is that ideally, the "source of truth" for the anti-patterns that this looks for would be the same as the one we refer people to: RFD 643. But RFD 643 includes a lot more context about each item, which I think is helpful for humans but maybe less useful for the LLM.

@davepacheco
Copy link
Collaborator Author

I'm working on pasting example output here but spending a while on tweaks to the skill to get the output to be pasteable Markdown.

@davepacheco
Copy link
Collaborator Author

Below is example output for the prompt "Review the code in @nexus/types/src/deployment.rs for use of strong types." I've run this several times for testing and of course it produces different output. Below is probably one of the weaker ones. Interestingly, other runs have reported the use of <invalid> as something warranting a constant (which I'd agree with) but this one explicitly calls it out as a non-problem. It's small and subjective so I'm not sure how significant that is.

Broadly, I feel like the suggestions here have been useful.


Type Safety Review

Mode

Targeted review of nexus/types/src/deployment.rs


Findings

SUGGESTION: Multiple representations — deployment.rs:259, 1295

Problem: Blueprint::cockroachdb_fingerprint is typed as String and uses the
empty string as a sentinel for "no fingerprint." The display code at line 1295
explicitly checks is_empty() and substitutes "(none)". This same field appears
in BlueprintMetadata at line 3237 with the same representation, while the closely
related cockroachdb_setting_preserve_downgrade does use
Option<CockroachDbPreserveDowngrade> for the same "optional or missing"
semantics. The inconsistency forces every consumer to know about the empty-string
sentinel.

Fix: Use Option<String> (or a newtype CockroachDbFingerprint(String)
wrapped in Option) so the absence of a fingerprint is explicit in the type:

pub cockroachdb_fingerprint: Option<String>,

The display code can then use .unwrap_or("(none)") instead of checking
is_empty().


SUGGESTION: Multiple representations — deployment.rs:284–288, 1329–1338

Problem: Blueprint::comment is a String where the empty string means "no
comment." The display code at line 1329 checks is_empty() and substitutes
NONE_PARENS. This is the same sentinel pattern as cockroachdb_fingerprint, and
it means every consumer must know the convention. The creator field (line 284)
has the inverse problem: it is documented as "would generally be the Uuid of a
Nexus instance" but is typed as bare String, so the compiler cannot distinguish a
blueprint created by a Nexus instance (where creator is a parseable UUID) from
one created by RSS or a test harness.

Fix: For comment, use Option<String>:

pub comment: Option<String>,                                                       

For creator, consider a small enum or a newtype if the set of possible creator
identities is closed:

pub enum BlueprintCreator {                                                        
    Nexus(OmicronZoneUuid),     
    Rss,
    Test,
}

If the set of creators is genuinely open, String is acceptable but the
empty-string sentinel for comment should still be replaced with Option.


SUGGESTION: Missing full-struct destructuring — deployment.rs:1601

Problem: BlueprintSledConfig::into_in_service_sled_config() converts
BlueprintSledConfig to OmicronSledConfig by accessing each field via
self.field_name without destructuring. BlueprintSledConfig currently has eight
fields; the conversion drops no field silently (each is either mapped or
intentionally filtered). However, if a new field is added to BlueprintSledConfig
in the future (e.g., a new config block to pass to sled-agent), the compiler will
not flag this conversion site, and the new field will be silently omitted from the
OmicronSledConfig. Compare this to From<BlueprintZoneConfig> for OmicronZoneConfig (line 1806), which does use a full destructuring and correctly
marks disposition: _disposition as intentionally dropped. The inconsistency is a
future-maintainability trap.

Fix: Destructure at the top of the method:

pub fn into_in_service_sled_config(self) -> OmicronSledConfig {                    
    // This destructuring exists to cause a compile error if fields are            
    // added to BlueprintSledConfig; update the conversion below if needed.        
    let BlueprintSledConfig {                                                      
        state: _,                                                                  
        subnet: _,                                                                 
        last_allocated_ip_subnet_offset: _,
        sled_agent_generation,                                                     
        disks,
        datasets,                                                                  
        zones,                  
        remove_mupdate_override,
        host_phase_2,                                                              
        measurements,
    } = self;                                                                      
    OmicronSledConfig { generation: sled_agent_generation, ... }
}                                                                                  

SUGGESTION: Missing full-struct destructuring — deployment.rs:3182

Problem: From<BlueprintDatasetConfig> for DatasetConfig constructs a
DatasetConfig by accessing fields via config.field_name.
BlueprintDatasetConfig has eight fields; the conversion uses id, pool,
kind, quota, reservation, and compression, which means disposition and
address are silently dropped. Dropping disposition is clearly intentional
(sled-agent only sees in-service datasets). Dropping address may also be
intentional (the address lives elsewhere in the sled config), but there is no
compiler-enforced guard and no comment explaining the omission. A future field
added to BlueprintDatasetConfig would be silently dropped.

Fix:

impl From<BlueprintDatasetConfig> for DatasetConfig {
    fn from(config: BlueprintDatasetConfig) -> Self {
        // This destructuring is here so that new fields in                        
        // BlueprintDatasetConfig cause a compile error here. Update               
        // accordingly. `disposition` and `address` are intentionally              
        // not passed to sled-agent.                                               
        let BlueprintDatasetConfig {
            disposition: _,                                                        
            id,                                                                    
            pool,
            kind,                                                                  
            address: _,         
            quota,
            reservation,
            compression,
        } = config;
        Self {
            id,
            name: DatasetName::new(pool, kind),
            inner: SharedDatasetConfig { quota, reservation, compression },        
        }
    }                                                                              
}                               

SUGGESTION: Missing full-struct destructuring — deployment.rs:3201

Problem: BlueprintDatasetConfig::as_strings() accesses struct fields via
self.field_name to build display strings. If a new field is added to
BlueprintDatasetConfig, the display method will not be flagged by the compiler,
and the field will be silently absent from the table output. The display code is
not safety-critical, but the omission could cause confusing operator displays.

Fix: Add a destructuring at the top:

fn as_strings(&self) -> Vec<String> {
    let BlueprintDatasetConfig {                                                   
        disposition, id, pool, kind, address: _, quota, reservation, compression,
    } = self;                                                                      
    vec![                       
        DatasetName::new(*pool, kind.clone()).full_name(),                         
        id.to_string(),
        disposition.to_string(),                                                   
        ...                     
    ]                                                                              
}                               

SUGGESTION: Missing full-struct destructuring — deployment.rs:2562

Problem: PendingMgsUpdate::to_bp_table_values() accesses struct fields via
self.field_name to build display output. If a new field is added to
PendingMgsUpdate, the display will silently omit it.

Fix: Destructure self at the top of the method body.


SUGGESTION: Weak enum/bool usage — deployment.rs:3265

Problem: BlueprintTarget::enabled: bool controls whether the system actively
executes toward this blueprint. At call sites, code like BlueprintTarget { enabled: true, ... } requires the reader to look up the meaning of true. A
two-variant enum would make the intent self-documenting:

pub enum BlueprintTargetExecution {
    Enabled,
    Disabled,
}

This is a publicly-exported API type, so clarity matters. BlueprintTargetSet
(line 3272) has the same enabled: bool.

Fix:

pub struct BlueprintTarget {                                                       
    pub target_id: BlueprintUuid,
    pub execution: BlueprintTargetExecution,
    pub time_made_target: chrono::DateTime<chrono::Utc>,                           
}

SUGGESTION: Missing newtype for domain value — deployment.rs:2499

Problem: PendingMgsUpdate::slot_id: u16 represents the "last known MGS slot
(cubby number)" of a baseboard. This is a raw u16 with no type to distinguish it
from other u16 values (e.g., a port number, an index, a generation counter). The
struct PendingMgsUpdate contains both slot_id: u16 and artifact_version
fields; a raw u16 is easy to misuse when constructing or matching this struct.

Fix:

pub struct MgsSlotId(pub u16);

pub struct PendingMgsUpdate {
    ...
    pub slot_id: MgsSlotId,                                                        
    ...
}                                                                                  

SUGGESTION: Multiple representations — deployment.rs:3280–3298

Problem: CollectionDatasetIdentifier::id is typed as Option<DatasetUuid>
even though the inline TODO comment (line 3278) says "IDs are not optional" and was
"kept for backwards compatibility." This means every consumer of
CollectionDatasetIdentifier must handle the None case that will never occur in
practice for real data, and the type actively misrepresents the domain invariant.
The type is Clone + Ord and used for diffing datasets in blueprint display.

Fix: Remove Option and use DatasetUuid directly:

pub struct CollectionDatasetIdentifier {                                           
    name: String,               
    id: DatasetUuid,
}                                                                                  

If backwards compatibility with old serialized data is the concern, this can be
handled in the Deserialize impl with a serde helper that rejects None.


No issues found in:

  • Category "Stringly-typed values": No match branches on arbitrary strings for
    domain concepts. ExpectedVersion::FromStr uses "invalid" as a sentinel, but
    Display/FromStr are consistent with each other and the type is intentionally
    self-contained.
  • Category "Implicit runtime panics": No production .unwrap() calls without
    context; the unwrap() calls visible in the file are all in #[cfg(test)] code.
  • Category "Magic literals / missing constants": No repeated numeric or string
    literals encoding a shared invariant that could drift.

Summary

0 blocking issues, 9 suggestions. The most impactful are the three missing
full-struct destructurings in conversion impls (into_in_service_sled_config,
From<BlueprintDatasetConfig>, From<BlueprintPhysicalDiskConfig>), where new
fields added to blueprint types would be silently dropped from the sled-agent
representation. The empty-string sentinels for cockroachdb_fingerprint and
comment are a secondary concern. The rest are robustness and readability
improvements.

Copy link
Contributor

@jgallagher jgallagher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer to the more Claude-aware folks for the wording and structure of this; the examples LGTM (with maybe one exception, noted below).

}
```

### Example: Inline enum variant data vs. named struct
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we could use this one more, but I don't think this pattern works for any types that end up in OpenAPI specs, right? I don't remember what the exact error is off the top of my head, but I always end up having to switch back to Variant { fields... }.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? I'd rather run into that and then update the guidance than remove it prematurely, since it's still useful outside of OpenAPI documents and I'm guessing can be made to work with those, too.

@davepacheco
Copy link
Collaborator Author

I added a note about Vecs vs sets and asked it to review the file again with "Review the code in @nexus/types/src/deployment.rs for type safety. Right now, I'm only interested in the category related to Vec vs. BTreeSet." Result is below.


Type Safety Review

Mode

Targeted review of nexus/types/src/deployment.rs — category: Vec when
uniqueness matters


Findings

BLOCKING Vec when uniqueness matters — deployment.rs:3313

UnstableReconfiguratorState::blueprints: Vec<Blueprint>

Problem: Blueprint has an id: BlueprintUuid field (line 214). A
Vec<Blueprint> permits duplicate blueprint IDs. The consumer in
nexus/reconfigurator/simulation/src/system.rs:1051-1064 iterates this Vec and
calls add_blueprint_inner for each entry, silently dropping any duplicate (with a
warning). This means a round-trip through UnstableReconfiguratorState can
silently lose blueprints with duplicate IDs — the caller has no way to know at the
type level that uniqueness is required or that duplicates were dropped.

Fix: Implement IdOrdItem for Blueprint and change the field type to
IdOrdMap<Blueprint>:

impl IdOrdItem for Blueprint {
    type Key<'a> = BlueprintUuid;                                                  
    fn key(&self) -> BlueprintUuid { self.id }
}                                                                                  
                                
pub blueprints: IdOrdMap<Blueprint>,                                               

The consumer in load_serialized_inner can then iterate with .into_values() or
similar; the duplicate-handling warning can be removed since the type now prevents
duplicates at construction time.


BLOCKING Vec when uniqueness matters — deployment.rs:3311

UnstableReconfiguratorState::collections: Vec<Collection>

Problem: Collection has id: CollectionUuid (inventory.rs:83). Same pattern
as blueprints above: the consumer in system.rs:1031-1044 silently drops
duplicate collection IDs with a warning on load. A Vec<Collection> provides no
compile-time guarantee of uniqueness.

Fix: Implement IdOrdItem for Collection (in inventory.rs) and change the
field type:

impl IdOrdItem for Collection {
    type Key<'a> = CollectionUuid;
    fn key(&self) -> CollectionUuid { self.id }                                    
}
                                                                                   
pub collections: IdOrdMap<Collection>,

SUGGESTION Vec when uniqueness matters — deployment.rs:3316

UnstableReconfiguratorState::silo_names: Vec<Name>

Problem: Silo names are unique identifiers — no system should have two silos
with the same name. The Vec<Name> type permits duplicates. When this Vec is
loaded by the simulator (config.rs:390), it calls set_silo_names_inner which
does names.into_iter().collect() into an IndexSet<Name>, silently
deduplicating. The type mismatch between the serialized form (Vec) and the
in-memory form (IndexSet) means duplicates are present in the file but invisible
after loading. Note also that SimConfig already stores silo_names as
IndexSet<Name> (config.rs:26), confirming the uniqueness intent.

Fix: Use BTreeSet<Name> (for stable serialization order and uniqueness
enforcement):

pub silo_names: BTreeSet<omicron_common::api::external::Name>,

SUGGESTION Vec when uniqueness matters — deployment.rs:3317

UnstableReconfiguratorState::external_dns_zone_names: Vec<String>

Problem: DNS zone names in the database are unique. A Vec<String> does not
enforce this. The consumer in simulation/src/config.rs:373-384 explicitly warns
and silently discards all but the first entry when more than one is present — which
is a sign the type is too permissive. (The reconfigurator execution path in
execution/src/dns.rs:84-87 even asserts len() == 1 at runtime for its own use
of the same data source.) The Vec makes it look like multiple zone names are a
supported case, but every consumer either ignores extras or hard-asserts there is
exactly one.

Fix: Use BTreeSet<String> to at minimum enforce uniqueness. If the current
"exactly one zone name" invariant is expected to remain stable, a plain String
field would be even clearer. If future multi-zone support is intentional,
BTreeSet<String> correctly expresses the intent while removing the silent-discard
behavior:

pub external_dns_zone_names: BTreeSet<String>,

No issues found in:

  • No other Vec fields exist in deployment.rs structs. Local Vec variables in
    display/table methods are temporary sorted collections where order matters and
    uniqueness is guaranteed by the source IdOrdMap — no issues there.

Summary

2 blocking issues (blueprints, collections — silent data loss on duplicate IDs
during round-trip), 2 suggestions (silo_names, external_dns_zone_names
uniqueness not enforced at the type boundary).

@david-crespo
Copy link
Contributor

david-crespo commented Mar 24, 2026

This looks pretty good to me at a glance — will try it and see what it does.

I like moving the references out to another file, though I wonder if it should be clear that it should always look at them. If not, it could use a little more guidance about when to look at them — but I suspect that if you tried to do that you'd find the answer is "always." The skill is about 3k tokens and the reference is slightly more, which is not a big deal, especially if this is skill is run in a subagent, which wouldn't clutter the top-level conversation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants