Skip to content

containers: further restrict container label names#6420

Open
martinezjandrew wants to merge 7 commits intomainfrom
amartinez/container-label-validation
Open

containers: further restrict container label names#6420
martinezjandrew wants to merge 7 commits intomainfrom
amartinez/container-label-validation

Conversation

@martinezjandrew
Copy link
Copy Markdown
Contributor

@martinezjandrew martinezjandrew commented Mar 25, 2026

Follows up on PR #6352 (#6352), which only validated against control characters in container label names and values.

This PR further restricts label names to alphanumeric characters ([A-Za-z0-9]). Alphanumeric-only names prevent injection when labels flow into GraphQL queries.

@martinezjandrew martinezjandrew marked this pull request as ready for review March 25, 2026 21:05
@martinezjandrew martinezjandrew requested review from a team as code owners March 25, 2026 21:05
@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk bot commented Mar 25, 2026

Now I have a clear picture. Let me analyze the type issue:

field.value is a kj::String, and iterating it yields char values. In C++, char may be signed (typically on x86). The comparison c >= 0x20 && c <= 0x7E on a signed char means that if c has a value >= 128 (e.g., UTF-8 multibyte), it would be negative as a signed char. Since a negative value is < 0x20, the check would correctly reject it. So the logic is actually sound — any byte with the high bit set would be negative and fail c >= 0x20.

Wait, actually that's fine — we want to reject non-ASCII, and any byte >= 0x80 would be negative as signed char and would fail the >= 0x20 check. So the behavior is correct.

Let me double check: for the name validation in the old code, static_cast<kj::byte>(c) >= 0x20 was used. The new code for names doesn't need a cast since it's checking ranges like c >= 'A'. For values, the new code uses c >= 0x20 && c <= 0x7E without the static_cast<kj::byte> — this is fine since any byte >= 0x80 is negative as signed char and correctly rejected.

Let me also check if there's a missing test for the DEL character (0x7F) in label values:

Looking at the tests, there's no explicit test for 0x7F (DEL), which is the boundary between accepted (0x7E ~) and rejected. The DEL character is a control character that would have been accepted by the old check (>= 0x20) but is now correctly rejected by the new check (<= 0x7E). A boundary test would be nice but it's not a blocking issue.

Let me also check: is there a test for empty label values? The code doesn't check for empty values — an empty string would pass the validation loop (no characters to check). That seems fine since an empty value for a label is a valid use case.

Let me check if the error messages properly identify which label triggered the error. The messages include "at index i" which refers to the position of the label in the dict, not the position of the offending character within the name/value. That's consistent with the original code. Let me also verify the original error message style said "(index 0)" with parens and the new one says "at index 0" without parens:

Looking at the diff:

  • Old: "Label names cannot contain control characters (index ", i, ")"
  • New: "Label names must contain only alphanumeric characters (A-Z, a-z, 0-9) " "at index ", i

The new message drops the closing parenthesis and changes the format. This is fine — the test regex patterns match accordingly.

Actually, I notice the old error messages had parentheses (index N) while the new ones use at index N — consistent within the PR but a style change. Not a real issue.

Let me think about whether there are any real issues here:

  1. Backward compatibility: Container API is experimental/new (2025), so tightening validation is fine without a compat flag.
  2. Security: Tightening validation is a security improvement — restricting label names to alphanumeric prevents injection into downstream systems (GraphQL queries).
  3. Correctness: The char signedness for value validation is correct — negative signed chars (>= 0x80 unsigned) will fail c >= 0x20 since they're negative.
  4. Tests: Good coverage of edge cases (hyphens, emoji, accented chars, control chars, plus positive test for valid labels).
  5. Missing edge case in tests: No test for DEL (0x7F) in values or for label value at the exact boundary (0x7E ~). But the positive test includes ! which is 0x21, and any-value_is.fine! covers several printable ASCII chars. Not a blocking issue.
  6. No test for label name length limit: There's no max length check, but that's pre-existing and not part of this PR.

Overall this is a clean, well-tested tightening of validation on an experimental API. I don't see actionable issues.

LGTM

github run

@martinezjandrew martinezjandrew requested a review from anonrig March 26, 2026 14:19
@martinezjandrew martinezjandrew changed the title containers: further restrict container label names and values containers: further restrict container label names Mar 26, 2026
Comment on lines +19 to +32
constexpr bool isAlphanumericString(kj::StringPtr s) {
for (auto c: s) {
if (!((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_'))
return false;
}
return true;
}

constexpr bool hasNoControlCharacters(kj::StringPtr s) {
for (auto c: s) {
if (static_cast<kj::byte>(c) < 0x20) return false;
}
return true;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still extremely slow:

On top:

#include <cstdint> 
Suggested change
constexpr bool isAlphanumericString(kj::StringPtr s) {
for (auto c: s) {
if (!((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_'))
return false;
}
return true;
}
constexpr bool hasNoControlCharacters(kj::StringPtr s) {
for (auto c: s) {
if (static_cast<kj::byte>(c) < 0x20) return false;
}
return true;
}
static constexpr uint64_t kAlnumBits[4] = {
UINT64_C(0x03FF000000000000), // '0'–'9' (bits 48–57)
UINT64_C(0x07FFFFFE87FFFFFE), // 'A'–'Z', '_', 'a'–'z'
UINT64_C(0), // 0x80–0xBF: none
UINT64_C(0), // 0xC0–0xFF: none
};
static constexpr uint64_t kNoCtrlBits[4] = {
UINT64_C(0xFFFFFFFF00000000), // 0x20–0x3F valid (bits 32–63)
UINT64_C(0xFFFFFFFFFFFFFFFF), // 0x40–0x7F: all valid
UINT64_C(0xFFFFFFFFFFFFFFFF), // 0x80–0xBF: all valid
UINT64_C(0xFFFFFFFFFFFFFFFF), // 0xC0–0xFF: all valid
};
constexpr bool isAlphanumericString(kj::StringPtr s) {
uint64_t acc = ~uint64_t(0);
for (auto c : s) {
auto uc = static_cast<unsigned char>(c);
acc &= kAlnumBits[uc >> 6] >> (uc & 63);
}
return (acc & 1) != 0;
}
constexpr bool hasNoControlCharacters(kj::StringPtr s) {
uint64_t acc = ~uint64_t(0);
for (auto c : s) {
auto uc = static_cast<unsigned char>(c);
acc &= kNoCtrlBits[uc >> 6] >> (uc & 63);
}
return (acc & 1) != 0;
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's quite a bit of added complexity in the implementation that might not be worthwhile. How hot of a path is the name validation in? Are these names large in general? If they are short and the path is not that hot, then these don't need much micro-optimization

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the help! I applied the change in 7e847c5.

Copy link
Copy Markdown
Member

@gpanders gpanders Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's quite a bit of added complexity in the implementation that might not be worthwhile. How hot of a path is the name validation in? Are these names large in general? If they are short and the path is not that hot, then these don't need much micro-optimization

I agree, this seems like overkill here, the label validation is almost certainly not going to be the bottleneck in this RPC call. Label names are capped at 16 bytes long (label values are 64 bytes max), the simpler implementation we had before is likely fine imo.

If we did want to use an implementation like this then container.c++ is probably not the right place for it but should be in a more general purpose string library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants