containers: further restrict container label names#6420
containers: further restrict container label names#6420martinezjandrew wants to merge 7 commits intomainfrom
Conversation
|
Now I have a clear picture. Let me analyze the type issue:
Wait, actually that's fine — we want to reject non-ASCII, and any byte >= 0x80 would be negative as signed char and would fail the Let me double check: for the name validation in the old code, Let me also check if there's a missing test for the DEL character (0x7F) in label values: Looking at the tests, there's no explicit test for 0x7F (DEL), which is the boundary between accepted (0x7E Let me also check: is there a test for empty label values? The code doesn't check for empty values — an empty string would pass the validation loop (no characters to check). That seems fine since an empty value for a label is a valid use case. Let me check if the error messages properly identify which label triggered the error. The messages include "at index i" which refers to the position of the label in the dict, not the position of the offending character within the name/value. That's consistent with the original code. Let me also verify the original error message style said "(index 0)" with parens and the new one says "at index 0" without parens: Looking at the diff:
The new message drops the closing parenthesis and changes the format. This is fine — the test regex patterns match accordingly. Actually, I notice the old error messages had parentheses Let me think about whether there are any real issues here:
Overall this is a clean, well-tested tightening of validation on an experimental API. I don't see actionable issues. LGTM |
…ontainer-label-validation
| constexpr bool isAlphanumericString(kj::StringPtr s) { | ||
| for (auto c: s) { | ||
| if (!((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_')) | ||
| return false; | ||
| } | ||
| return true; | ||
| } | ||
|
|
||
| constexpr bool hasNoControlCharacters(kj::StringPtr s) { | ||
| for (auto c: s) { | ||
| if (static_cast<kj::byte>(c) < 0x20) return false; | ||
| } | ||
| return true; | ||
| } |
There was a problem hiding this comment.
This is still extremely slow:
On top:
#include <cstdint>
| constexpr bool isAlphanumericString(kj::StringPtr s) { | |
| for (auto c: s) { | |
| if (!((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_')) | |
| return false; | |
| } | |
| return true; | |
| } | |
| constexpr bool hasNoControlCharacters(kj::StringPtr s) { | |
| for (auto c: s) { | |
| if (static_cast<kj::byte>(c) < 0x20) return false; | |
| } | |
| return true; | |
| } | |
| static constexpr uint64_t kAlnumBits[4] = { | |
| UINT64_C(0x03FF000000000000), // '0'–'9' (bits 48–57) | |
| UINT64_C(0x07FFFFFE87FFFFFE), // 'A'–'Z', '_', 'a'–'z' | |
| UINT64_C(0), // 0x80–0xBF: none | |
| UINT64_C(0), // 0xC0–0xFF: none | |
| }; | |
| static constexpr uint64_t kNoCtrlBits[4] = { | |
| UINT64_C(0xFFFFFFFF00000000), // 0x20–0x3F valid (bits 32–63) | |
| UINT64_C(0xFFFFFFFFFFFFFFFF), // 0x40–0x7F: all valid | |
| UINT64_C(0xFFFFFFFFFFFFFFFF), // 0x80–0xBF: all valid | |
| UINT64_C(0xFFFFFFFFFFFFFFFF), // 0xC0–0xFF: all valid | |
| }; | |
| constexpr bool isAlphanumericString(kj::StringPtr s) { | |
| uint64_t acc = ~uint64_t(0); | |
| for (auto c : s) { | |
| auto uc = static_cast<unsigned char>(c); | |
| acc &= kAlnumBits[uc >> 6] >> (uc & 63); | |
| } | |
| return (acc & 1) != 0; | |
| } | |
| constexpr bool hasNoControlCharacters(kj::StringPtr s) { | |
| uint64_t acc = ~uint64_t(0); | |
| for (auto c : s) { | |
| auto uc = static_cast<unsigned char>(c); | |
| acc &= kNoCtrlBits[uc >> 6] >> (uc & 63); | |
| } | |
| return (acc & 1) != 0; | |
| } |
There was a problem hiding this comment.
That's quite a bit of added complexity in the implementation that might not be worthwhile. How hot of a path is the name validation in? Are these names large in general? If they are short and the path is not that hot, then these don't need much micro-optimization
There was a problem hiding this comment.
I appreciate the help! I applied the change in 7e847c5.
There was a problem hiding this comment.
That's quite a bit of added complexity in the implementation that might not be worthwhile. How hot of a path is the name validation in? Are these names large in general? If they are short and the path is not that hot, then these don't need much micro-optimization
I agree, this seems like overkill here, the label validation is almost certainly not going to be the bottleneck in this RPC call. Label names are capped at 16 bytes long (label values are 64 bytes max), the simpler implementation we had before is likely fine imo.
If we did want to use an implementation like this then container.c++ is probably not the right place for it but should be in a more general purpose string library.
…ontainer-label-validation
Follows up on PR #6352 (#6352), which only validated against control characters in container label names and values.
This PR further restricts label names to alphanumeric characters ([A-Za-z0-9]). Alphanumeric-only names prevent injection when labels flow into GraphQL queries.