objstorage: add encrypting writable and decrypting readable#5781
objstorage: add encrypting writable and decrypting readable#5781andrew-r-thomas wants to merge 1 commit intocockroachdb:masterfrom
Conversation
| return nil | ||
| } | ||
|
|
||
| func AppearsEncrypted(text []byte) bool { |
There was a problem hiding this comment.
Does this one want a name that clarifies it is the external file encryption scheme?
| ) | ||
|
|
||
| const ( | ||
| encryptionVersion = 2 |
There was a problem hiding this comment.
should this be "externalEncryptionVersion" / preamble / nonce /etc to be clearer that they don't apply to pebble's internal encryption-at-rest support? or is being in the objstorageprovider pkg enough?
Maybe let @RaduBerinde weigh in
443ee72 to
b28fef4
Compare
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
b28fef4 to
4d19529
Compare
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
4d19529 to
4a1bf8f
Compare
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
4a1bf8f to
b8f3f08
Compare
dt
left a comment
There was a problem hiding this comment.
LGTM / it's just a mechanical mv from the CRDB repo, but maybe wait for @RaduBerinde to confirm as well, mostly just that this package the the right place for these to land
RaduBerinde
left a comment
There was a problem hiding this comment.
Is there a big advantage to having this code in Pebble (as opposed to doing it on the cockroach side, inside remote.Storage.ReadObject?) It would be nice if it all lived in a single place.
I realize we'll need to store the key in the manifest but that could be implemented in a generic way (we'd add an ObjMeta []byte to ExternalFile and pass it back to remote.Storage.ReadObject).
@RaduBerinde made 1 comment.
Reviewable status: 0 of 4 files reviewed, 2 unresolved discussions (waiting on andrew-r-thomas, annrpom, and sumeerbhola).
I think that's the intention here, isn't it? i.e. that pebble is the "one place" for all the code required to write or read a backup sstable, with CRDB using that pebble code to write these SSTs during a backup and then pebble using it to read them after they're linked into the LSM?
Hm. I'm not sure I see the motivation here? Like, right now this is a pretty well defined API: you specify the object name you want to read or write and get to read or write bytes to/from it. The meaning you ascribe to those bytes is up to you: you can store compressed bytes or uncompressed as you choose, or compress individual blocks, etc. Choosing to encrypt or decrypt with a specific algo/scheme/key seems like it'd be in the same category as choosing to compress with a specific algo/scheme/level, i.e. something pebble decides to do or not do on a per-file basis and then issues otherwise opaque byte reads and writes? |
I am talking about encryption at large. We'd have the part that knows how to encrypt/decrypt backup sstables inside Pebble and the part that knows how to encrypt/decrypt files more generally in CRDB.
But it is not today, it is all implemented "around" Pebble.. @sumeerbhola WDYT? |
|
this PR looks great!
Chiming in and +1ing to Radu. IIUC, with https://ideal-train-43wvp8m.pages.github.io/pebble#encryption-at-rest, we anticipate that the key management model will change from per-store to cluster-level keys, and encryption will be explicitly designed to happen above the storage layer. Adding encryption inside Pebble for backup SSTs would create a second encryption path that would need to be reconciled with cluster-level key management when/if that design lands (or more likely, moved to the CRDB layer anyway). |
I expect enc-at-rest SSTs written inside a cluster are / will remain distinct from backup SSTs we link into a cluster; backups are explicitly intended to be restored by a cluster other than the one which produced them (e.g if it burned down, belongs to another team, etc) so they are expected to be portable and independent/separable from cluster that produced them. Any keys required to read a file will need to be passed along with the file to the reader of that file, rather than use some sort of cluster-level key tied to some other cluster. |
The part i'm not quite following is why this is distinct from the other aspects of encoding a data file that pebble will later be expected to read? Backup uses pebble-provided library code to produce files that pebble will then be able read if subsequently given their URLs: any aspects of the encoding of those files -- how they're chunked into blocks, how those are recorded in indexes, the codec used to compress blocks, etc -- that pebble needs to understand the meaning of the raw bytes is implemented by the encoding utilities provided by pebble to BACKUP to ensure it can make a file pebble will be able to read. Encoding the bytes such that they're opaque except to a reader who has the right key for that file kinda feels like it is just another aspect of the encoding of the file? @RaduBerinde Can you elaborate on the API you're picturing? I don't think we'd want it to be |
|
@RaduBerinde did you get a chance to think about this any more last week? |
|
What I guess bothers me is that we are adding infrastructure for Pebble to work with encrypted files, but only for external tables and local tables that used to be external. So we will have a strange mix where some files are encrypted inside Pebble, and all other files are encrypted inside the FS. Not a deal-breaker but it would be nice to have a longer-term plan here - will we eventually do all encryption/decryption inside Pebble? I don't have a ton of context on the encryption design/history though, I will defer to @sumeerbhola. |
I guess I thought we'd already more or less said that is is expected/OK that we have code in pebble that is specific to this class of files (prefix synthesis/suffix rewriting/etc) so this diff which will allow files in this class to additionally have an encrypted byte encoding hadn't struck me as a particularly significant departure. We could inject something via a hook of some sort, but I'm not sure that we want it in |
sumeerbhola
left a comment
There was a problem hiding this comment.
My initial reaction is similar to @RaduBerinde and @annrpom but I think I lack enough context, including what changed now to motivate this change. It may be more productive to do a short synchronous meeting.
We've always had encryption be hidden inside vfs.FS or objstorage.Provider since that was a clean separation of concerns. That is, Pebble is not low enough a layer for encryption, which is at a layer even lower than Pebble.
I realize we'll need to store the key in the manifest
The Pebble manifest? I think we'll need some sort of security review to start placing keys in multiple places. I suspect it is fine for the keys to be in-memory in CRDB, and for it to be able to fetch these on-demand on restart, but having the same secret for a shared external file be in 100s of Pebble instances scattered across various clusters doesn't sound right. This also argues for decryption to be in a separate layer that can manage keys more properly.
@sumeerbhola made 1 comment.
Reviewable status: 0 of 4 files reviewed, 2 unresolved discussions (waiting on andrew-r-thomas, annrpom, and RaduBerinde).
|
synced offline: closing in favor of a prototype which does not touch pebble code |
This patch adds encrypting writable and decrypting readable implementations for the purposes of supporting encrypted backups in online restore.