Skip to content

objstorage: add encrypting writable and decrypting readable#5781

Closed
andrew-r-thomas wants to merge 1 commit intocockroachdb:masterfrom
andrew-r-thomas:objstorage-encryption
Closed

objstorage: add encrypting writable and decrypting readable#5781
andrew-r-thomas wants to merge 1 commit intocockroachdb:masterfrom
andrew-r-thomas:objstorage-encryption

Conversation

@andrew-r-thomas
Copy link
Copy Markdown
Contributor

This patch adds encrypting writable and decrypting readable implementations for the purposes of supporting encrypted backups in online restore.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

return nil
}

func AppearsEncrypted(text []byte) bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this one want a name that clarifies it is the external file encryption scheme?

)

const (
encryptionVersion = 2
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be "externalEncryptionVersion" / preamble / nonce /etc to be clearer that they don't apply to pebble's internal encryption-at-rest support? or is being in the objstorageprovider pkg enough?

Maybe let @RaduBerinde weigh in

@github-actions
Copy link
Copy Markdown

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@github-actions
Copy link
Copy Markdown

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@github-actions
Copy link
Copy Markdown

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@andrew-r-thomas andrew-r-thomas marked this pull request as ready for review February 17, 2026 19:25
@andrew-r-thomas andrew-r-thomas requested a review from a team as a code owner February 17, 2026 19:25
@dt dt requested a review from RaduBerinde February 17, 2026 20:01
Copy link
Copy Markdown
Contributor

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM / it's just a mechanical mv from the CRDB repo, but maybe wait for @RaduBerinde to confirm as well, mostly just that this package the the right place for these to land

Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a big advantage to having this code in Pebble (as opposed to doing it on the cockroach side, inside remote.Storage.ReadObject?) It would be nice if it all lived in a single place.

I realize we'll need to store the key in the manifest but that could be implemented in a generic way (we'd add an ObjMeta []byte to ExternalFile and pass it back to remote.Storage.ReadObject).

@RaduBerinde made 1 comment.
Reviewable status: 0 of 4 files reviewed, 2 unresolved discussions (waiting on andrew-r-thomas, annrpom, and sumeerbhola).

@dt
Copy link
Copy Markdown
Contributor

dt commented Feb 23, 2026

It would be nice if it all lived in a single place.

I think that's the intention here, isn't it? i.e. that pebble is the "one place" for all the code required to write or read a backup sstable, with CRDB using that pebble code to write these SSTs during a backup and then pebble using it to read them after they're linked into the LSM?

(externalStorageWrapper) ReadObject

Hm. I'm not sure I see the motivation here? Like, right now this is a pretty well defined API: you specify the object name you want to read or write and get to read or write bytes to/from it. The meaning you ascribe to those bytes is up to you: you can store compressed bytes or uncompressed as you choose, or compress individual blocks, etc. Choosing to encrypt or decrypt with a specific algo/scheme/key seems like it'd be in the same category as choosing to compress with a specific algo/scheme/level, i.e. something pebble decides to do or not do on a per-file basis and then issues otherwise opaque byte reads and writes?

@RaduBerinde
Copy link
Copy Markdown
Member

I think that's the intention here, isn't it? i.e. that pebble is the "one place" for all the code required to write or read a backup sstable, with CRDB using that pebble code to write these SSTs during a backup and then pebble using it to read them after they're linked into the LSM?

I am talking about encryption at large. We'd have the part that knows how to encrypt/decrypt backup sstables inside Pebble and the part that knows how to encrypt/decrypt files more generally in CRDB.

Choosing to encrypt or decrypt with a specific algo/scheme/key seems like it'd be in the same category as choosing to compress with a specific algo/scheme/level, i.e. something pebble decides to do or not do on a per-file basis and then issues otherwise opaque byte reads and writes?

But it is not today, it is all implemented "around" Pebble..

@sumeerbhola WDYT?

@annrpom
Copy link
Copy Markdown
Contributor

annrpom commented Feb 24, 2026

this PR looks great!

I am talking about encryption at large. We'd have the part that knows how to encrypt/decrypt backup sstables inside Pebble and the part that knows how to encrypt/decrypt files more generally in CRDB.

Chiming in and +1ing to Radu. IIUC, with https://ideal-train-43wvp8m.pages.github.io/pebble#encryption-at-rest, we anticipate that the key management model will change from per-store to cluster-level keys, and encryption will be explicitly designed to happen above the storage layer. Adding encryption inside Pebble for backup SSTs would create a second encryption path that would need to be reconciled with cluster-level key management when/if that design lands (or more likely, moved to the CRDB layer anyway).

@dt
Copy link
Copy Markdown
Contributor

dt commented Feb 24, 2026

reconcile with cluster-level key management

I expect enc-at-rest SSTs written inside a cluster are / will remain distinct from backup SSTs we link into a cluster; backups are explicitly intended to be restored by a cluster other than the one which produced them (e.g if it burned down, belongs to another team, etc) so they are expected to be portable and independent/separable from cluster that produced them. Any keys required to read a file will need to be passed along with the file to the reader of that file, rather than use some sort of cluster-level key tied to some other cluster.

@dt
Copy link
Copy Markdown
Contributor

dt commented Feb 24, 2026

that knows how to encrypt/decrypt backup sstables inside Pebble

The part i'm not quite following is why this is distinct from the other aspects of encoding a data file that pebble will later be expected to read? Backup uses pebble-provided library code to produce files that pebble will then be able read if subsequently given their URLs: any aspects of the encoding of those files -- how they're chunked into blocks, how those are recorded in indexes, the codec used to compress blocks, etc -- that pebble needs to understand the meaning of the raw bytes is implemented by the encoding utilities provided by pebble to BACKUP to ensure it can make a file pebble will be able to read. Encoding the bytes such that they're opaque except to a reader who has the right key for that file kinda feels like it is just another aspect of the encoding of the file?

@RaduBerinde Can you elaborate on the API you're picturing? I don't think we'd want it to be ReadObject since we don't want to do any computation during the backing copy-compactions, and after those are local we won't be using ReadObject to open them, so I think this would be some other sort of hook for Readable + key -> Readable that can be called for both local or remote files.

@dt
Copy link
Copy Markdown
Contributor

dt commented Mar 4, 2026

@RaduBerinde did you get a chance to think about this any more last week?

@RaduBerinde
Copy link
Copy Markdown
Member

What I guess bothers me is that we are adding infrastructure for Pebble to work with encrypted files, but only for external tables and local tables that used to be external. So we will have a strange mix where some files are encrypted inside Pebble, and all other files are encrypted inside the FS. Not a deal-breaker but it would be nice to have a longer-term plan here - will we eventually do all encryption/decryption inside Pebble?

I don't have a ton of context on the encryption design/history though, I will defer to @sumeerbhola.

@dt
Copy link
Copy Markdown
Contributor

dt commented Mar 4, 2026

but only for external tables and local tables that used to be external

I guess I thought we'd already more or less said that is is expected/OK that we have code in pebble that is specific to this class of files (prefix synthesis/suffix rewriting/etc) so this diff which will allow files in this class to additionally have an encrypted byte encoding hadn't struck me as a particularly significant departure.

We could inject something via a hook of some sort, but I'm not sure that we want it in ReadObject as we didn't originally want to do anything beyond the minimum required IO work during copy compactions to de-remote the backings. So it'd still be something, I think, that looks pretty specific to external and formerly-external files either way.

Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial reaction is similar to @RaduBerinde and @annrpom but I think I lack enough context, including what changed now to motivate this change. It may be more productive to do a short synchronous meeting.

We've always had encryption be hidden inside vfs.FS or objstorage.Provider since that was a clean separation of concerns. That is, Pebble is not low enough a layer for encryption, which is at a layer even lower than Pebble.

I realize we'll need to store the key in the manifest

The Pebble manifest? I think we'll need some sort of security review to start placing keys in multiple places. I suspect it is fine for the keys to be in-memory in CRDB, and for it to be able to fetch these on-demand on restart, but having the same secret for a shared external file be in 100s of Pebble instances scattered across various clusters doesn't sound right. This also argues for decryption to be in a separate layer that can manage keys more properly.

@sumeerbhola made 1 comment.
Reviewable status: 0 of 4 files reviewed, 2 unresolved discussions (waiting on andrew-r-thomas, annrpom, and RaduBerinde).

@andrew-r-thomas
Copy link
Copy Markdown
Contributor Author

synced offline: closing in favor of a prototype which does not touch pebble code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants