storage: Checking and fixing system config for NBDE#17796
storage: Checking and fixing system config for NBDE#17796mvollmer merged 1 commit intocockpit-project:mainfrom
Conversation
6cbe45d to
d0432a4
Compare
d0432a4 to
b11f590
Compare
b11f590 to
7648613
Compare
ef4559e to
3a42b64
Compare
3a42b64 to
8a2e10a
Compare
8a2e10a to
d08cb6b
Compare
d08cb6b to
1f71fe0
Compare
ee93d26 to
038db1b
Compare
038db1b to
c011a1f
Compare
c011a1f to
36d0273
Compare
|
Awesome work! I have a bunch of design-related feedback:
|
Thanks! This is really helpful.
Mostly, yes, and we should move it up in the hierarchy. However, there are two sets of checks-and-fixes: One for the root filesystem, and another one for everything else. Currently, the buttons in the filesystem tab start one of these two flows, depending on whether the filesystem in question is the root filesystem or not. However, the checks and fixes for a non-root filesystem are much lighter than for the root filesystem. They might not justify the full blown dialog that is implemented here. In any case, NBDE for the root filesystem is the main thing we are addressing here, so let's ignore non-root filesystems for now. Thus, yes, this is a system-level option and should be somewhere higher up.
Yes. However, it is always possible that the system gets broken later on. Hmm. The best would be to find a very fast reliable way to figure out if anything needs to be fixed that can be done every time the Storage page loads. I will have a try at this. (We would need to check whether the root filesystem uses NBDE, and whether the initrd includes support for this.) However, this is starting to move us outside of the scope of Cockpit, isn't it? Isn't this something that Insights should be doing? (Checking your system in the background for things that need fixing, alerting you about them, and offering ways to actually fix them.) Cockpit could certainly show the alerts and let people carry out the fixes, but I'd say we shouldn't go and run the actual checks in the background. So, what if, as a first step, we only do the check-and-fix flow when adding NBDE to the root filesystem, and forget about a global button? (If I find a quick way to check whether to show this button, we can reconsider this.)
Adding a NBDE key is already a wizard: First you enter the contact information for the server, then you need to compare fingerprints and give the final OK. In my current implementation, the check-and-fix then runs as a third step. I agree that it is much better to run it first: If the fixing fails, you should probably not add the NBDE key. But the first step in the existing wizard includes selecting whether to add a regular passphrase, or a NBDE key. So we would have step 1: Select NBDE, enter server address; step 2: check system for NBDE support; step 3: compare and confirm fingerprints. Is that okay? As you propose, I think we can do the check when clicking "Apply" in step 1, and immediately skip to step 3 when nothing needs fixing. Step 2 would only happen when there is something to fix. |
Sure.
Wizards are for interactive input. Checking the system is not interactive; it's something that does not require input from the user. Therefore, it should not be a step in a wizard. And this doesn't sound like it should be a wizard. A wizard is a particular UI pattern. You're talking about a process. But as the verification step does require user input (accepting the key or not), it could be a wizard, I suppose. That's better than two sequential dialogs. But it's not as good as a dialog that changes based on context. It could be implemented as a wizard, but we don't need (nor want) a sidebar, a big header, or next/back buttons that PF wizards provide. Here's a mockup of how it could work. It definitely needs updates and iteration. (I made this in ExcaliDraw, so you could just open this PNG there if you want to edit it and it would show you the vector form, as it's embedded.) |
|
Also: Since we'll check for a number of things for NBDE support and it takes a little while, the checking step could have a progress bar, as we can just divide it by # of steps and when each completes, advance the bar. |
b228fdc to
908583a
Compare
908583a to
8d06e1a
Compare
8d06e1a to
cb68696
Compare
cb68696 to
ee4e0d9
Compare
ee4e0d9 to
c33e9cf
Compare
|
The final reboot sometimes (always?) times out on rhel-8... it completes in 20 seconds here locally. Let's debug this some. |
Now it succeeded on non-rhos. The console screenshots have been taken and can be found in the results directory. For example, https://cockpit-logs.us-east-1.linodeobjects.com/pull-17796-20221114-133844-c33e9cff-rhel-8-8/failed-reboot.ppm |
c33e9cf to
3c4d11a
Compare
Check failure
Code scanning / CodeQL
Potentially uninitialized local variable
There was a problem hiding this comment.
If there is no "/" this will blow up, I guess that's intendend?
3c4d11a to
3f8f659
Compare
|
@garrett, what do you think about these open issues?
|
jelly
left a comment
There was a problem hiding this comment.
Looks good, some minor nitpicks/questions.
There was a problem hiding this comment.
This is cool! Would be nice if this was more generic and maybe by default in wait_reboot. But then we'd have to add a wrapper in testlib.py so self.wait_reboot as that can call attach.
There was a problem hiding this comment.
Arch is waiting on latchset/clevis#374 I'll try to poke the PR
There was a problem hiding this comment.
This feels a little brittle, as also lsinitrd -m also prints early CPIO image and stuff. But there doesn't seem to be a better way.
There was a problem hiding this comment.
This is what the RHEL documentation recommends: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/security_hardening/configuring-automated-unlocking-of-encrypted-volumes-using-policy-based-decryption_security-hardening#configuring-automated-unlocking-using-a-tang-key-in-the-web-console_configuring-automated-unlocking-of-encrypted-volumes-using-policy-based-decryption
There was a problem hiding this comment.
Right, I understand that. But it also prints other stuff. It was more a wish that more tools would have parseable output with for example --json.
There was a problem hiding this comment.
So this might be a silly question, but doesn't grubby also re-generate initramfs. So in theory we could first install clevis-dracut. But that's a real nitpick I feel :)
There was a problem hiding this comment.
grubby does not regenerate the initrd. At least it never did for me.
There was a problem hiding this comment.
If there is no "/" this will blow up, I guess that's intendend?
It's tolerated. :-) We could produce a better error message, but this really shouldn't happen at all. |
| if (p.waiting) { | ||
| text = _("Waiting for other software management operations to finish"); | ||
| } else if (p.package) { |
There was a problem hiding this comment.
These 3 added lines are not executed by any test. Details
| } else if (p.package) { | ||
| let fmt; | ||
| if (p.info == PkEnum.INFO_DOWNLOADING) | ||
| fmt = _("Downloading $0"); |
There was a problem hiding this comment.
This added line is not executed by any test. Details
| if (p.info == PkEnum.INFO_DOWNLOADING) | ||
| fmt = _("Downloading $0"); | ||
| else if (p.info == PkEnum.INFO_REMOVING) | ||
| fmt = _("Removing $0"); |
There was a problem hiding this comment.
This added line is not executed by any test. Details
| title: cockpit.format(_("The $0 package must be installed."), package_name), | ||
| func: progress => { | ||
| if (data.remove_names.length > 0) | ||
| return Promise.reject(cockpit.format(_("Installing $0 would remove $1."), name, data.remove_names[0])); |
There was a problem hiding this comment.
This added line is not executed by any test. Details
| if (data.remove_names.length > 0) | ||
| return Promise.reject(cockpit.format(_("Installing $0 would remove $1."), name, data.remove_names[0])); | ||
| else if (data.unavailable_names.length > 0) | ||
| return Promise.reject(cockpit.format(_("The $0 package is not available from any repository."), name)); |
There was a problem hiding this comment.
This added line is not executed by any test. Details
|
|
||
| function ensure_initrd_clevis_support(steps, progress, package_name) { | ||
| const task = cockpit.spawn(["lsinitrd", "-m"], { superuser: true, err: "message" }); | ||
| progress(_("Checking for NBDE support in the initrd"), () => task.close()); |
There was a problem hiding this comment.
This added line is not executed by any test. Details
| const fsys_options = fsys_config && parse_options(decode_filename(fsys_config[1].opts.v)); | ||
|
|
||
| if (!fsys_options || fsys_options.indexOf(option) >= 0) | ||
| return Promise.resolve(); |
There was a problem hiding this comment.
This added line is not executed by any test. Details
| const crypto_config = array_find(block.Configuration, function (c) { return c[0] == "crypttab" }); | ||
| const crypto_options = crypto_config && parse_options(decode_filename(crypto_config[1].options.v)); | ||
| if (!crypto_options || crypto_options.indexOf(option) >= 0) | ||
| return Promise.resolve(); |
There was a problem hiding this comment.
This added line is not executed by any test. Details
| } else | ||
| return Promise.resolve(); | ||
| } else |
There was a problem hiding this comment.
These 3 added lines are not executed by any test. Details



Outdated demo: https://www.youtube.com/watch?v=mbv-syeLHsc
Release note:
Storage: Set up a system to use NBDE
It is possible (and unfortunately likely) that a system is misconfigured and can not mount filesystems during boot that rely on Clevis to unlock them. Cockpit can now fix this when adding a keyserver to a encrypted filesystem.