Skip to content

nvme: set --tls and --concat correctly#3204

Merged
igaw merged 2 commits intolinux-nvme:masterfrom
martin-gpy:set_tls_concat
Apr 1, 2026
Merged

nvme: set --tls and --concat correctly#3204
igaw merged 2 commits intolinux-nvme:masterfrom
martin-gpy:set_tls_concat

Conversation

@martin-gpy
Copy link
Copy Markdown
Contributor

There are a few instances where --tls and --concat are still not set correctly. Fix them.

@martin-gpy
Copy link
Copy Markdown
Contributor Author

martin-gpy commented Mar 24, 2026

@igaw ,

Is there something wrong with the build here? Other than the usual checkpatch warning of exceeding 80 chars per line, there should not be any other errors/warnings here for the above commits, but still seeing the above build errors.

@igaw
Copy link
Copy Markdown
Collaborator

igaw commented Mar 24, 2026

The CI build just works fine. Your first patch is missing a closing bracket.

Anyway, given this a plain copy of existing code, I suggest to introduce a helper for this instead duplicating it. Then we only have one snippet to maintain.

Second patch, it looks correct. But I am pretty sure it was not there randomly. IIRC, some other configuration setups are depending on it. That means I am not against this, but I think something will break. Given with the --concat stuff, I need to go over these things once again. It's unfortunately a bit tricky to test right now.

I am using a test script to go over 'all' configuration possibilities. It is WIP, thus has bugs and quirks. I still hope to get this integrated into CI. The biggest problem is that it wants to host and target to run on different machines, because of the secrets stored in the keyring. @hreinecke had the idea to use a different keyring for nvmet. This would allow to run this type of tests on one machine. Need to look into this first, I suppose.

test-tls.sh

@martin-gpy
Copy link
Copy Markdown
Contributor Author

The CI build just works fine. Your first patch is missing a closing bracket.

Ah ok.

Anyway, given this a plain copy of existing code, I suggest to introduce a helper for this instead duplicating it. Then we only have one snippet to maintain.

Ok, I can take a look at that.

Second patch, it looks correct. But I am pretty sure it was not there randomly. IIRC, some other configuration setups are depending on it. That means I am not against this, but I think something will break. Given with the --concat stuff, I need to go over these things once again. It's unfortunately a bit tricky to test right now.

Got your point. Honestly I'd prefer this patch itself to avoid incorrectly setting --tls again here, because either --tls or --concat should have already been set earlier. That said, if we want to go down this route itself, can we rely on other tls sysfs attributes like say tls_configured_key & tls_keyring to determine whether --tls or --concat was used? If so, we can use that instead here in the code.

@igaw
Copy link
Copy Markdown
Collaborator

igaw commented Mar 24, 2026

Got your point. Honestly I'd prefer this patch itself to avoid incorrectly setting --tls again here, because either --tls or --concat should have already been set earlier. That said, if we want to go down this route itself, can we rely on other tls sysfs attributes like say tls_configured_key & tls_keyring to determine whether --tls or --concat was used? If so, we can use that instead here in the code.

IIRC, the issue is that it's not possible to get the 'input' back from the kernel. And this line was there to workaround this problem. As I said, I am not against to remove it. The read sysfs function should not have side effects like this.

I just wanted to point out, that it likely will cause a regression somewhere else. It's a bit of Whac-A-Mole situation. Hannes is taken up the task to cleanup/refactor the fabrics code and also started to look into the tree code. Hopefully we can improve this part of the code base.

@igaw
Copy link
Copy Markdown
Collaborator

igaw commented Mar 30, 2026

I’ve dusted off my test script and tried to figure out where tls = true comes into play. This change will break nvme config --dump. I’ve only looked at --scan, but I think it also affects --modify and --update.

test8() {
  echo "---- connect generated config.json later"
  exec_cmd "${NVME}" connect ${common_args} --tls --tls-key "${tls_key_json}" \
	  --tls-key-identity "${identity}" \
	  --dhchap-secret="${host_secret}" --dhchap-ctrl-secret="${ctrl_secret}" \
	  --dump-config --output-format json
  cp output connect-output.json

  devname="/dev/$(get_nvme_dev connected.json)"
  exec_test "${NVME}" config --scan --dump \
	  --output-format json "${devname}"
  cp output config-output.json

  cat <<EOF > expected.json
[
  {
    "hostnqn":"${hostnqn}",
    "hostid":"${hostid}",
    "dhchap_key":"${host_secret}",
    "subsystems":[
      {
        "nqn":"${subsysnqn}",
        "ports":[
          {
            "transport":"tcp",
            "traddr":"${addr}",
            "trsvcid":"${port}",
            "dhchap_key":"${host_secret}",
            "dhchap_ctrl_key":"${ctrl_secret}",
            "tls":true,
            "keyring":".nvme",
            "tls_key_identity":"$identity",
            "tls_key":"${tls_key_json}"
          }
        ]
      }
    ]
  }
]
EOF
  diff -u expected.json connect-output.json
  cat <<EOF > expected.json
[
  {
    "hostnqn":"${hostnqn}",
    "hostid":"${hostid}",
    "dhchap_key":"${host_secret}",
    "subsystems":[
      {
        "nqn":"${subsysnqn}",
        "ports":[
          {
            "transport":"tcp",
            "traddr":"${addr}",
            "trsvcid":"${port}",
            "dhchap_key":"${host_secret}",
            "dhchap_ctrl_key":"${ctrl_secret}",
            "tls":true,
            "keyring":".nvme"
          }
        ]
      }
    ]
  }
]
EOF
  diff -u expected.json config-output.json
  rm connect-output.json config-output.json expected.json
}
root@localhost:/home/wagi/work/nvme-cli-upstream/.build-debian# ../../test-tls.sh 8
nvme version 3.0-a.3 (git 3.0-a.3)
libnvme version 3.0-a.3 (git 3.0-a.3)
Running test8
---- connect generated config.json later
./nvme connect --transport tcp --traddr 192.168.30.30 --trsvcid 4420 --hostnqn nqn.2014-08.org.nvmexpress:uuid:befdec4c-2234-11b2-a85c-ca77c773af36 --hostid befdec4c-2234-11b2-a85c-ca77c773af36 --nqn nqn.io-1 --tls --tls-key NVMeTLSkey-1:01:E+8VYeGJyak3Z/fyW/23NZ20IY7SyHON8RSEi6yOiTlJxU3v: --tls-key-identity NVMe1R01 nqn.2014-08.org.nvmexpress:uuid:befdec4c-2234-11b2-a85c-ca77c773af36 nqn.io-1 4IOYNS/rg/PX4rbiebj0LCEBFQ1X5IMI1GlO5s/yS7w= --dhchap-secret=DHHC-1:02:x6e5qrp11n+GgbnTbpQoftMvI6NMyJMODzu9C1drZsZyzWB/B6YTAOWY8xhfGsAWuIwZDA==: --dhchap-ctrl-secret=DHHC-1:02:OFIC9B3zL6Iaa7TKKN13Eyb+0i42vS7V4t5bYVAL08C+ZUxVjfqVisQp/HPabXsLOoVOOw==: --dump-config --output-format json
[
  {
    "hostnqn":"nqn.2014-08.org.nvmexpress:uuid:befdec4c-2234-11b2-a85c-ca77c773af36",
    "hostid":"befdec4c-2234-11b2-a85c-ca77c773af36",
    "dhchap_key":"DHHC-1:02:x6e5qrp11n+GgbnTbpQoftMvI6NMyJMODzu9C1drZsZyzWB/B6YTAOWY8xhfGsAWuIwZDA==:",
    "subsystems":[
      {
        "nqn":"nqn.io-1",
        "ports":[
          {
            "transport":"tcp",
            "traddr":"192.168.30.30",
            "trsvcid":"4420",
            "dhchap_key":"DHHC-1:02:x6e5qrp11n+GgbnTbpQoftMvI6NMyJMODzu9C1drZsZyzWB/B6YTAOWY8xhfGsAWuIwZDA==:",
            "dhchap_ctrl_key":"DHHC-1:02:OFIC9B3zL6Iaa7TKKN13Eyb+0i42vS7V4t5bYVAL08C+ZUxVjfqVisQp/HPabXsLOoVOOw==:",
            "tls":true,
            "keyring":".nvme",
            "tls_key_identity":"NVMe1R01 nqn.2014-08.org.nvmexpress:uuid:befdec4c-2234-11b2-a85c-ca77c773af36 nqn.io-1 4IOYNS/rg/PX4rbiebj0LCEBFQ1X5IMI1GlO5s/yS7w=",
            "tls_key":"NVMeTLSkey-1:01:E+8VYeGJyak3Z/fyW/23NZ20IY7SyHON8RSEi6yOiTlJxU3v:"
          }
        ]
      }
    ]
  }
]
Attempt 1 of 10...
{
  "Name": "nvme1",
  "Transport": "tcp",
  "Address": "192.168.30.30",
  "State": "live"
}
Controller connected

./nvme config --scan --dump --output-format json /dev/nvme1
[
  {
    "hostnqn":"nqn.2014-08.org.nvmexpress:uuid:befdec4c-2234-11b2-a85c-ca77c773af36",
    "hostid":"befdec4c-2234-11b2-a85c-ca77c773af36",
    "dhchap_key":"DHHC-1:02:x6e5qrp11n+GgbnTbpQoftMvI6NMyJMODzu9C1drZsZyzWB/B6YTAOWY8xhfGsAWuIwZDA==:",
    "subsystems":[
      {
        "nqn":"nqn.io-1",
        "ports":[
          {
            "transport":"tcp",
            "traddr":"192.168.30.30",
            "trsvcid":"4420",
            "dhchap_key":"DHHC-1:02:x6e5qrp11n+GgbnTbpQoftMvI6NMyJMODzu9C1drZsZyzWB/B6YTAOWY8xhfGsAWuIwZDA==:",
            "dhchap_ctrl_key":"DHHC-1:02:OFIC9B3zL6Iaa7TKKN13Eyb+0i42vS7V4t5bYVAL08C+ZUxVjfqVisQp/HPabXsLOoVOOw==:",
            "keyring":".nvme"
          }
        ]
      }
    ]
  }
]
Attempt 1 of 10...
{
  "Name": "nvme1",
  "Transport": "tcp",
  "Address": "192.168.30.30",
  "State": "live"
}
Controller connected

--- expected.json       2026-03-30 13:36:19.522859231 +0000
+++ config-output.json  2026-03-30 13:36:19.475859172 +0000
@@ -13,7 +13,6 @@
             "trsvcid":"4420",
             "dhchap_key":"DHHC-1:02:x6e5qrp11n+GgbnTbpQoftMvI6NMyJMODzu9C1drZsZyzWB/B6YTAOWY8xhfGsAWuIwZDA==:",
             "dhchap_ctrl_key":"DHHC-1:02:OFIC9B3zL6Iaa7TKKN13Eyb+0i42vS7V4t5bYVAL08C+ZUxVjfqVisQp/HPabXsLOoVOOw==:",
-            "tls":true,
             "keyring":".nvme"
           }
         ]

@igaw
Copy link
Copy Markdown
Collaborator

igaw commented Mar 30, 2026

I was hoping it would be possible to figure out if --tls or --concat was used by checking the present of tls_configure_key, something in the line of:

	if (c->cfg.tls_key) {
		if (c->cfg.tls_configured_key)
			c->cfg.concat = true;
		else
			c->cfg.tls = true;
	}
root@localhost:/sys/class/nvme/nvme1# cat tls_key
073a80d0
root@localhost:/sys/class/nvme/nvme1# cat tls_configured_key
073a80d0

But first I need to get nvme connect --concat working...

@martin-gpy
Copy link
Copy Markdown
Contributor Author

Anyway, given this a plain copy of existing code, I suggest to introduce a helper for this instead duplicating it. Then we only have one snippet to maintain.

Ok, I can take a look at that.

Btw, this is already done. Please take a look.

@martin-gpy
Copy link
Copy Markdown
Contributor Author

root@localhost:/sys/class/nvme/nvme1# cat tls_key
073a80d0
root@localhost:/sys/class/nvme/nvme1# cat tls_configured_key
073a80d0

I'm running the latest SLES15 SP7 MU kernel-default-6.4.0-150700.53.34.1, and I only see tls_key here for both --tls and --concat, but no tls_configured_key additionally for --concat. For e.g. with a --concat connection:

SR650-14-116:/sys/class/nvme/nvme0 # uname -r
6.4.0-150700.53.34-default
SR650-14-116:/sys/class/nvme/nvme0 # cat tls_key
1c1e4d2a
SR650-14-116:/sys/class/nvme/nvme0 # cat tls_configured_key
cat: tls_configured_key: No such file or directory

So looks the tls_configured_key check for --concat will only work with newer kernels only, and not with the older kernels.

@martin-gpy
Copy link
Copy Markdown
Contributor Author

The below snippet should work here:

if (!c->cfg.concat)
        c->cfg.tls = true;

Would that be fine or would you consider that too crude?

@igaw
Copy link
Copy Markdown
Collaborator

igaw commented Mar 30, 2026

The problem to solve here is, find out what the command line was by only looking sysfs. That means we can't look at cfg at all. This is what config --scan needs to do be able to do. With config --update or config --modify we could use the input from the config files, but not for config --scan.

Your patch to remove the tls = true is IMO the right thing to do, which is a command line guessing thingy. With the introduction of --concat it seems we reached the limit what can be safely guess. At least this is my current understanding. Maybe we have to just give up and log warning, that the correct config can't be reproduced in this situation.

@martin-gpy
Copy link
Copy Markdown
Contributor Author

martin-gpy commented Mar 31, 2026

The problem to solve here is, find out what the command line was by only looking sysfs. That means we can't look at cfg at all. This is what config --scan needs to do be able to do. With config --update or config --modify we could use the input from the config files, but not for config --scan.

One way out perhaps is to check the presence of the sysfs dhchap_key in case tls_key is present. For --concat connections, dhchap_key would always be present along with tls_key, but not for --tls connections. Granted this is not entirely foolproof because one could still theoretically configure in-band auth along with configured PSK TLS (i.e. --tls) connections, but that's extremely unlikely. If one needed in-band auth along with TLS, one would instead choose generated PSK TLS (i.e. --concat) itself.

So would that help here? This is probably the best choice we have in making a guess here based on the sysfs values.

@igaw
Copy link
Copy Markdown
Collaborator

igaw commented Mar 31, 2026

Finally my setup starts to work. I am surprised that the combination 'ffdhe3072', 'hmac(sha384)' and 'nvme gen-dhchap-key -m 2' is failing... it took me a while to get everything sorted out.

If it is not possible to 100% correctly reconstruct the command line arguments, then I'd rather not deal with this at all. It's a nightmare to support. But first let me play with it...

@igaw
Copy link
Copy Markdown
Collaborator

igaw commented Apr 1, 2026

As feared, it's just not possible to correctly figure out the mode in use

Even ChatGPT agrees on this:
image

Plan B: Let's see if we can get

/sys/class/nvme/nvmeX/tls_mode

as sysfs entry in upstream

@igaw
Copy link
Copy Markdown
Collaborator

igaw commented Apr 1, 2026

It is wrongly assumed that the presence of the sysfs tls_key
attribute indicates --tls alone was invoked. But this can
also happen if --concat was invoked as well. And both --tls
and --concat are mutually exclusive. Also, both --tls and
--concat are already appropriately set earlier during configured
& generated PSK TLS workflows respectively. So avoid explicitly
setting --tls again here in nvme_read_sysfs_tls() as that's
unnecessary and incorrect too.

Signed-off-by: Martin George <marting@netapp.com>
Only --tls was properly updated in nbft_connect(), and not
--concat. But this is properly done in nvmf_connect_disc_entry()
already. So add a helper function to update both --tls and
--concat and invoke the same from nvmf_connect_disc_entry() and
nbft_connect() respectively.

Signed-off-by: Martin George <marting@netapp.com>
[wagi: reformated the function to improve readability]
Signed-off-by: Daniel Wagner <wagi@kernel.org>
@igaw igaw force-pushed the set_tls_concat branch from b31b659 to 6a8e030 Compare April 1, 2026 09:20
@igaw igaw merged commit 99dd46f into linux-nvme:master Apr 1, 2026
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants