Skip to content

Add an initial varlink IPC interface#222

Draft
cgwalters wants to merge 2 commits intobootc-dev:mainfrom
cgwalters:varlink-zlink
Draft

Add an initial varlink IPC interface#222
cgwalters wants to merge 2 commits intobootc-dev:mainfrom
cgwalters:varlink-zlink

Conversation

@cgwalters
Copy link
Collaborator

I'm thinking about bootc-dev/bootc#522
again, and varlink is definitely an option there.

In order for us to gain experience with it, I think it makes
sense to start here. In some experimentation, it seems zlink
is indeed better maintained than varlink-rs and more complete.
(there's some unreleased async support in varlink-rs, and
the activation/service stuff was not complete).

In this initial pass, we first cover the images API (which is
really just a thin wrapper for enumerating images in the container
runtime store). The more interesting one is io.bootc.vk.ephemeral
which supports spawning.

Note: The workspace lint for unused_must_use is relaxed from forbid to deny
so that zlink's proc-macro-generated #[allow(unused)] does not conflict.
the practical enforcement is identical.

Refactor the container removal logic out of remove_all_ephemeral_containers
into a standalone remove_ephemeral_containers function that returns
per-container results. This enables the varlink IPC layer to call
container removal without going through the interactive CLI prompt.

Also make list_ephemeral_containers pub(crate) so other modules can
call it directly.

Assisted-by: OpenCode (claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
I'm thinking about bootc-dev/bootc#522
again, and varlink is definitely an option there.

In order for us to gain experience with it, I think it makes
sense to start here. In some experimentation, it seems zlink
is indeed better maintained than varlink-rs and more complete.
(there's some unreleased async support in varlink-rs, and
 the activation/service stuff was not complete).

In this initial pass, we first cover the `images` API (which is
really just a thin wrapper for enumerating images in the container
runtime store). The more interesting one is `io.bootc.vk.ephemeral`
which supports spawning.

Note: The workspace lint for unused_must_use is relaxed from forbid to deny
so that zlink's proc-macro-generated #[allow(unused)] does not conflict.
the practical enforcement is identical.

Assisted-by: OpenCode (claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a varlink IPC interface for managing ephemeral VMs, a significant and well-implemented feature. While it includes service implementation, refactoring in ephemeral.rs and ssh.rs, and comprehensive integration tests, it also introduces argument injection vulnerabilities and broken access control. Specifically, several methods accept user-supplied container IDs or image names and pass them directly to podman commands, potentially allowing operations on arbitrary containers and images not managed by bcvk. To mitigate this, it is recommended to use the -- separator in all podman calls and verify that target containers possess the expected bcvk.ephemeral=1 label before performing operations. Additionally, there are a few suggestions to improve maintainability and fix a version mismatch that would break the build.

interface = "io.bootc.vk.images",
vendor = "io.bootc.vk",
product = "bcvk",
version = "0.11.0",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The version in the #[zlink::service] attribute is hardcoded to "0.11.0", but the crate version in crates/kit/Cargo.toml is "0.12.0". This will cause the version_matches_cargo_toml test at the end of this file to fail. Please update this version to match the crate's version. You will also need to update the hardcoded version in the test itself to ensure they stay in sync.

Suggested change
version = "0.11.0",
version = "0.12.0",

Comment on lines +573 to +576
assert_eq!(
"0.11.0",
env!("CARGO_PKG_VERSION"),
"varlink service version must match Cargo.toml"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This test will fail because env!("CARGO_PKG_VERSION") will expand to "0.12.0", but it's being compared against the hardcoded string "0.11.0". Please update this to match the crate's version to ensure the test passes and correctly validates the version sync.

Suggested change
assert_eq!(
"0.11.0",
env!("CARGO_PKG_VERSION"),
"varlink service version must match Cargo.toml"
assert_eq!(
env!("CARGO_PKG_VERSION"),
"0.12.0",
"varlink service version must match Cargo.toml"
);

/// block used by both the CLI (`rm-all`) and the varlink `Rm` method.
pub(crate) fn remove_single_container(container_id: &str) -> RemoveContainerResult {
let result = Command::new("podman")
.args(["rm", "-f", container_id])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The container_id parameter is passed directly to the podman rm -f command without a -- separator. Since this value comes from an external IPC interface (Varlink), an attacker could provide a flag such as --all instead of a container ID. This would cause podman to remove all containers accessible to the process, leading to a significant denial of service. Additionally, the function does not verify that the container has the bcvk.ephemeral=1 label, allowing the removal of arbitrary containers not managed by this tool.

Suggested change
.args(["rm", "-f", container_id])
.args(["rm", "-f", "--", container_id])

Comment on lines 126 to 130
if options.allocate_tty {
cmd.args(["exec", "-it", container_name, "ssh"]);
} else {
cmd.args(["exec", container_name, "ssh"]);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The container_name parameter is passed to podman exec without a -- separator, which could allow argument injection if the name starts with a hyphen. Furthermore, the function does not verify that the target container is an ephemeral VM managed by bcvk (e.g., by checking for the bcvk.ephemeral=1 label). This could allow an attacker with access to the Varlink interface to execute SSH commands in any container on the system, potentially leading to unauthorized access or privilege escalation.

Suggested change
if options.allocate_tty {
cmd.args(["exec", "-it", container_name, "ssh"]);
} else {
cmd.args(["exec", container_name, "ssh"]);
}
if options.allocate_tty {
cmd.args(["exec", "-it", "--", container_name, "ssh"]);
} else {
cmd.args(["exec", "--", container_name, "ssh"]);
}

Comment on lines +242 to +249
Err(e) => {
let msg = e.to_string();
if msg.contains("No such image") || msg.contains("image not known") {
Err(ImagesError::NotFound { name })
} else {
Err(ImagesError::PodmanError { message: msg })
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Relying on string matching of the error message from podman to detect a "not found" condition is brittle. If podman changes its error messages in a future version, this logic will break.

While this might be the only option if podman doesn't provide structured error information or distinct exit codes for this case, it would be good to add a comment here acknowledging this fragility. If podman provides a more reliable way to distinguish this error (e.g., a specific exit code), it would be better to use that.

Comment on lines +255 to +262
let containers = tokio::task::spawn_blocking(crate::ephemeral::list_ephemeral_containers)
.await
.map_err(|e| EphemeralError::PodmanError {
message: e.to_string(),
})?
.map_err(|e| EphemeralError::PodmanError {
message: e.to_string(),
})?;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The pattern of mapping both JoinError and the inner Result's error to a PodmanError is repeated in several methods (ps, rm_all, run, exec, wait_ssh_ready). To reduce duplication and improve readability, consider defining a local error mapping closure.

For example, you could change this to:

    async fn ps(&self) -> Result<PsReply, EphemeralError> {
        let map_err = |e: impl std::fmt::Display| EphemeralError::PodmanError {
            message: e.to_string(),
        };
        let containers = tokio::task::spawn_blocking(crate::ephemeral::list_ephemeral_containers)
            .await
            .map_err(map_err)?
            .map_err(map_err)?;

        // ...
    }

This same pattern can be applied to the other methods in this service implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant