
Fix(agent): Improve tunnel preservation during agent upgrade when using tunnel. Housekeeping of shellhub-agent files in own directory#5943

Open
ltan10 wants to merge 2 commits into shellhub-io:master from ltan10:chore/group-agent-etc-files

Conversation

@ltan10

@ltan10 ltan10 commented Mar 4, 2026

What kind of change does this PR introduce?

  • Bugfix
  • New Feature
  • Feature Improvement
  • Refactoring
  • Documentation
  • Other, please describe:

Description:

  • Organized the Go-based agent config and key files into a dedicated directory, /etc/shellhub-agent/
    • Does not remove the existing /etc/shellhub.key, /etc/shellhub-agent.env, or /opt/shellhub/
    • The best migration procedure is to copy shellhub.key to /etc/shellhub-agent/shellhub.key prior to performing the installation/upgrade
  • Modified the default PRIVATE_KEY location for the podman, snap, and docker installation methods in the shell install script to /etc/shellhub-agent
  • Restarts the shellhub-agent service at the end of the installation step instead of stopping shellhub-agent at the start of the installation procedure.
    • Reduces the risk of dropping the device from the ShellHub tunnel network when the agent installation/re-installation/upgrade is performed over a ShellHub tunnel
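The PRIVATE_KEY change described above boils down to the default key path written to the agent's environment file. A hypothetical fragment (the variable name and both paths come from this PR description; any other contents of the file are omitted):

```shell
# Illustrative fragment of /etc/shellhub-agent.env (other variables omitted).
# Old default key location:
#   PRIVATE_KEY=/etc/shellhub.key
# New default, inside the agent's dedicated directory:
PRIVATE_KEY=/etc/shellhub-agent/shellhub.key
```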

Migration Guide *Optional*

The following guide also applies to the migration from the runc shellhub-agent to the native Go shellhub-agent.
These steps are only needed if you want to retain the same device unique ID in ShellHub (no pending request). If the migration steps are not performed, a new pending request for the device will need to be accepted.

  1. Manually create the /etc/shellhub-agent/ directory
  2. Copy shellhub.key from /opt/shellhub-agent/shellhub.key or /etc/shellhub.key to /etc/shellhub-agent/shellhub.key
  3. Perform the agent installation/upgrade steps as normal
  4. Ensure the installation/upgrade was successful and the tunnel is active
  5. You can then remove /opt/shellhub-agent/, /etc/shellhub.key, and /etc/shellhub-agent.env
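Steps 1–2 can be sketched as the following POSIX-sh commands. The ROOT prefix and the sample key are hypothetical additions so the snippet can be dry-run outside /; on a real system you would set ROOT to empty and drop the sample-key line:

```shell
#!/bin/sh
# Dry-run sketch of migration steps 1-2; ROOT is a hypothetical prefix.
ROOT="${ROOT:-$(mktemp -d)}"
mkdir -p "$ROOT/opt/shellhub-agent"
printf 'example-key\n' > "$ROOT/opt/shellhub-agent/shellhub.key"  # stand-in for the existing key

# Step 1: create the new dedicated directory.
mkdir -p "$ROOT/etc/shellhub-agent"

# Step 2: copy the existing key from whichever old location is present.
if [ -f "$ROOT/opt/shellhub-agent/shellhub.key" ]; then
    cp "$ROOT/opt/shellhub-agent/shellhub.key" "$ROOT/etc/shellhub-agent/shellhub.key"
elif [ -f "$ROOT/etc/shellhub.key" ]; then
    cp "$ROOT/etc/shellhub.key" "$ROOT/etc/shellhub-agent/shellhub.key"
fi
# Steps 3-5 (install/upgrade, verify the tunnel, remove old files) follow as usual.
```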

@ltan10 ltan10 requested a review from a team as a code owner March 4, 2026 23:59
@ltan10 ltan10 marked this pull request as draft March 5, 2026 00:25
@ltan10 ltan10 force-pushed the chore/group-agent-etc-files branch from fbab19e to a7beeba Compare March 5, 2026 01:01
@ltan10 ltan10 marked this pull request as ready for review March 5, 2026 01:05
@ltan10 ltan10 force-pushed the chore/group-agent-etc-files branch from a7beeba to 1e42d5b Compare March 5, 2026 07:17
@ltan10 ltan10 changed the title chore: organized go based agent config and key into dedicated directory chore(agent): organized go based agent config and key into dedicated directory Mar 5, 2026
@ltan10 ltan10 force-pushed the chore/group-agent-etc-files branch from 1e42d5b to 50f40b7 Compare March 10, 2026 00:43
Member

@otavio otavio left a comment


The biggest problem I foresee here is migrating the field-deployed agents.

So even though I agree with the idea of this change, we need to consider how we will migrate the existing agents to work with this new approach or provide a backward-compatible way for them to keep working.

How do you foresee solving this issue?

@ltan10
Author

ltan10 commented Mar 10, 2026

With all the testing, I've stumbled upon this myself.
From what I have seen, the migration is fine, as the install script, during the migration from the rootfs to the Go-based binary for the agents, already leaves config.json behind and recreates the new env config file.

The only persistent remaining file is just shellhub.key

Two methods for transition:

  1. Just install/add/upgrade the agent using the install script as per normal.
  • A new shellhub.key gets regenerated in the new directory and the device appears for pending approval.
  • Requires the user to manually remove the existing accepted device due to hostname clashes and to accept the new pending device.
    or
  2. Use the existing shellhub.key on the device by manually transferring it to /etc/shellhub-agent prior to executing the install script.
  • The install script's agent upgrade will just use the existing key.

@otavio someone should be able to replicate my findings for confirmation.
I have generally just been using option 1, and have only tested on the standalone deployment.
If someone can verify that I have changed the correct sections for the other installation methods and test them, that would be great.

EDIT:
It should be clear that the above-mentioned methods only work with prebuilt agent assets from ShellHub releases, so a new release would have to be made for a smooth transition.
Users who pull the repo without the prebuilt agent asset release will have to build manually and tweak their install procedure.

@ltan10
Author

ltan10 commented Mar 11, 2026

@otavio
It later occurred to me that my testing was performed over a physical connection and not a ShellHub tunnel.

You are correct that users using the tunnel to upgrade their agent will have their tunnel unceremoniously terminated when migrating from the runc binary to the Go-native binary.

I tried sending the shell install script to the background, including using nohup to ignore SIGHUP. However, there was still no luck, as the child process executing the agent install did not appear to start.

This observation was the same for a standalone agent upgrade made through the tunnel in the following situations:

  • go-agent (ungrouped) -> go-agent (ungrouped)
  • go-agent (grouped) -> go-agent (grouped)
  • go-agent (ungrouped) -> go-agent (grouped)

All tunnels were terminated when users used the ShellHub tunnel to upgrade the agent.

@ltan10 ltan10 marked this pull request as draft March 11, 2026 03:43
@ltan10
Author

ltan10 commented Mar 11, 2026

Found a workaround. Can someone please review it and test the containerised installation/upgrade?

Problem:
The native binary is affected by SIGHUP and terminates. Attempting to execute it in the background using nohup did not work and provided no verbosity to users.

Running the installation under systemd-run worked, but once again it was not verbose, and it increased the complexity of managing the existing service once finished.

Solution:
Instead of stopping shellhub-agent.service at the start of the installation, we restart/start the service at the end of the installation.
This maintains an active tunnel session until the last moment, after the binary has been extracted and the service is registered.
The session then ends, but the service is active and the tunnel connection resumes.

Changes:
The agent installation no longer stops the service at the start of the installation.
Instead, the agent service is restarted at the end of the install procedure; accordingly, enabling the service no longer starts it, which prevents a duplicate connection request at first installation.
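The reordering described above can be sketched as a tiny POSIX-sh mock of the installer's tail. Here `extract-new-agent-binary` and `run_cmd` are hypothetical stand-ins (run_cmd echoes each command so the ordering can be inspected without systemd); only the shellhub-agent.service unit name and the enable-then-restart ordering come from this PR:

```shell
#!/bin/sh
# Mock of the reordered install-script tail; run_cmd prints each command
# instead of executing it, so the ordering is visible without systemd.
run_cmd() { echo "$@"; }

install_agent() {
    # No "systemctl stop shellhub-agent.service" here any more: an upgrade
    # performed over a ShellHub tunnel keeps its session alive while the
    # new binary is put in place.
    run_cmd extract-new-agent-binary                  # placeholder install step
    run_cmd systemctl enable shellhub-agent.service   # enable WITHOUT starting
    run_cmd systemctl restart shellhub-agent.service  # restart only at the very end
}

install_agent
```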

@ltan10 ltan10 marked this pull request as ready for review March 11, 2026 08:44
@ltan10 ltan10 force-pushed the chore/group-agent-etc-files branch from ad7c9d8 to 0d222f4 Compare March 11, 2026 09:06
@ltan10 ltan10 changed the title chore(agent): organized go based agent config and key into dedicated directory Fix(agent): Improve tunnel preservation during agent upgrade when using tunnel. Housekeeping of shellhub-agent files in own directory Mar 31, 2026
@ltan10 ltan10 requested review from a team as code owners March 31, 2026 22:37
@ltan10 ltan10 force-pushed the chore/group-agent-etc-files branch from 88ffb2d to 0d222f4 Compare March 31, 2026 22:37
@ltan10 ltan10 closed this Mar 31, 2026
@ltan10 ltan10 deleted the chore/group-agent-etc-files branch March 31, 2026 22:43
@ltan10 ltan10 restored the chore/group-agent-etc-files branch March 31, 2026 22:44
@ltan10 ltan10 deleted the chore/group-agent-etc-files branch March 31, 2026 22:45
@ltan10 ltan10 restored the chore/group-agent-etc-files branch March 31, 2026 22:47
@ltan10 ltan10 reopened this Mar 31, 2026
@ltan10 ltan10 force-pushed the chore/group-agent-etc-files branch from 0d222f4 to 0d6ec33 Compare March 31, 2026 22:49
@ltan10
Author

ltan10 commented Mar 31, 2026

> The biggest problem I foresee here is migrating the field-deployed agents.
>
> So even though I agree with the idea of this change, we need to consider how we will migrate the existing agents to work with this new approach or provide a backward-compatible way for them to keep working.
>
> How do you foresee solving this issue?

@otavio I noticed that the migration from the non-native shellhub-agent to the Go-native shellhub-agent triggers the same issue of a new pending device join request, due to the change in location of shellhub.key from /opt/ to /etc/, especially for standalone installations.

I have provided a migration path above for users who don't wish to simply remove the old device and re-accept the new one. By moving the shellhub-agent service restart to the end of installation, users can upgrade the agent over a ShellHub tunnel, as the tunnel is no longer terminated by the service being stopped at the start of the installation process.

@ltan10 ltan10 requested a review from otavio March 31, 2026 23:11
agent installer: avoid service disruption during upgrade over SSH tunnel

- Remove pre-install service disable step
- Enable service without starting it immediately
- Restart service only after install completes

Allows upgrades to run over an active SSH tunnel and ensures the
tunnel is re-established after the service restarts with the updated
binary.
@ltan10 ltan10 force-pushed the chore/group-agent-etc-files branch from 0d6ec33 to 35b4a99 Compare April 1, 2026 22:06