Integrate internal SSH readiness checks with Ansible checks by ipspace · Pull Request #3049 · ipspace/netlab

ipspace · 2026-01-29T16:44:39Z

This commit refactors the readiness checks to integrate the internal SSH readiness checks with the Ansible checks (for example, the check for the first Junos interface):

Data structure changes:

Devices that require readiness checks MUST have the netlab_ready group variable, which should include the value 'ansible' for devices with Ansible checks. This commit modifies the device definitions of all devices that use readiness checks.
The 'ansible' output module creates the netlab_ready_ansible and netlab_ready_ssh inventory groups.
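A minimal Python sketch of how per-device netlab_ready values could be turned into the netlab_ready_ssh and netlab_ready_ansible groups. It assumes each node is a plain dict carrying its netlab_ready list directly (in netlab the value comes from device group variables); build_ready_groups and the sample nodes are hypothetical, and the actual code in netsim/outputs/ansible.py may look quite different.

```python
"""Hypothetical sketch: derive readiness inventory groups from netlab_ready.

Assumption: each node is a dict with an optional 'netlab_ready' list such as
['ssh', 'ansible']; netsim/outputs/ansible.py may implement this differently.
"""

from typing import Dict, List


def build_ready_groups(nodes: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map 'netlab_ready_<check>' group names to the nodes that need that check."""
    groups: Dict[str, List[str]] = {}
    for name, node in sorted(nodes.items()):
        # Nodes without netlab_ready need no readiness checks and join no group
        for check in node.get("netlab_ready", []):
            groups.setdefault(f"netlab_ready_{check}", []).append(name)
    return groups


if __name__ == "__main__":
    # Hypothetical lab: the device/check combinations are illustrative only
    lab = {
        "r1": {"device": "junos", "netlab_ready": ["ssh", "ansible"]},
        "sw1": {"device": "eos", "netlab_ready": ["ssh"]},
        "h1": {"device": "linux"},
    }
    print(build_ready_groups(lab))
```

In this sketch, only nodes with an 'ansible' value land in netlab_ready_ansible, so a play targeting that group never touches devices that have no Ansible-based checks.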

netlab initial changes:

The 'ready' module now has its own run function, which is invoked when the args.ready option is set.
The 'ready' module first collects the nodes based on their wait-for-ready requirements, then executes the internal readiness checks, and finally starts the Ansible 'device-ready' playbook if needed.
The internal readiness checks can be disabled in topology defaults (defaults.netlab.initial.ready.check variable).
The deploy.run function calls the ready.run function as one of its first steps.
The deploy.run and ready.run functions use log.section_header to improve logging.
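The collect/check/playbook sequence described above can be sketched in Python as follows. plan_readiness, run_ready and the callback parameters are hypothetical names, not netlab's API. The sketch also assumes (based on the Copilot summary of netsim/defaults/netlab.yml, which says the defaults specify which checks use internal code and which use Ansible) that any check not handled by internal code is left to the device-ready playbook, so disabling the internal checks moves the SSH checks to Ansible.

```python
"""Hypothetical sketch of the ready.run flow: collect, check internally, then Ansible.

plan_readiness, run_ready and the callback parameters are illustrative
assumptions; they are not netlab's actual functions or signatures.
"""

from typing import Callable, Dict, List, Set, Tuple

Groups = Dict[str, List[str]]  # check name -> node names


def plan_readiness(nodes: Dict[str, List[str]], internal: Set[str]) -> Tuple[Groups, Groups]:
    """Split per-node netlab_ready values into internally handled and Ansible-handled checks."""
    internal_groups: Groups = {}
    ansible_groups: Groups = {}
    for name, checks in sorted(nodes.items()):
        for check in checks:
            target = internal_groups if check in internal else ansible_groups
            target.setdefault(check, []).append(name)
    return internal_groups, ansible_groups


def run_ready(
    nodes: Dict[str, List[str]],
    internal: Set[str],
    ssh_check: Callable[[str], None],
    run_playbook: Callable[[Groups], None],
) -> bool:
    """Run internal SSH checks, then start the playbook only if something is left for it.

    Returns True when the playbook callback was invoked.
    """
    internal_groups, ansible_groups = plan_readiness(nodes, internal)
    for name in internal_groups.get("ssh", []):
        ssh_check(name)
    if ansible_groups:
        run_playbook(ansible_groups)
        return True
    return False


if __name__ == "__main__":
    lab = {"r1": ["ssh", "ansible"], "sw1": ["ssh"]}
    # Internal checks enabled: only r1's 'ansible' check needs the playbook
    run_ready(lab, {"ssh"}, lambda n: print("internal SSH check:", n), print)
    # Internal checks disabled (empty set): the SSH checks move to the playbook too
    run_ready(lab, set(), lambda n: print("internal SSH check:", n), print)
```

Passing the checks in as callbacks keeps the sketch testable without a running lab; a real implementation would call its SSH probe and start ansible-playbook instead.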

Ansible-related changes:

The 'device-ready.ansible' playbook is split into two plays (SSH readiness and Ansible checks).
The new Ansible groups are used in the 'device-ready.ansible' playbook to limit the hosts involved in each play.
The 'wait-for-ready' task list is no longer included in the initial-config.ansible playbook; the readiness check is performed solely in the device-ready.ansible playbook.
The 'wait-for-ready' task list no longer performs the generic readiness checks (SSH was the only generic check). The generic checks are performed as plays in the 'device-ready.ansible' playbook.
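A rough picture of the two-play layout, expressed as Python data so it can be inspected and printed. The play names, the wait_for_connection timeout and the include_tasks path are assumptions rather than the contents of the real device-ready.ansible; the only part taken from the PR text is that each play targets one of the new inventory groups through its hosts keyword.

```python
"""Hypothetical sketch of the two-play device-ready layout as Python data.

Play names, the timeout and the task file path are assumptions; only the
use of the netlab_ready_* groups as play targets comes from the PR text.
"""

import json
from typing import Dict, List

DEVICE_READY_PLAYS: List[dict] = [
    {
        # Play 1: generic SSH readiness for hosts in netlab_ready_ssh
        "name": "Wait for SSH",
        "hosts": "netlab_ready_ssh",
        "gather_facts": False,
        "tasks": [{"ansible.builtin.wait_for_connection": {"timeout": 300}}],
    },
    {
        # Play 2: device-specific checks (for example, waiting for the first Junos interface)
        "name": "Run device-specific readiness checks",
        "hosts": "netlab_ready_ansible",
        "gather_facts": False,
        "tasks": [{"ansible.builtin.include_tasks": "tasks/wait-for-ready.yml"}],
    },
]


def plays_for_host(host: str, groups: Dict[str, List[str]]) -> List[str]:
    """Return the names of the plays whose target group contains `host`."""
    return [play["name"] for play in DEVICE_READY_PLAYS
            if host in groups.get(play["hosts"], [])]


if __name__ == "__main__":
    # JSON is, for practical purposes, valid YAML, so this prints a playbook-shaped document
    print(json.dumps(DEVICE_READY_PLAYS, indent=2))
    groups = {"netlab_ready_ssh": ["r1", "sw1"], "netlab_ready_ansible": ["r1"]}
    print(plays_for_host("sw1", groups))
```

Because host selection happens through group membership, a device with only SSH requirements is simply never a target of the second play.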

Copilot

Pull request overview

This pull request refactors the device readiness checking system to better integrate internal SSH checks with device-specific Ansible checks. The key improvement is separating the responsibilities: Python code handles SSH connectivity checks, while Ansible playbooks handle device-specific readiness conditions (like waiting for interfaces to appear).
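The Python-side SSH connectivity check could look roughly like the probe below. It polls a TCP port until the server sends the start of an SSH identification string (RFC 4253 defines it as beginning with 'SSH-') and reports the elapsed time, similar to the "is ready after 0.3 seconds" message in the log further down. This is a simplified, swapped-in technique: wait_for_ssh and read_prefix are hypothetical, and netlab's real check may log in over SSH instead (its verbose output probes for sshpass).

```python
"""Hypothetical sketch of an internal SSH readiness probe.

Assumption: a node is "ready" once its TCP port accepts a connection and the
first bytes received are the RFC 4253 'SSH-' identification prefix. Servers
that send other lines before the identification string are not handled here.
"""

import socket
import time
from typing import Optional


def read_prefix(sock: socket.socket, size: int) -> bytes:
    """Read up to `size` bytes, stopping early if the peer closes the connection."""
    data = b""
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            break
        data += chunk
    return data


def wait_for_ssh(host: str, port: int = 22, timeout: float = 60.0,
                 interval: float = 1.0) -> Optional[float]:
    """Poll host:port until an SSH banner arrives.

    Returns the elapsed time in seconds, or None if the deadline expires.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        try:
            with socket.create_connection((host, port), timeout=interval) as sock:
                # create_connection's timeout also applies to the later recv() calls
                if read_prefix(sock, 4) == b"SSH-":
                    return time.monotonic() - start
        except OSError:
            pass  # refused, reset or timed out: the server is not ready yet
        time.sleep(interval)
    return None


if __name__ == "__main__":
    elapsed = wait_for_ssh("127.0.0.1", timeout=3.0)
    print("not ready" if elapsed is None else f"ready after {elapsed:.1f} seconds")
```

Checking for the banner rather than just an open port avoids declaring a node ready when a container runtime accepts TCP connections before the SSH daemon is actually running.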

Changes:

Refactored readiness check architecture to separate internal (Python) SSH checks from Ansible-based device checks
Modified device definitions to specify readiness requirements via the netlab_ready group variable with values 'ssh' and/or 'ansible'
Split the device-ready Ansible playbook into two distinct plays for SSH and device-specific checks

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated no comments.

File	Description
netsim/outputs/ansible.py	Creates dynamic Ansible inventory groups (netlab_ready_ssh, netlab_ready_ansible) based on device readiness requirements
netsim/devices/*.yml	Updates device definitions to specify readiness check requirements; Junos-family devices inherit the setting from the parent device, other devices declare it explicitly
netsim/defaults/netlab.yml	Adds configuration to specify which readiness checks use internal code vs Ansible
netsim/cli/initial/utils.py	Refactors Ansible argument building; adds get_deploy_nodeset helper function
netsim/cli/initial/ready.py	Implements new run function with internal SSH checks and conditional Ansible playbook execution
netsim/cli/initial/deploy.py	Integrates ready.run into deployment flow with improved logging via section_header
netsim/cli/initial/init.py	Simplifies initial command flow by delegating to ready.run
netsim/ansible/tasks/wait-for-ready.yml	Removes generic SSH checks (now handled by a dedicated play)
netsim/ansible/tasks/readiness-check/vyos-clab.yml	Removes redundant wait_for_connection (SSH checked in separate play)
netsim/ansible/initial-config.ansible	Removes wait-for-ready import (readiness now checked before config deployment)
netsim/ansible/device-ready.ansible	Splits into two plays: SSH readiness and device-specific conditions with appropriate tags
docs/netlab/initial.md	Updates documentation to explain the two-stage readiness checking and configuration option

ipspace · 2026-01-29T16:54:19Z

@ssasso @DanPartelly @sdargoeuves @ddutt: Does anyone want to try these out? Things are slowly getting into shape (I still have to separate the "normalize" phase from the rest of the configs).

sdargoeuves · 2026-02-01T00:06:49Z

Not sure what I'm missing, but it fails for me:

╰─❯ dnetlab initial -l sw1 -vvv   
Unrecognized Ansible playbook args: []

┌──────────────────────────────────────────────────────────────────────────────────┐
│ CREATING Device configuration snippets                                           │
└──────────────────────────────────────────────────────────────────────────────────┘
[INFO]    Rendered normalize template for sw1 into sw1/normalize
[INFO]    Rendered initial template for sw1 into sw1/initial
[INFO]    Rendered vlan template for sw1 into sw1/vlan
[INFO]    Rendered ospf template for sw1 into sw1/ospf

┌──────────────────────────────────────────────────────────────────────────────────┐
│ CHECKING Are lab devices ready to be configured?                                 │
└──────────────────────────────────────────────────────────────────────────────────┘
run_command executing: ['bash', '-c', 'command -v sshpass']
Adding /home/sa/code/quick-netlab-lab/netlab to system PATH
New system path: /home/sa/code/quick-netlab-lab/netlab:/home/sa/code/netsim-main-lab/venv/bin:/home/sa/.vscode-server/data/User/globalStorage/github.copilot-chat/debugCommand:/home/sa/.vscode-server/data/User/globalStorage/github.copilot-chat/copilotCli:/home/sa/.vscode-server/cli/servers/Stable-c9d77990917f3102ada88be140d28b038d1dd7c7/server/bin/remote-cli:/home/sa/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/sa/.vscode-server/extensions/ms-python.debugpy-2025.18.0/bundled/scripts/noConfigScripts
... run result: CompletedProcess(args=['bash', '-c', 'command -v sshpass'], returncode=0, stdout='/usr/bin/sshpass\n', stderr='')
[INFO]    Checking SSH server(s) on sw1
[SSH]     SSH server on node sw1 (device eos) is ready after 0.3 seconds

┌──────────────────────────────────────────────────────────────────────────────────┐
│ CONFIG Deploying device configurations                                           │
└──────────────────────────────────────────────────────────────────────────────────┘
[INFO]    Starting deployment thread for sw1 to deploy normalize,initial,vlan,ospf
[INFO]    Executing normalize configuration for node sw1
run_command executing: docker exec clab-ml-4-sw1 /mnt/flash/01-normalize.sh
... run result: CompletedProcess(args=['docker', 'exec', 'clab-ml-4-sw1', '/mnt/flash/01-normalize.sh'], returncode=1, stdout='\n>  platform tfa phy control-frame disabled\n\n% Invalid input at line 5\n\n>  platform tfa phy control-frame disabled\n\n% Invalid input at line 9\n\n>  platform tfa phy control-frame disabled\n\n% Invalid input at line 13\n', stderr='')
  >  platform tfa phy control-frame disabled

  % Invalid input at line 5

  >  platform tfa phy control-frame disabled

  % Invalid input at line 9

  >  platform tfa phy control-frame disabled

  % Invalid input at line 13
[FATAL]   initial: normalize configuration in namespace clab-ml-4-sw1 failed for node sw1
[DATA]    Executed command: docker exec clab-ml-4-sw1 /mnt/flash/01-normalize.sh
Results of configuration script deployments
===========================================================================================================================================================================================
sw1                                  Failed: normalize

[FATAL]   initial: Configuration deployment failed

This is with this basic topology file (including the sh mode for Arista):

---
plugin: [ multilab ]
defaults.multilab.id: 4 # subnet will be 10.194.59.0/24
defaults.addressing.mgmt.start: 199

defaults.devices.eos.clab.group_vars.netlab_config_mode: sh
defaults.devices.eos.clab.image: "ceos:4.29.9.1M"
defaults.devices.linux.clab.image: "ubuntu/nginx"
provider: clab

groups:
  switches:
    _auto_create: true
    device: eos
    module: [ ospf, vlan ]
    members: [ sw1, sw2 ]

  allhosts:
    _auto_create: true
    device: linux
    provider: clab
    role: host
    members: [ h11, h12, h21, h22 ]
    config: [ config-snippets/linux.j2 ]

nodes:
  sw1:
    vlans:
      user_1:
        ipv4: 1
      server_1:
        ipv4: 1
  sw2:
    vlans:
      user_2:
        ipv4: 1
      server_2:
        ipv4: 1

vlans:
  user_1:
    id: 11
    ospf.passive: true
  server_1:
    id: 12
    ospf.passive: true
  user_2:
    id: 21
    ospf.passive: true
  server_2:
    id: 22
    ospf.passive: true

links:
  - sw1:
    sw2:

  - h11:
      ipv4: 11
    sw1:
      vlan.access: user_1
  - h12:
      ipv4: 11
    sw1:
      vlan.access: server_1
  - h21:
      ipv4: 11
    sw2:
      vlan.access: user_2
  - h22:
      ipv4: 11
    sw2:
      vlan.access: server_2

And the topology file works well if I remove the line: defaults.devices.eos.clab.group_vars.netlab_config_mode: sh

ipspace · 2026-02-01T06:07:58Z

Thanks a million for the report. Looks like your cEOS version does not recognize that command. Which version are you using?

ipspace · 2026-02-01T06:09:32Z

> Thanks a million for the report. Looks like your cEOS version does not recognize that command. Which version are you using?

Forget it, it's in the topology file. Obviously, my hack doesn't work with older cEOS versions. Back to the drawing board (or maybe I'll just document the caveat).

sdargoeuves · 2026-02-01T08:40:33Z

Ah! It crossed my mind, but I didn't try. Time for me to upgrade that version. I had a reason for using this one and not a more recent one, but I can't remember why!
I would say documenting the caveat is fine, but a more explicit error message on failure would be better.

ipspace · 2026-02-01T09:21:08Z

> Ah! It crossed my mind, but I didn't try. Time for me to upgrade that version. I had a reason for using this one and not a more recent one

No worries, I already have a fix (and it's a better kludge than the current one). I'll add you as the PR reviewer.

ipspace requested a review from Copilot January 29, 2026 16:44

Copilot started reviewing on behalf of ipspace January 29, 2026 16:44

Copilot AI reviewed Jan 29, 2026

ipspace requested review from DanPartelly, ddutt, sdargoeuves and ssasso January 29, 2026 16:53

ipspace force-pushed the ssh-refactor branch from 2c35b1e to a7facb9 January 30, 2026 09:47

ipspace added a commit that referenced this pull request Jan 30, 2026

Integration tests for #3049

8661c4c

ipspace mentioned this pull request Jan 30, 2026

Use Linux scripts to configure Arista cEOS containers #3051

Merged

ipspace merged commit 063cf09 into dev Jan 31, 2026
13 checks passed

ipspace deleted the ssh-refactor branch January 31, 2026 10:08

ipspace merged 1 commit into dev from ssh-refactor
