
Scheduler Improvement: Allow merging new actions to existing plan #7702

Open
jujokini wants to merge 2 commits into OpenNebula:one-7.0 from TheQtCompanyRnD:scheduler_merge_plan


@jujokini

Description

We ran into a problem with OpenNebula 7.0.1 and our custom drivers. VMs stay in the boot phase for a long time because of the VM image download. While they are in boot status they block scheduling, because the scheduler waits for one plan to finish before starting a new one. The fix is to allow merging new scheduling actions into the existing plan.

There is also a related bug report #7639

Branches to which this PR applies

  • master
  • one-7.2
  • one-7.0

  • Check this if this PR should not be squashed

jujokini and others added 2 commits May 21, 2026 14:14
…arding

Previously, if a placement plan was already APPLYING when a new scheduling
cycle produced results, the new plan was silently discarded. This caused
VMs scheduled in subsequent cycles to wait until the running plan fully
completed before being deployed.

This change implements plan merging for placement plans (cid == -1):
- When a new plan arrives and one is already APPLYING, merge_actions() is
  called instead of returning early
- merge_actions() first prunes terminal actions (DONE/ERROR/TIMEOUT) from
  the running plan to prevent unbounded growth and unblock check_completed()
- New actions for VMs not already in the plan are appended with IDs
  starting above the current maximum to avoid colliding with in-flight
  APPLYING action IDs stored in VM history records
- execute_plans() is called immediately after merging so newly appended
  actions are dispatched without waiting for the next timer tick
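
To illustrate the merge steps listed above, here is a minimal, self-contained sketch. It is not the code in this PR: the `Action`, `Plan` and `ActionState` types and the `merge_sketch` namespace are hypothetical stand-ins, and only the name `merge_actions()` and the state names come from the commit message. The sketch also assumes that the maximum ID is taken before pruning (so no ID is ever reused) and that appended actions start as READY.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

namespace merge_sketch
{
// Hypothetical, simplified stand-ins for the scheduler's plan types.
enum class ActionState { READY, APPLYING, DONE, ERROR, TIMEOUT };

struct Action
{
    int         id;
    int         vm_id;
    int         host_id;
    ActionState state;
};

struct Plan
{
    int                 cid = -1; // -1: placement plan, >= 0: cluster (DRS) plan
    std::vector<Action> actions;
};

inline bool is_terminal(ActionState s)
{
    return s == ActionState::DONE || s == ActionState::ERROR
        || s == ActionState::TIMEOUT;
}

// Merges a newly computed placement plan into the running one.
// Returns the number of actions appended.
inline int merge_actions(Plan& running, const Plan& incoming)
{
    // Only placement plans are merged; DRS plans keep the replace behaviour.
    assert(running.cid == -1 && incoming.cid == -1);

    // Assumption of this sketch: take the maximum before pruning so that
    // IDs of pruned actions are never handed out again.
    int max_id = -1;
    for (const Action& a : running.actions)
    {
        max_id = std::max(max_id, a.id);
    }

    // 1. Prune terminal actions (DONE/ERROR/TIMEOUT) from the running plan.
    running.actions.erase(
        std::remove_if(running.actions.begin(), running.actions.end(),
                       [](const Action& a) { return is_terminal(a.state); }),
        running.actions.end());

    // 2. Remember which VMs still have an action in the plan.
    std::set<int> planned_vms;
    for (const Action& a : running.actions)
    {
        planned_vms.insert(a.vm_id);
    }

    // 3. Append actions for VMs not already in the plan, with fresh IDs.
    int appended = 0;
    for (Action a : incoming.actions)
    {
        if (!planned_vms.insert(a.vm_id).second)
        {
            continue; // VM already has an action in the running plan
        }

        a.id    = ++max_id;
        a.state = ActionState::READY;
        running.actions.push_back(a);
        ++appended;
    }

    return appended;
}
} // namespace merge_sketch
```

In this sketch the caller would invoke the dispatcher right after a merge that appended at least one action, which mirrors the last bullet above.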

DRS cluster optimization plans (cid >= 0) retain the existing replace
behaviour as concurrent optimizer runs would produce conflicting results.

Also removes two stale TODO comments in SchedulerManager.cc that referred
to this missing guard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Jukka Jokiniva <jukka.jokiniva@qt.io>
Previously, execute_plan() stopped dispatching as soon as the first
READY action hit either the per-host or per-cluster action limit.
This meant VMs targeting uncongested hosts were skipped until the
next cycle.

Replace get_next_action() with get_ready_actions() which collects all
READY action pointers upfront. The dispatch loop now breaks only on
the cluster cap (hard ceiling) and continues past saturated hosts,
allowing VMs assigned to other hosts to be started in the same cycle.
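
The dispatch change described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the `Action` and `ActionState` types, the `dispatch_sketch` namespace, and the `max_per_host` / `max_per_cluster` parameters are hypothetical, and only the function names `get_ready_actions()` and `execute_plan()` are taken from the commit message.

```cpp
#include <cassert>
#include <map>
#include <vector>

namespace dispatch_sketch
{
// Hypothetical, simplified stand-ins for the scheduler's plan types.
enum class ActionState { READY, APPLYING, DONE };

struct Action
{
    int         id;
    int         vm_id;
    int         host_id;
    ActionState state;
};

// Collects pointers to all READY actions upfront.
inline std::vector<Action*> get_ready_actions(std::vector<Action>& actions)
{
    std::vector<Action*> ready;
    for (Action& a : actions)
    {
        if (a.state == ActionState::READY)
        {
            ready.push_back(&a);
        }
    }
    return ready;
}

// Dispatches READY actions and returns how many were started.
// max_per_host and max_per_cluster are hypothetical limit parameters.
inline int execute_plan(std::vector<Action>& actions,
                        int max_per_host, int max_per_cluster)
{
    assert(max_per_host > 0 && max_per_cluster > 0);

    // Count the actions that are already in flight.
    std::map<int, int> host_load;
    int cluster_load = 0;
    for (const Action& a : actions)
    {
        if (a.state == ActionState::APPLYING)
        {
            ++host_load[a.host_id];
            ++cluster_load;
        }
    }

    int started = 0;
    for (Action* a : get_ready_actions(actions))
    {
        if (cluster_load >= max_per_cluster)
        {
            break; // cluster cap is a hard ceiling: stop dispatching
        }

        if (host_load[a->host_id] >= max_per_host)
        {
            continue; // host is saturated: skip it, try the next action
        }

        a->state = ActionState::APPLYING;
        ++host_load[a->host_id];
        ++cluster_load;
        ++started;
    }

    return started;
}
} // namespace dispatch_sketch
```

With the previous behaviour, the loop would have stopped at the first action whose host was saturated; here that action is skipped and later actions targeting other hosts still get a chance in the same cycle.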

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Jukka Jokiniva <jukka.jokiniva@qt.io>