bug: update cs2 for demo renders, fail-fast when it cannot launch #26
Open
Flegma wants to merge 2 commits into
Demo renders skipped the CS2 update to stay pinned to the game-server
build. After a Valve patch the cached build goes stale and Steam refuses
to launch it ("Update required"), so cs2 never starts and the render
hangs forever (the launch wait is deliberately no-timeout for cold
shader compiles).
- install_cs2_via_steamcmd: keep the live pin (DEMO_URL empty), but for
demo renders run a fast app_update 730 (no validate) so CS2 is current
and launchable. Re-introduces the reverted "always update" (47bb304),
now gated to demos so it does not break live build-pinning.
- wait_for_cs2_process: if cs2 never starts and no shader compile or
game-file validation is running, die after CS2_LAUNCH_TIMEOUT (360s),
which marks every batch job errored in the UI and frees the node.
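The gating in the first bullet can be sketched roughly as follows. This is a minimal illustration of the decision for an already-installed CS2, not the actual body of install_cs2_via_steamcmd: `cs2_update_action` is a hypothetical helper name, and the real function also handles the steamcmd login, install directory, and library registration, none of which are shown here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: what to do when CS2 is already installed.
# DEMO_URL comes from the PR; cs2_update_action is an illustrative name only.

# Prints "update" for demo renders and "skip" for live spectate.
cs2_update_action() {
  if [ -n "${DEMO_URL:-}" ]; then
    # Demo render: no live server to match, so bring CS2 to the current build.
    # The PR does this with a fast "+app_update 730" (no validate).
    echo "update"
  else
    # Live spectate: stay pinned to the game-server build, do not touch steamcmd.
    echo "skip"
  fi
}
```

Keeping the decision in one small function makes the live path trivially unchanged: with DEMO_URL empty the helper never reaches the update branch.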
7e9f6f4 to 495754a
Problem
CS2 demo renders could hang forever with no UI error. Two causes:
- Demo renders skipped the CS2 update whenever CS2 was already installed. After a Valve patch the cached build goes stale and Steam refuses to launch it ("Update required"), so cs2 never starts.
- `wait_for_cs2_process` waits indefinitely on purpose (so a cold shader compile is never killed). When cs2 cannot launch for any reason, the render sits at "waiting on cs2" forever and pins the only GPU node, with nothing shown in the UI.

Changes (`src/lib/steam.sh`)

1. Keep CS2 current for demo renders (`install_cs2_via_steamcmd`).
It previously skipped steamcmd entirely when CS2 was already installed (to pin live spectate to the game-server build). Now:
- Live spectate (`DEMO_URL` empty): unchanged, skip and stay pinned.
- Demo render (`DEMO_URL` set): run a fast `steamcmd +app_update 730` (no `validate`, a no-op when already current) so CS2 is on the current build and launchable, then `register_library`.
Demos have no live server to match, so the pin does not apply to them. This re-introduces the "always update" approach tried in `47bb304` and reverted in `b5c9fe6` (it broke live build-pinning), now gated to demos so the revert's reason no longer applies.

2. Fail fast when cs2 cannot launch (`wait_for_cs2_process`).
The launch wait now bails when cs2 genuinely will not start: if no shader compile was ever seen AND no game-file validation is running AND `CS2_LAUNCH_TIMEOUT` (default 360s) has elapsed, it calls `die`. `die` broadcasts `status=error` (with the reason) to every batch job, so the failure shows in the render queue UI, and then exits so the pod is reaped and the GPU node frees.
The guards (`shaders_seen=0`, `validating_active=0`) ensure a long-but-legitimate cold shader compile or integrity check is never aborted, and the 360s default clears the slow Steam first-boot (cs2 can spawn ~2 min in). Overridable via `CS2_LAUNCH_TIMEOUT`.

Together: fix 1 makes demo renders launch in the common case, and fix 2 surfaces a clear UI error and frees the node if a render still cannot launch for any reason.
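The guarded timeout in fix 2 could look something like the sketch below. It is an illustration under stated assumptions rather than the real wait_for_cs2_process: the three probe functions, the `pgrep -x cs2` pattern, the poll interval, and the function name `wait_for_cs2_sketch` are all hypothetical. Where this sketch returns 1, the PR calls die.

```shell
#!/usr/bin/env bash
# Sketch of a fail-fast launch wait. CS2_LAUNCH_TIMEOUT, shaders_seen and
# validating_active are names from the PR; everything else is assumed.

CS2_LAUNCH_TIMEOUT="${CS2_LAUNCH_TIMEOUT:-360}"

# Hypothetical probes; the real detection logic is not shown in the PR.
cs2_running()         { pgrep -x cs2 >/dev/null 2>&1; }
shader_compile_seen() { return 1; }
validation_active()   { return 1; }

wait_for_cs2_sketch() {
  local poll="${1:-5}" start=$SECONDS shaders_seen=0 validating_active
  while :; do
    # A running cs2 always wins, even if the timeout has already elapsed.
    cs2_running && return 0

    # Sticky flag: once a shader compile was seen, never time out.
    shader_compile_seen && shaders_seen=1

    # Re-evaluated on every pass: only an active validation blocks the timeout.
    validating_active=0
    validation_active && validating_active=1

    if [ "$shaders_seen" -eq 0 ] && [ "$validating_active" -eq 0 ] \
       && [ $((SECONDS - start)) -ge "$CS2_LAUNCH_TIMEOUT" ]; then
      echo "cs2 did not start within ${CS2_LAUNCH_TIMEOUT}s" >&2
      return 1   # the PR calls die here, which broadcasts status=error
    fi
    sleep "$poll"
  done
}
```

Because all three conditions are ANDed, the timeout can only fire while nothing legitimate is in progress, which preserves the original no-timeout behavior for cold shader compiles.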
Validation
`bash -n` clean. `DEMO_URL` reliably distinguishes demo vs live in this env; `die`/`broadcast_batch_error` are in scope in the batch flow; the timeout cannot fire during a compile or validation and clears the slow first-boot; the `pgrep` early-return still takes precedence. The demo-gating directly addresses why `47bb304` was reverted.