Modify URM Test Runner by kartnema · Pull Request #459 · qualcomm-linux/qcom-linux-testkit

kartnema · 2026-05-14T07:42:43Z

Add support to delete any stale lock files created in previous runs
Remove log-based result mapping (parse_and_score_log) and solely rely on the return code (0 or 1). The URM Test Runner already handles the return code, which the script can rely on. Judging pass / fail based on the logs results in some false positives.
Modify test skip logic, a suite should only be skipped if it is not in the approved list, of if the required configs are not found, or if the test executable itself is absent. Skipping of one or more tests part of the testsuite, should not result in an overall SKIP qualification, it should be either PASS OR FAIL.

smuppand · 2026-05-14T16:58:58Z

 # ---------- Lock (avoid concurrent runs on same host) ----------
 LOCKFILE="/tmp/${TESTNAME}.lock"
+# Delete stale lock file, if it exists
+rm "${LOCKFILE}"


Do not remove the lock file before flock. With flock, stale lock files are harmless because the lock is tied to the open file descriptor, not just the path. Keep the file and let flock -n decide whether another run is active. If stale directory cleanup is required for the no-flock fallback, handle only the directory fallback and only when it is provably stale. Do not remove the flock file before acquiring the lock.

The issue is that even while attempting fresh runs (i.e. flashing a new image and then running the tests through Axiom) we are running into the error that: "another userspace resource manager run" is active. Which should not be happening, since this is the first run.

Also, as mentioned, each time a CI/CD job is triggered a new image (from qli-artifacts) will be flashed on the device (and we won't be using this device directly manually), hence I do not see much use for this lock-protection code in general.

@smuppand even we tried this as well.

if command -v flock >/dev/null 2>&1; then
exec 9>"$LOCKFILE"
if ! flock -n 9; then
log_warn "Another $ run is active; skipping"
echo "$TESTNAME SKIP" >"$RES_FILE"
exit 0
fi
trap 'exec 9>&-' EXIT INT TERM [use to release the lock]

still not working.

@smuppand even we tried this as well.

if command -v flock >/dev/null 2>&1; then exec 9>"$LOCKFILE" if ! flock -n 9; then log_warn "Another $ run is active; skipping" echo "$TESTNAME SKIP" >"$RES_FILE" exit 0 fi trap 'exec 9>&-' EXIT INT TERM [use to release the lock]

still not working.

I agree the symptom is valid: on a fresh flashed image / first Axiom run, we should not get “another userspace resource manager run is active”.

But I do not think removing the flock file before acquiring the lock is safe. With flock, the lock is tied to the open file/inode. If another process is actively holding a lock and we remove the path, a second process can recreate the path and acquire a lock on a different inode, allowing concurrent URM runs.

For the flock path, stale lock files are harmless and should not be removed before locking.

A safer approach would be:

Use flock normally without deleting the lock file.

For the no-flock fallback, use a separate lock directory, for example:
/tmp/userspace-resource-manager.lockdir

If we really need stale cleanup, clean only the fallback lock directory and only when it is provably stale, for example when no matching URM runner process is active.

So I would suggest:

remove rm "$LOCKFILE" from the flock path

add debug when flock fails, such as ps output for URM/run.sh processes

optionally use a separate fallback lockdir for the mkdir path

This keeps concurrency protection intact while still helping debug the false “active run” case.

LOCKFILE="/tmp/${TESTNAME}.lock" LOCKDIR="/tmp/${TESTNAME}.lockdir" if command -v flock >/dev/null 2>&1; then exec 9>"$LOCKFILE" if ! flock -n 9; then log_warn "Another ${TESTNAME} run is active; skipping" log_info "Active URM-related processes:" ps 2>/dev/null | grep -E 'userspace-resource-manager|Urm(Component|Integration)Tests|[run.sh](http://run.sh/)' | grep -v grep || true echo "$TESTNAME SKIP" >"$RES_FILE" exit 0 fi else if ! mkdir "$LOCKDIR" 2>/dev/null; then log_warn "Another ${TESTNAME} run is active or stale fallback lockdir exists: $LOCKDIR" echo "$TESTNAME SKIP" >"$RES_FILE" exit 0 fi trap 'rmdir "$LOCKDIR" 2>/dev/null || true' EXIT INT TERM fi

smuppand · 2026-05-14T17:01:26Z

-    fi
-
-    case $score in
+    case $rc in


Only return codes 0 and 1 are handled. Any other non-zero return code, such as:

124 from timeout

126 command found but not executable

127 command not found

signal-based exits

will not be explicitly handled. Depending on shell behavior, this can cause run_one() to return success or an unintended status, leading to wrong PASS/FAIL accounting.

The URM test framework handles most of the exceptions internally (through exception handling) and returns an aggregate status code (https://github.com/qualcomm/userspace-resource-manager/blob/main/tests/Utils/URMTests.cpp#L123), which will always be 0 or 1. I wanted to map the script behavior more closely with that of the URM Test Framework.

Also, since we are checking whether the executable exists (and whether it is executable) earlier in the script already, hence those issues won't factor in later.

In any case, i'll add a default (wildcard) to the shell script switching logic as a fallaback.

The URM test framework handles most of the exceptions internally (through exception handling) and returns an aggregate status code (https://github.com/qualcomm/userspace-resource-manager/blob/main/tests/Utils/URMTests.cpp#L123), which will always be 0 or 1. I wanted to map the script behavior more closely with that of the URM Test Framework.

Also, since we are checking whether the executable exists (and whether it is executable) earlier in the script already, hence those issues won't factor in later.

In any case, i'll add a default (wildcard) to the shell script switching logic as a fallaback.

Thanks, that makes sense. If the URM framework guarantees 0/1 for framework-handled test failures, relying on rc is fine and is cleaner than parsing logs.

I would still keep the wildcard fallback in the shell case statement, because failures outside the URM framework can still return other codes, for example timeout wrapper rc=124, shell execution failures, or signal exits. Treating those as FAIL is safer than leaving the case unmatched.

Added fallback (*)

smuppand · 2026-05-14T17:05:24Z

@kartnema Please include an appropriate commit message and sign off on the commit to resolve the CI failures.

smuppand

Shell Lint is still failing, so please fix the reported lint issue from the latest workflow.

smuppand · 2026-05-18T10:42:51Z

  if ! flock -n 9; then
    log_warn "Another ${TESTNAME} run is active; skipping"
+    log_info "Active URM-related processes:"
+    ps 2>/dev/null | grep -E 'userspace-resource-manager|Urm(Component|Integration)Tests|[run.sh](http://run.sh/)' | grep -v grep || true


The debug ps grep pattern includes GitHub-rendered markdown.

Please replace it with a plain regex, for example:
ps 2>/dev/null | grep -E 'userspace-resource-manager|Urm(Component|Integration)Tests|run\.sh' | grep -v grep || true

I see the check is still failing and suggesting to use pgrep

Can we use something like:

kartnema@hu-kartnema-hyd:/local/mnt/workspace/kartnema/images/meta-qcom/recipes-support/userspace-resource-manager/files/src/qcom-linux-testkit/Runner/suites/Performance/userspace-resource-manager$ pgrep -af 'userspace-resource-manager|Urm(Component|Integration)Tests|run.sh' || true
1168298 UrmComponentTests
kartnema@hu-kartnema-hyd:/local/mnt/workspace/kartnema/images/meta-qcom/recipes-support/userspace-resource-manager/files/src/qcom-linux-testkit/Runner/suites/Performance/userspace-resource-manager$

Does it meet the intended requirements.

smuppand · 2026-05-18T10:43:59Z

The fallback lock cleanup uses the wrong path:
LOCKDIR="/tmp/${TESTNAME}.lockdir"
but trap still does:
rmdir "$LOCKFILE"

Please change it to:
trap 'rmdir "$LOCKDIR" 2>/dev/null || true' EXIT INT TERM

Thanks for pointing this out

smuppand · 2026-05-18T17:23:03Z

  if ! flock -n 9; then
    log_warn "Another ${TESTNAME} run is active; skipping"
+    log_info "Active URM-related processes:"
+    pgrep -af 'userspace-resource-manager|Urm(Component|Integration)Tests|run.sh' || true


pgrep is used before the dependency check and is not listed in check_dependencies

Moved check_dependencies above in the script, and added pgrep to the list

- Add logs to debug flock related concurrency issues, preventing URM test runs from going through. - Remove log-based result mapping (parse_and_score_log) and solely rely on the return code (0 or 1). The URM Test Runner already handles the return code, which the script can rely on. Judging pass / fail based on the logs results in some false positives. - Modify test skip logic, a suite should only be skipped if it is not in the approved list, of if the required configs are not found, or if the test executable itself is absent (or is not marked executable). Skipping of one or more tests part of the testsuite, should not result in an overall SKIP qualification, it should be either PASS OR FAIL. Signed-off-by: Kartik Nema <kartnema@qti.qualcomm.com>

smuppand

LGTM

kartnema force-pushed the improve-urm-test-runner branch from 5717167 to fc0863e Compare May 14, 2026 07:44

smuppand requested changes May 14, 2026

View reviewed changes

kartnema force-pushed the improve-urm-test-runner branch 3 times, most recently from b6f27ec to 99e7475 Compare May 18, 2026 08:58

kartnema requested a review from smuppand May 18, 2026 09:51

smuppand requested changes May 18, 2026

View reviewed changes

kartnema force-pushed the improve-urm-test-runner branch 2 times, most recently from 429f45e to 7efe7b4 Compare May 18, 2026 12:54

kartnema requested a review from smuppand May 18, 2026 12:55

smuppand requested changes May 18, 2026

View reviewed changes

kartnema force-pushed the improve-urm-test-runner branch from 7efe7b4 to 3d34f53 Compare May 18, 2026 17:39

kartnema requested a review from smuppand May 18, 2026 17:41

smuppand approved these changes May 18, 2026

View reviewed changes

smuppand merged commit 2c5f843 into qualcomm-linux:main May 18, 2026
12 checks passed

Conversation

kartnema commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

varunsinghal29 May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kartnema May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

smuppand commented May 14, 2026

Uh oh!

smuppand left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kartnema May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

smuppand left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kartnema commented May 14, 2026 •

edited

Loading

varunsinghal29 May 15, 2026 •

edited

Loading

kartnema May 15, 2026 •

edited

Loading

kartnema May 18, 2026 •

edited

Loading