sof-kernel-log-check.sh: Ignore more logs #1318
Can one of the admins verify this patch?
7ded2c2 to f1f30a2
tools/sof-kernel-log-check.sh
Outdated
# ignore the ACPI error on LNL and PTL.
# kernel: ACPI: \: Can't tag data node
ignore_str="$ignore_str""|kernel: ACPI: \\\\: Can't tag data node"
ignore_str="$ignore_str""|kernel: xe 0000:00:02.0: \[drm\] \*ERROR\* Tile0: GT1: Timed out wait for G2H, fence 669, action 5503, done no"
Xe dev here. Are these specific numbers (669 and 5503) seen frequently?
I ask because outside of very specific circumstances (probably only module load) these shouldn't be repeatable.
Not all but many of the other GPU and display errors in this file are seen only once at boot. So maybe this one too, which makes it repeatable? Dunno.
A good suspend/resume pass rate is always achieved last (see #1038 + internal sof-framework 408 and others) and by that time other components tend to be more reliable and less noisy.
After looking into this and talking with my colleagues, this may be reproducible after all. How often are you seeing this come up?
@msatwood This problem appeared on PTL during sof v2.14 validation on kernel revision b250b5425a17. We haven't encountered these specific values anywhere else, but we did notice a similar error on WCL:
[70727.555869] kernel: xe 0000:00:02.0: [drm] Tile0: GT1: { key 0x0002 : 64b value 0xfec00000 } # ggtt_size
[70727.555875] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: PF: Failed to push self configuration (-ECANCELED)
I just noticed an error from today's run and saw another instance of this problem on PTL:
[ 3595.491532] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: Timed out wait for G2H, fence 3275, action 5503, done no
[ 3595.491620] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: PF: Failed to push self configuration (-ETIME)
This time the numbers are different. I should probably change the line so that it matches all numbers in this message.
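One way to match all numbers in this message is to replace the hard-coded fence and action values with `[0-9]+`. The sketch below is only an illustration, not the change that was actually pushed. It assumes that `ignore_str` is consumed as an extended regular expression (for example by `grep -E -v`). The starting value of `ignore_str` and the `sample_log` and `remaining` variables are hypothetical stand-ins, and the sample lines are the ones quoted above.

```shell
#!/bin/bash
# Hypothetical stand-in for the ignore list the script has built so far.
ignore_str="kernel: placeholder entry that matches nothing in the sample"

# Match any fence and action number instead of hard-coding 669 and 5503.
ignore_str="$ignore_str""|kernel: xe 0000:00:02.0: \[drm\] \*ERROR\* Tile0: GT1: Timed out wait for G2H, fence [0-9]+, action [0-9]+, done no"

# Two sample lines copied from the PTL log quoted in this thread.
sample_log='[ 3595.491532] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: Timed out wait for G2H, fence 3275, action 5503, done no
[ 3595.491620] kernel: xe 0000:00:02.0: [drm] *ERROR* Tile0: GT1: PF: Failed to push self configuration (-ETIME)'

# Assumption: the script drops ignored lines with an extended-regex filter.
remaining=$(printf '%s\n' "$sample_log" | grep -E -v -e "$ignore_str")

# Print whatever would still be reported as an error.
printf '%s\n' "$remaining"
```

With this pattern, the "Timed out wait for G2H" line is filtered out regardless of the fence and action values. The "Failed to push self configuration" line is not covered and would still be reported, so whether it should also be ignored is a separate decision.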
Ignore further errors unrelated to SOF.
Signed-off-by: Pawel Langowski <pawelx.langowski@intel.com>
f1f30a2 to 002e8a2
This resolves the failing SOF tests. The error message is not directly connected with the firmware, but this should be considered a temporary solution. We need to plan how to handle and report these errors in the future.