feat(kmsgLogWatcher): Add option to restart kmsg parser if channel closes#1192
feat(kmsgLogWatcher): Add option to restart kmsg parser if channel closes#1192arjraman wants to merge 2 commits intokubernetes:masterfrom
Conversation
|
Welcome @arjraman! |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: arjraman The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @arjraman. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
|
requesting review feedback @Random-Liu @yujuhong. cc @hakman @arjraman |
Removed log message for disabled restart on error.
|
@hakman can we please get review feedback on this? Thank you. |
|
i wrote this feature last year and it never got reviewed / approved. this has been in use in my org for a year now. its saved us many many times 👍 |
hakman
left a comment
There was a problem hiding this comment.
Sorry for the delay, had some spare time and got to the review, at least a first pass. Thanks for your patience.
|
|
||
| // RestartOnErrorKey is the configuration key to enable restarting | ||
| // the kmsg parser when the channel closes due to an error. | ||
| RestartOnErrorKey = "restartOnError" |
There was a problem hiding this comment.
The linked issues are asking for recovery after Kmsg channel is closed, not for an opt-in knob. I’d suggest making the restart path unconditional.
| case <-time.After(retryDelay): | ||
| } | ||
|
|
||
| parser, err := kmsgparser.NewParser() | ||
| if err != nil { | ||
| klog.Errorf("Failed to create new kmsg parser, retrying in %v: %v", retryDelay, err) | ||
| continue | ||
| } | ||
|
|
||
| k.kmsgParser = parser | ||
| klog.Infof("Successfully restarted kmsg parser") | ||
| return parser.Parse(), true |
There was a problem hiding this comment.
I’d simplify the retry loop so it attempts parser recreation immediately and only waits between failures.
| case <-time.After(retryDelay): | |
| } | |
| parser, err := kmsgparser.NewParser() | |
| if err != nil { | |
| klog.Errorf("Failed to create new kmsg parser, retrying in %v: %v", retryDelay, err) | |
| continue | |
| } | |
| k.kmsgParser = parser | |
| klog.Infof("Successfully restarted kmsg parser") | |
| return parser.Parse(), true | |
| default: | |
| } | |
| parser, err := kmsgparser.NewParser() | |
| if err == nil { | |
| k.kmsgParser = parser | |
| klog.Infof("Successfully restarted kmsg parser") | |
| return parser.Parse(), true | |
| } | |
| klog.Errorf("Failed to create new kmsg parser, retrying in %v: %v", retryDelay, err) | |
| select { | |
| case <-k.tomb.Stopping(): | |
| klog.Infof("Stop watching kernel log during restart attempt") | |
| return nil, false | |
| case <-time.After(retryDelay): | |
| } |
|
|
||
| "github.com/euank/go-kmsg-parser/kmsgparser" | ||
| "k8s.io/klog/v2" | ||
| klog "k8s.io/klog/v2" |
There was a problem hiding this comment.
Nit: not needed
| klog "k8s.io/klog/v2" | |
| "k8s.io/klog/v2" |
This PR adds an option to restart kmsg parser if channel closes.
When /dev/kmsg gets flooded with messages, the kmsg file descriptor can close unexpectedly. Instead of exiting the watcher and restarting the whole pod, I would like to restart the kmsg parser.
Related issues: #1002, #1003, #1004, #1168, #1174
cc @hakman