[Observability] Emit metric ClustermgtdHeartbeat to signal clustermgtd heartbeat. #685
Merged
gmarciani merged 1 commit into aws:develop on Jan 27, 2026
Comment on lines +591 to +593
    # Publish heartbeat metric to CloudWatch
    self._metrics_publisher.put_metric(metric_name=CW_METRICS_HEARTBEAT, value=1)
Contributor
How about wrapping this in a try/except and logging a warning? This is not critical cluster management logic, so we should avoid letting it raise an exception out of the function.
Contributor
Author
I agree. This is already done inside the metrics publisher itself.
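For readers following the thread, here is a minimal sketch of what "done inside the metrics publisher" could look like. It is not the code from this PR: the class name `SafeMetricsPublisher`, its constructor arguments, and the boolean return value are all assumptions made for illustration. The only real API assumed is the keyword shape of CloudWatch `put_metric_data` (`Namespace`, `MetricData`), and the client is injected so the sketch runs without AWS access.

```python
import logging

logger = logging.getLogger(__name__)


class SafeMetricsPublisher:
    """Hypothetical publisher that never lets a publishing failure propagate."""

    def __init__(self, client, namespace, dimensions):
        # client: any object exposing put_metric_data(**kwargs),
        # for example a boto3 CloudWatch client (assumption, not taken from the PR).
        self._client = client
        self._namespace = namespace
        self._dimensions = [{"Name": k, "Value": v} for k, v in dimensions.items()]

    def put_metric(self, metric_name, value, unit="Count"):
        """Publish one data point; log a warning and return False on any error."""
        try:
            self._client.put_metric_data(
                Namespace=self._namespace,
                MetricData=[
                    {
                        "MetricName": metric_name,
                        "Dimensions": self._dimensions,
                        "Value": value,
                        "Unit": unit,
                    }
                ],
            )
            return True
        except Exception as e:
            # Metrics are best effort: the caller's loop must not fail because of them.
            logger.warning("Failed to publish metric %s: %s", metric_name, e)
            return False
```

With the error handling kept in the publisher, the call site in the management loop can stay a single line, which matches the outcome of the review thread.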
Description of changes
Emit metric ClustermgtdHeartbeat to signal clustermgtd heartbeat.
The metric is emitted with dimensions: ClusterName and InstanceId.
The metric is intentionally emitted at the end of the clustermgtd loop to represent the real health of the daemon.
If it were emitted at the beginning of the iteration, it would be open to false negatives.
This PR depends on the permissions added in aws/aws-parallelcluster#7209
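The placement argument above can be illustrated with a small sketch. This is not the clustermgtd loop itself: `run_iteration`, `run_loop`, and both callables are hypothetical names, and the only point is that a heartbeat emitted as the last step of an iteration is skipped whenever that iteration fails.

```python
def run_iteration(manage_cluster, publish_heartbeat):
    """Run one management iteration; emit the heartbeat only if it completes."""
    manage_cluster()      # may raise; in that case the heartbeat below is skipped
    publish_heartbeat()   # last step: reached only after a full iteration


def run_loop(manage_cluster, publish_heartbeat, iterations):
    """Run a fixed number of iterations and return how many completed."""
    completed = 0
    for _ in range(iterations):
        try:
            run_iteration(manage_cluster, publish_heartbeat)
            completed += 1
        except Exception:
            # A failed iteration emits no heartbeat, so the gap is visible in the metric.
            pass
    return completed
```

Under these assumptions, the number of heartbeat data points equals the number of iterations that ran to completion, which is what lets an alarm on missing data detect a daemon that is running but failing.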
Tests
References
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.