HIVE-27126: queue level resource stats for YARN RM.#6501
Conversation
ea3b1b3 to
5f4c748
Compare
9e97b27 to
e2d3173
Compare
e2d3173 to
fa844c0
Compare
fa844c0 to
dac58c2
Compare
dac58c2 to
f4301f1
Compare
f4301f1 to
ada32ce
Compare
8720321 to
827ee45
Compare
827ee45 to
120c749
Compare
120c749 to
34febff
Compare
34febff to
47b3abe
Compare
|
abstractdog
left a comment
There was a problem hiding this comment.
thanks @architjainjain so far, dropped some comments, but I wasn't able to fully read it through, I'll get back after, in the meantime I can do some testing too hopefully
| * behaviour added as part of HIVE-27126. | ||
| * | ||
| * We capture stdout via a ByteArrayOutputStream and inspect the rendered output. | ||
| */public class TestInPlaceUpdate { |
There was a problem hiding this comment.
line break before public
| new String(new char[InPlaceUpdate.MIN_TERMINAL_WIDTH]).replace("\0", "-"); | ||
|
|
||
| /** Counts non-overlapping occurrences of {@code needle} in {@code haystack}. */ | ||
| private static int countOccurrences(String haystack, String needle) { |
There was a problem hiding this comment.
what about reusing commons-lang3 StringUtils.countMatches
| * Unit tests for InPlaceUpdate — focusing on the queue-metrics separator | ||
| * behaviour added as part of HIVE-27126. |
There was a problem hiding this comment.
TestInPlaceUpdate sounds like a generic unit test class for InPlaceUpdate, so this comment is supposed to get outdated over time
| /** Minimal ProgressMonitor stub — returns empty headers/rows/footer. */ | ||
| private static ProgressMonitor makeMonitor(String queueMetrics) { | ||
| return new ProgressMonitor() { | ||
| @Override public List<String> headers() { |
There was a problem hiding this comment.
public in new line, pls take care of the same below too
|
|
||
|
|
||
|
|
||
|
|
||
|
|
| for (int i = 0; i < 5; i++) { | ||
| collectors[i] = new YarnQueueMetricsCollector( | ||
| mockYarnClient, "default", refreshIntervalMs, "jitter-test-query-" + i); | ||
| assertNotNull("Collector " + i + " should be created successfully", collectors[i]); | ||
| } | ||
| // If we get here, all 5 collectors were created with their own jittered delays | ||
| // without conflict or exception - thundering herd fix is in place |
There was a problem hiding this comment.
considering mock yarn clients, creating concurrent collectors for jitter testing doesn't seem that valuable to me...I don't think they will ever throw an exception
| mockYarnClient, "default", 30, "circuit-breaker-recovery-query"); | ||
|
|
||
| try { | ||
| // Wait for recovery - snapshot should eventually be populated |
There was a problem hiding this comment.
need extra assertion about the failure first to make 100% sure we actually hit circuit breaker logic
| public void testCircuitBreakerDoesNotAffectSuccessfulCollection() throws Exception { | ||
| // Normal operation - no failures, circuit breaker should never activate |
There was a problem hiding this comment.
this is just a simple happy testing, I guess it's covered by testSuccessfulMetricsCollection
There was a problem hiding this comment.
@abstractdog Thanks for catching this! I've added JavaDoc to clarify that this test is part of the circuit breaker test suite. While it looks similar to testSuccessfulMetricsCollection(), it specifically validates that the circuit breaker doesn't interfere with normal operations. The circuit breaker suite needs three scenarios: failure, recovery, and happy path. This test covers the happy path behavior for circuit breaker specifically, not general metrics collection. I've added cross-references to the related tests for clarity.
| when(mockQueueStats.getAllocatedMemoryMB()).thenReturn(1024L); | ||
| when(mockQueueStats.getAvailableMemoryMB()).thenReturn(1024L); | ||
| when(mockQueueStats.getAllocatedVCores()).thenReturn(4L); | ||
| when(mockQueueStats.getAvailableVCores()).thenReturn(4L); | ||
| when(mockQueueStats.getNumAppsRunning()).thenReturn(1L); | ||
| when(mockQueueStats.getPendingContainers()).thenReturn(0L); | ||
| when(mockQueueInfo.getQueueStatistics()).thenReturn(mockQueueStats); | ||
| when(mockQueueInfo.getCapacity()).thenReturn(0.5f); | ||
| when(mockYarnClient.getQueueInfo("default")).thenReturn(mockQueueInfo); |
There was a problem hiding this comment.
I feel like defining all these variables adds a lot of boilerplate to the unit tests, whereas in this case we're only interested in collection timestamp, could a separate method be created just for adding happy value, and it can be overridden per test, like:
when(mockQueueStats.getAllocatedMemoryMB()).thenReturn(1024L);
when(mockQueueStats.getAvailableMemoryMB()).thenReturn(1024L);
when(mockQueueStats.getAllocatedVCores()).thenReturn(4L);
when(mockQueueStats.getAvailableVCores()).thenReturn(4L);
when(mockQueueStats.getNumAppsRunning()).thenReturn(1L);
when(mockQueueStats.getPendingContainers()).thenReturn(0L);
when(mockQueueInfo.getQueueStatistics()).thenReturn(mockQueueStats);
when(mockQueueInfo.getCapacity()).thenReturn(0.5f);
when(mockYarnClient.getQueueInfo("default")).thenReturn(mockQueueInfo);
method javadoc can also tell that these are only default happy values for the unit tests
| // HIVE-27126: Thrift regeneration omitted setStartTimeIsSet(true) from the constructor. | ||
| // Explicitly call setStartTime() to set the isset flag required for Thrift validation. | ||
| tProgressUpdateResp.setStartTime(progressUpdate.startTimeMillis); | ||
| if (progressUpdate.queueMetrics() != null && !progressUpdate.queueMetrics().isEmpty()) { | ||
| tProgressUpdateResp.setQueueMetrics(progressUpdate.queueMetrics()); | ||
| } | ||
| resp.setProgressUpdateResponse(tProgressUpdateResp); |
There was a problem hiding this comment.
why is this change needed?
There was a problem hiding this comment.
This works around a Thrift code generation bug. When we regenerated Thrift code after adding the queueMetrics field, the generated constructor for TProgressUpdateResp accepts startTimeMillis but fails to set the isset flag. Without this flag, Thrift serialization treats the field as unset, causing clients to receive incomplete progress updates. The explicit setStartTime() call ensures both the value AND the isset flag are properly set, maintaining backward compatibility with Thrift clients.
without this test the generated file is not having the starttime isset flag updated.
| this.status = status; | ||
| this.footerSummary = footerSummary; | ||
| this.startTime = startTime; | ||
| setStartTimeIsSet(true); |
There was a problem hiding this comment.
@abstractdog this is getting removed. so we added manually after the constructor call.
47b3abe to
e7a0726
Compare



What changes were proposed in this pull request?
Display Yarn resources availability in real time along with Tez job progress
Why are the changes needed?
Right we don't know if the queue used in Hive query execution is having enough resources or not. We are already displaying tez job details like number of task per each vertex and how are they progressing.
If the resources available are not good enough to execute the query in parallel or query that use to take shorter time is taking time, then we can use this new detail to understand that queue is busy and resources are not enough
Does this PR introduce any user-facing change?
yes, it shows queue level metrics.
How was this patch tested?
the patch was tested by creating yarn cluster and running queries, with different queues.

after disabling the queue metrics:

set hive.tez.queue.metrics.refresh.interval=0s;