HDDS-13192. Adding pending delete bytes info in storage report of DN #8755
priyeshkaratha wants to merge 1 commit into apache:HDDS-13177
Conversation
Pull Request Overview
This PR introduces tracking and reporting of bytes pending deletion at the container and datanode levels. It updates container metadata handling, deletion command logic, storage reports, metrics, APIs, and relevant tests across Recon, SCM, and container-service modules.
- Adds `pendingDeletionBytes` counters in container data and integrates them into delete command handlers.
- Extends Datanode and storage-location reports, SCM heartbeats, Recon endpoints, and metrics to include pending-deletion bytes.
- Implements a short-lived cache in `HddsVolume` for aggregated pending-deletion bytes and updates unit tests accordingly (a possible shape is sketched below).
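For illustration, the short-lived cache mentioned in the last bullet could look roughly like this. This is a minimal sketch under assumed names (`TtlCachedGauge` is not a class in the PR; the real `HddsVolume` code aggregates pending-deletion bytes across its containers):

```java
import java.util.function.LongSupplier;

/** Minimal TTL-cached gauge; an illustration, not the PR's actual code. */
final class TtlCachedGauge {
  private final LongSupplier source; // e.g. sum of pendingDeletionBytes over containers
  private final long ttlMillis;
  private volatile long cachedValue;
  private volatile long lastComputedMs; // 0 forces a compute on first read

  TtlCachedGauge(LongSupplier source, long ttlMillis) {
    this.source = source;
    this.ttlMillis = ttlMillis;
  }

  long get() {
    long now = System.currentTimeMillis();
    if (now - lastComputedMs > ttlMillis) {
      cachedValue = source.getAsLong(); // recompute at most once per TTL window
      lastComputedMs = now;
    }
    return cachedValue;
  }
}
```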
Reviewed Changes
Copilot reviewed 39 out of 39 changed files in this pull request and generated 2 comments.
Summary per file:
| File | Description |
|---|---|
| recon/src/main/java/.../DatanodeStorageReport.java | Added pendingDeletions field, constructor arg, and getter. |
| recon/src/main/java/.../NodeEndpoint.java & ClusterStateEndpoint.java | Propagate new field into storage-report construction. |
| recon/src/test/java/.../TestNSSummaryEndpoint*.java | Updated mocks to pass extra constructor parameter. |
| server-scm/.../SCMNodeStat.java & SCMNodeMetric.java & NodeStat.java | Added pendingDeletions metric in stat/metric APIs. |
| server-scm/.../Test*.java | Updated tests to account for new stat constructor args. |
| container-service/src/.../KeyValueContainerData.java | Added methods for pending deletion bytes. |
| container-service/src/.../KeyValueContainerUtil.java | Compute blockPendingDeletionBytes on metadata population. |
| container-service/src/.../HddsVolume.java | Cached aggregation of pending-deletion bytes with TTL. |
| container-service/src/.../DeleteBlocksCommandHandler.java | Increment pending‐deletion bytes when marking blocks. |
| container-service/src/.../BlockDeletingService.java & Metrics | Track and expose total pending-deletion bytes in metrics. |
| common/src/main/java/.../OzoneConsts.java | Added constant for metadata key. |
Comments suppressed due to low confidence (3)
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/helpers/ContainerUtils.java:343
- The exception message refers to `getPendingDeletionBlocks` but the method is `getPendingDeletionBytes`. Update the message for consistency.

  " not support.");
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/HddsVolume.java:324
- It would be beneficial to add a unit test for `getPendingDeletionBytes()`, verifying both correct aggregation and cache-expiration behavior.

  public long getPendingDeletionBytes() {
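If it helps, here is a sketch of such a test, reusing the illustrative `TtlCachedGauge` from the overview sketch above; a real test against `HddsVolume` would inject a fake clock or a configurable TTL rather than sleeping:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.concurrent.atomic.AtomicLong;
import org.junit.jupiter.api.Test;

class TestPendingDeletionBytesCache {

  @Test
  void aggregatesAndExpires() throws Exception {
    AtomicLong backing = new AtomicLong(100);
    // Very short TTL so the test can observe expiration without a fake clock.
    TtlCachedGauge gauge = new TtlCachedGauge(backing::get, 50);

    assertEquals(100, gauge.get());   // first read aggregates from the source
    backing.set(250);
    assertEquals(100, gauge.get());   // within the TTL, the cached value is served
    Thread.sleep(60);                 // wait for the cache entry to expire
    assertEquals(250, gauge.get());   // expired entry triggers re-aggregation
  }
}
```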
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainer.java:1040
- [nitpick] The inline comment was accidentally concatenated (`HddsVolumetestImportV2ReplicaToV3HddsVolume`). Please revert it to `// create HddsVolume` or clarify.

  // create HddsVolumetestImportV2ReplicaToV3HddsVolume
Force-pushed 483aa16 to fbb66ee
errose28 left a comment
Why is the pending delete information being sent back to SCM? If SCM is not acting on this information, then the pending delete byte and block counts only need to be exposed as metrics, and we can use Prometheus + Grafana to either inspect individual nodes or aggregate across the cluster. Pushing observability metrics like this through SCM and Recon is not sustainable; we already have a metrics system to handle this.
@errose28 Our main goal here is to build a dashboard in Recon to show storage usage distribution across the cluster. Recon already uses StorageReport for other usage stats, so we thought it would be a good idea to extend it to include pending deletion info as well, keeping everything in one place. We considered using Prometheus metrics, but based on my understanding, in Ozone services these values might reset to zero and become inaccurate after service restarts, which can lead to wrong conclusions in the dashboard. By passing the data through SCM to Recon, we can ensure more reliable and consistent reporting, even after restarts. This also avoids a dependency on a running Prometheus service (most customers do not have one set up when they hit issues related to deletion or space reclamation).
We should leave all dashboarding to Grafana instead of reinventing the wheel in Recon.
StorageReport is not a catch-all for implementing metrics collection in Recon; it is for reporting information about the disks within datanodes. In general, Recon displays current categorical information (keys, containers, pipelines, volumes/disks), while metrics + dashboards track numeric information (bytes, durations, or event counts) over time. Deletion progress falls cleanly into the second category.
Metrics collection is decoupled from persistence. Yes we will need to persist the counters in datanodes so they do not need to be recomputed on every restart, but this is independent of how the metrics are published and consumed by other services.
If any Ozone users want dashboards they will need Prometheus and Grafana. We do not have the bandwidth to build and maintain our own dashboarding setup when quality ones already exist. This should be a reasonably small change if it is done the way Ozone is designed to handle it (see the sketch below).
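For context, here is a minimal sketch of the metrics-based approach described above, using Hadoop's Metrics2 library that Ozone already ships with. The class name, metric name, and context are illustrative assumptions, not code from this PR; once registered, the gauge is scrapable via JMX or Ozone's Prometheus endpoint like any other metric:

```java
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

/** Illustrative Metrics2 source; names here are assumptions, not the PR's code. */
@Metrics(about = "Pending block deletion metrics", context = "dfs")
public final class PendingDeletionMetricsSketch {

  @Metric("Bytes queued for deletion but not yet reclaimed")
  private MutableGaugeLong pendingDeletionBytes;

  private PendingDeletionMetricsSketch() { }

  // Register this source with the shared metrics system so JMX or the
  // Prometheus sink can scrape it alongside the other datanode metrics.
  public static PendingDeletionMetricsSketch create() {
    MetricsSystem ms = DefaultMetricsSystem.instance();
    return ms.register("PendingDeletionMetricsSketch",
        "Pending block deletion metrics", new PendingDeletionMetricsSketch());
  }

  public void setPendingDeletionBytes(long bytes) {
    pendingDeletionBytes.set(bytes);
  }
}
```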
I agree with your points.
Are you saying that we should not rely on Recon for capacity distribution?
@errose28 Following up on my previous comment: to retrieve pending deletion data, one option is to query Prometheus metrics from each DataNode individually. Alternatively, we could introduce a new endpoint at the DataNode level, which Recon would call to gather and aggregate this information. However, since the storage report already includes other capacity-related fields, we considered extending it to include pending deletion information as well, which would simplify the implementation. Given that this data ultimately needs to be available at the Recon level, especially to enhance the current high-level view with more granular insights (e.g., how much space can be reclaimed after deletions, or where deletions are stuck), which approach do you think is better?
"Capacity Distribution" will need a tighter definition in order to make this call. We should not rely on Recon for time series data, and the information added in this PR is time series data.
You have just described Prometheus's metrics scraping process :) This is why we should not use Recon for this, since Prometheus already handles it.
I don't see any evidence supporting this simplification claim. Making Ozone do this is far more complex to implement than leaving it to existing systems designed to handle time series data.
I disagree that this needs to be available in Recon. Recon is not designed to store or display time-series data. The terminology "stuck" used here is an indication that we are dealing with time series data, because it translates to deletion progress over time.
Publish this time series data as metrics from the datanodes and let existing systems handle it much better than we can.
Force-pushed 8f67295 to 5963b36
Force-pushed 313053a to 317ba8c
Force-pushed 317ba8c to 13d7cb9
Based on the analysis, the JMX metrics approach appears to be more efficient for exposing pending deletion information, primarily due to the reduced overhead and direct access for monitoring. While it introduces a JMX dependency, the benefits in performance and data relevance outweigh the disadvantages of the Storage Report approach. So this will be handled as part of #8995.
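For reference, reading such a metric over JMX from inside the datanode JVM might look like the sketch below. The `ObjectName` and attribute name are assumptions, since the actual names will be defined in #8995; a remote client would connect with `JMXConnectorFactory` instead of using the platform MBean server:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Illustrative in-process JMX read; the bean and attribute names are assumptions. */
public final class PendingDeletionJmxReader {
  public static void main(String[] args) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    // Hypothetical ObjectName; the real name depends on how the metrics
    // source registers itself (tracked in #8995).
    ObjectName name = new ObjectName(
        "Hadoop:service=HddsDatanode,name=BlockDeletingServiceMetrics");
    Object value = server.getAttribute(name, "PendingDeletionBytes");
    System.out.println("pendingDeletionBytes = " + value);
  }
}
```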
What changes were proposed in this pull request?
Accurate accounting of pending-delete bytes per container
- Added `incrPendingDeletions(long)` and `decrPendingDeletions(long)` to `ContainerData`, backed by a new `pendingDeletionBytes` field that is persisted in container metadata (shape sketched below).
- `DeleteBlocksCommandHandler`: added `incrPendingDeletionBytes()`, which sums the sizes of blocks in each `DeletedBlocksTransaction` and updates the container's counter when blocks are logically marked for deletion.
- When blocks are physically removed via `deleteViaTransactionStore()`, the counter is decremented accordingly.
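As a rough illustration of the counter shape described above (the real `ContainerData` additionally persists this value into container metadata, which the sketch omits):

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative counter wiring only; not the actual ContainerData code. */
abstract class PendingDeletionCounterSketch {
  // Bytes logically marked for deletion but not yet reclaimed on disk.
  private final AtomicLong pendingDeletionBytes = new AtomicLong();

  public void incrPendingDeletions(long bytes) {
    pendingDeletionBytes.addAndGet(bytes);
  }

  public void decrPendingDeletions(long bytes) {
    pendingDeletionBytes.addAndGet(-bytes);
  }

  public long getPendingDeletionBytes() {
    return pendingDeletionBytes.get();
  }
}
```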
Datanode-level aggregation & reporting
- `StorageReport` now includes an aggregated `pendingDeletionBytes` value (sum across all containers).
Metrics & API exposure
- Exposed `pendingDeletionBytes` as a Datanode metric.
- Extended the Recon endpoint (`/datanode`) to surface the pending deletion bytes.
Upgrade path for existing clusters
- If `pendingDeletionBytes` is missing from a container's metadata, it is computed using `KeyValueContainerUtil.populateContainerMetadata()` by scanning the block table (recomputation idea sketched below).
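A minimal sketch of that recomputation idea, under stand-in types (`BlockInfo` and the in-memory list are illustrative; the real code walks the container's block table in RocksDB via `KeyValueContainerUtil`):

```java
import java.util.List;

final class PendingDeletionRecomputeSketch {
  /** Stand-in block record; the real code reads block data from RocksDB. */
  record BlockInfo(long sizeBytes, boolean markedForDeletion) { }

  // On upgrade, when pendingDeletionBytes is absent from container metadata,
  // derive it by summing the sizes of blocks already marked for deletion.
  static long computePendingDeletionBytes(List<BlockInfo> blockTable) {
    return blockTable.stream()
        .filter(BlockInfo::markedForDeletion)
        .mapToLong(BlockInfo::sizeBytes)
        .sum();
  }
}
```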
What is the link to the Apache JIRA
HDDS-13192 (https://issues.apache.org/jira/browse/HDDS-13192)
How was this patch tested?
Unit Tests
- Counter updates in `ContainerData`.
- `DeleteBlocksCommandHandler` flows that mark and delete blocks.

Integration Testing
Upgrade & Backward Compatibility
Performance Testing