HDDS-13192. Adding pending delete bytes info in storage report of DN #8755

Closed
priyeshkaratha wants to merge 1 commit into apache:HDDS-13177 from priyeshkaratha:HDDS-13192

Conversation

@priyeshkaratha
Contributor

What changes were proposed in this pull request?

Accurate accounting of pending-delete bytes per container

  • Added incrPendingDeletions(long) and decrPendingDeletions(long) to ContainerData with a new pendingDeletionBytes field that is persisted in container metadata.
  • Integrated the counter with DeleteBlocksCommandHandler:
    • Introduced helper method incrPendingDeletionBytes() which sums the sizes of blocks in each DeletedBlocksTransaction and updates the container’s counter when blocks are logically marked for deletion.
    • When blocks are physically removed in deleteViaTransactionStore(), the counter is decremented accordingly.
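
To make the flow concrete, here is a minimal, self-contained sketch of such a counter. It is not the actual ContainerData / DeleteBlocksCommandHandler code from this PR; the class and helper names below are illustrative stand-ins only.

```java
// Simplified stand-in for the per-container counter described above.
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class PendingDeletionTracker {
  // Bytes logically marked for deletion but not yet reclaimed on disk.
  private final AtomicLong pendingDeletionBytes = new AtomicLong();

  // Called when a delete transaction marks blocks for deletion.
  void incrPendingDeletions(long bytes) {
    pendingDeletionBytes.addAndGet(bytes);
  }

  // Called when blocks are physically removed (e.g. deleteViaTransactionStore()).
  void decrPendingDeletions(long bytes) {
    pendingDeletionBytes.updateAndGet(v -> Math.max(0, v - bytes));
  }

  long getPendingDeletionBytes() {
    return pendingDeletionBytes.get();
  }

  // Mirrors the incrPendingDeletionBytes() helper: sum the sizes of the blocks
  // referenced by one transaction and bump the counter once.
  void onBlocksMarkedForDeletion(List<Long> blockSizes) {
    long total = blockSizes.stream().mapToLong(Long::longValue).sum();
    incrPendingDeletions(total);
  }
}
```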

Datanode-level aggregation & reporting

  • StorageReport now includes an aggregated pendingDeletionBytes value (sum across all containers).
  • Introduced a short-lived in-memory cache (default TTL: 10 minutes) for the aggregated value to avoid recomputing on every heartbeat, reducing CPU overhead.
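
A rough sketch of what such a TTL-guarded cache can look like, assuming a simple "recompute when older than the TTL" policy; HddsVolume's actual implementation may differ and computeAggregate below is a hypothetical supplier.

```java
// Illustrative TTL cache for the aggregated pending-deletion bytes.
import java.util.function.LongSupplier;

class CachedPendingDeletionBytes {
  private final LongSupplier computeAggregate; // sums the value across all containers
  private final long ttlMillis;                // default per the PR description: 10 minutes
  private volatile long cachedValue;
  private volatile long lastComputedMillis;    // 0 => never computed

  CachedPendingDeletionBytes(LongSupplier computeAggregate, long ttlMillis) {
    this.computeAggregate = computeAggregate;
    this.ttlMillis = ttlMillis;
  }

  synchronized long get() {
    long now = System.currentTimeMillis();
    if (now - lastComputedMillis > ttlMillis) {
      cachedValue = computeAggregate.getAsLong();
      lastComputedMillis = now;
    }
    return cachedValue;
  }
}
```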

Metrics & API exposure

  • Exposed pendingDeletionBytes as a Datanode metric (see the sketch after this list).
  • Extended:
    • Datanode-to-SCM heartbeat protobuf to include the new metric.
    • Recon REST API (/datanode) to surface the pending deletion bytes.
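
For the first bullet above, here is a hedged sketch of how a datanode-level gauge could be registered with the Hadoop metrics2 framework that Ozone already uses; the class, record, and metric names below are assumptions, not necessarily what this PR registers.

```java
// Sketch of exposing the aggregated value as a datanode metric via metrics2.
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

@Metrics(about = "Pending deletion bytes on this datanode", context = "dfs")
public class PendingDeletionMetrics {
  @Metric("Bytes waiting to be reclaimed by block deletion")
  private MutableGaugeLong pendingDeletionBytes;

  public static PendingDeletionMetrics create() {
    return DefaultMetricsSystem.instance()
        .register("PendingDeletionMetrics", "Pending deletion bytes",
            new PendingDeletionMetrics());
  }

  public void setPendingDeletionBytes(long bytes) {
    pendingDeletionBytes.set(bytes);
  }
}
```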

Upgrade path for existing clusters

  • For containers created before the upgrade:
    • During open, if pendingDeletionBytes is missing, it is computed using KeyValueContainerUtil.populateContainerMetadata() by scanning the block table (see the sketch after this list).
    • A background task reprocesses existing delete transactions and block metadata to compute accurate pending delete sizes.
  • Mixed-version compatibility:
    • Older DNs/SCMs gracefully ignore unknown fields.
    • Recon detects support and falls back to legacy logic if needed.
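
As a rough illustration of the open-time fallback mentioned above, the sketch below sums the sizes of blocks already marked for deletion; the Map-based "block table" and the "#deleting#" key prefix are simplified stand-ins for the real RocksDB schema.

```java
// Backfill computation for containers created before the upgrade.
import java.util.Map;

final class PendingDeletionBackfill {
  private static final String DELETING_PREFIX = "#deleting#";

  static long computePendingDeletionBytes(Map<String, Long> blockTable) {
    long total = 0;
    for (Map.Entry<String, Long> entry : blockTable.entrySet()) {
      if (entry.getKey().startsWith(DELETING_PREFIX)) {
        total += entry.getValue(); // value = block size in bytes
      }
    }
    return total;
  }

  private PendingDeletionBackfill() { }
}
```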

What is the link to the Apache JIRA

HDDS-13192

How was this patch tested?

Unit Tests

  • Verified counter updates via unit tests in ContainerData.
  • Added unit tests for DeleteBlocksCommandHandler flows that mark and delete blocks.
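
A hedged sketch of the kind of counter test described above, written against the simplified PendingDeletionTracker from the earlier sketch rather than the real ContainerData API.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class PendingDeletionTrackerTest {
  @Test
  void counterTracksMarkAndDelete() {
    PendingDeletionTracker tracker = new PendingDeletionTracker();
    tracker.incrPendingDeletions(4096);   // two blocks marked for deletion
    tracker.incrPendingDeletions(1024);
    assertEquals(5120, tracker.getPendingDeletionBytes());

    tracker.decrPendingDeletions(4096);   // one block physically removed
    assertEquals(1024, tracker.getPendingDeletionBytes());
  }
}
```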

Integration Testing

  • Pending

Upgrade & Backward Compatibility

  • No impacts

Performance Testing

  • Pending

priyeshkaratha marked this pull request as draft July 7, 2025 07:42
kerneltime requested a review from Copilot July 7, 2025 16:15
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces tracking and reporting of bytes pending deletion at the container and datanode levels. It updates container metadata handling, deletion command logic, storage reports, metrics, APIs, and relevant tests across Recon, SCM, and container-service modules.

  • Adds pendingDeletionBytes counters in container data and integrates them into delete command handlers.
  • Extends Datanode and storage‐location reports, SCM heartbeats, Recon endpoints, and metrics to include pending‐deletion bytes.
  • Implements a short-lived cache in HddsVolume for aggregated pending‐deletion bytes and updates unit tests accordingly.

Reviewed Changes

Copilot reviewed 39 out of 39 changed files in this pull request and generated 2 comments.

Summary per file:

  • recon/src/main/java/.../DatanodeStorageReport.java: Added pendingDeletions field, constructor arg, and getter.
  • recon/src/main/java/.../NodeEndpoint.java & ClusterStateEndpoint.java: Propagate new field into storage-report construction.
  • recon/src/test/java/.../TestNSSummaryEndpoint*.java: Updated mocks to pass extra constructor parameter.
  • server-scm/.../SCMNodeStat.java, SCMNodeMetric.java & NodeStat.java: Added pendingDeletions metric in stat/metric APIs.
  • server-scm/.../Test*.java: Updated tests to account for new stat constructor args.
  • container-service/src/.../KeyValueContainerData.java: Added methods for pending deletion bytes.
  • container-service/src/.../KeyValueContainerUtil.java: Compute blockPendingDeletionBytes on metadata population.
  • container-service/src/.../HddsVolume.java: Cached aggregation of pending-deletion bytes with TTL.
  • container-service/src/.../DeleteBlocksCommandHandler.java: Increment pending-deletion bytes when marking blocks.
  • container-service/src/.../BlockDeletingService.java & metrics: Track and expose total pending-deletion bytes in metrics.
  • common/src/main/java/.../OzoneConsts.java: Added constant for metadata key.
Comments suppressed due to low confidence (3)

hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/helpers/ContainerUtils.java:343

  • The exception message refers to getPendingDeletionBlocks but the method is getPendingDeletionBytes. Update the message for consistency.
              " not support.");

hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/volume/HddsVolume.java:324

  • It would be beneficial to add a unit test for getPendingDeletionBytes(), verifying both correct aggregation and cache-expiration behavior.
  public long getPendingDeletionBytes() {

hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainer.java:1040

  • [nitpick] The inline comment was accidentally concatenated (HddsVolumetestImportV2ReplicaToV3HddsVolume). Please revert it to // create HddsVolume or clarify.
    // create HddsVolumetestImportV2ReplicaToV3HddsVolume

Contributor

errose28 left a comment

Why is the pending delete information being sent back to SCM? If SCM is not acting on this information, then pending delete byte and block count only need to be exposed as metrics and we can use prometheus + grafana to either inspect individual nodes or aggregate across the cluster. Pushing observability metrics like this through SCM and Recon is not sustainable, we already have a metrics system to handle this.

@priyeshkaratha
Contributor Author

@errose28
Thank you for raising this valid point regarding the use of SCM for exposing pending delete information.

Our main goal here is to build a dashboard in Recon to show storage usage distribution across the cluster. Recon already uses StorageReport for other usage stats, so we thought it would be a good idea to extend it to include pending deletion info as well, keeping everything in one place.

We considered using Prometheus metrics, but based on my understanding, in Ozone services these values might reset to zero and become inaccurate after service restarts. This can lead to wrong conclusions in the dashboard. By passing the data through SCM to Recon, we can ensure more reliable and consistent reporting, even after restarts.

It also avoids depending on a running Prometheus service, which most customers do not have set up when they hit issues related to deletion or space reclamation.

@errose28
Contributor

Our main goal here is to build a dashboard in Recon to show storage usage distribution across the cluster.

We should leave all dashboarding to Grafana instead of reimplementing the wheel in Recon.

Recon already uses StorageReport for other usage stats, so we thought it would be a good idea to extend it to include pending deletion info as well, keeping everything in one place

StorageReport is not a catch-all for implementing metrics collection in Recon; it is for reporting information about the disks within datanodes. In general, Recon displays current categorical information (keys, containers, pipelines, volumes/disks), while metrics + dashboards track numeric information (bytes, durations, or event counts) over time. Deletion progress falls cleanly in the second category.

We considered using Prometheus metrics, but based on my understanding, in Ozone services these values might reset to zero and become inaccurate after service restarts. This can lead to wrong conclusions in the dashboard.

Metrics collection is decoupled from persistence. Yes we will need to persist the counters in datanodes so they do not need to be recomputed on every restart, but this is independent of how the metrics are published and consumed by other services.

Also I think to avoid dependency on Prometheus service running in this case (most customers do not have it set up when they hit issues related to deletion or anything related to space reclamation).

If any Ozone users want dashboards they will need Prometheus and Grafana. We do not have the bandwidth to build and maintain our own dashboarding setup when quality ones already exist.

This should be a reasonably small change if it is done the way Ozone is designed to handle it:

  • Add metrics, with persistence underneath if necessary (this PR)
  • Add the metrics to a dashboard, like this (follow-up PR)

@priyeshkaratha
Contributor Author

priyeshkaratha commented Jul 21, 2025

I agree with your points.
But here our goal is to enhance Recon's storage observability by providing a consolidated view of used capacity, including pending deletions. Recon already consumes the StorageReport for capacity metrics; extending this structure allows us to enrich the existing model without introducing a parallel system. I have already added metrics that can be published to Prometheus as well.

If any Ozone users want dashboards they will need Prometheus and Grafana. We do not have the bandwidth to build and maintain our own dashboarding setup when quality ones already exist.

Are you saying that we should not rely on Recon for capacity distribution?

@priyeshkaratha
Contributor Author

@errose28 Followup to my previous comment.

To retrieve pending deletion data, one option is to query Prometheus metrics from each DataNode individually. Alternatively, we could introduce a new endpoint at the DataNode level, which Recon would call to gather and aggregate this information. However, since the storage report already includes other capacity-related fields, we considered extending it to include pending deletion information as well — which would simplify the implementation.

Given that this data ultimately needs to be available at the Recon level — especially to enhance the current high-level view with more granular insights (e.g., how much space can be reclaimed after deletions or where deletions are stuck) — which approach do you think is better?

@errose28
Contributor

Are you saying that we should not rely on Recon for capacity distribution?

"Capacity Distribution" will need a tighter definition in order to make this call. We should not rely on Recon for time series data, and the information added in this PR is time series data.

Alternatively, we could introduce a new endpoint at the DataNode level, which Recon would call to gather and aggregate this information.

You have just described the process of Prometheus's metrics scraping : ) This is why we should not use Recon for this since Prometheus already handles this.

since the storage report already includes other capacity-related fields, we considered extending it to include pending deletion information as well — which would simplify the implementation.

I don't see any evidence supporting this simplification claim. Making Ozone do this is far more complex to implement than leaving it to existing systems designed to handle time series data.

Given that this data ultimately needs to be available at the Recon level — especially to enhance the current high-level view with more granular insights (e.g., how much space can be reclaimed after deletions or where deletions are stuck)

I disagree that this needs to be available in Recon. Recon is not designed to store or display time-series data. The terminology "stuck" used here is an indication that we are dealing with time series data, because it translates to deletion progress over time.

which approach do you think is better?

Publish this time series data as metrics from the datanodes and let existing systems handle it much better than we can.

priyeshkaratha force-pushed the HDDS-13192 branch 2 times, most recently from 313053a to 317ba8c on August 25, 2025 16:35
@priyeshkaratha
Contributor Author

Based on the analysis, the JMX Metrics approach appears to be more efficient for exposing pending deletion information, primarily due to the reduced overhead and direct access for monitoring. While it introduces a JMX dependency, the benefits in terms of performance and data relevance outweigh the disadvantages of the Storage Report approach.

So, it will be handled as part of #8995
