NIFI-15570: Keep track of Content Claims where the last Claim in a Re… #10874
markap14 wants to merge 2 commits into apache:main
Conversation
|     return false;
| }
|
| private void truncate(final ContentClaim claim) {
The truncate method doesn't verify that the claimant count is still 0 before truncating. If a clone operation increments the claimant count while the truncation task is mid-flight, we could truncate content that is still referenced. Isn't this a concern?
Wondering if we could have a race condition:

1. `TruncateClaims.truncateClaims()` checks `claim.isTruncationCandidate()` and sees `true`
2. A clone operation calls `incrementClaimantCount()`, which sets `truncationCandidate = false` and increments the claimant count
3. `TruncateClaims.truncate()` proceeds to truncate the file anyway, corrupting the data for the newly cloned FlowFile

Or maybe this scenario is not possible for some reason that I missed?
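To make the suspected check-then-act race concrete, here is a minimal, hypothetical sketch of the pattern under discussion. The names `claimantCount`, `truncationCandidate`, `incrementClaimantCount`, and `tryTruncate` mirror the conversation, not NiFi's actual API; the point is that re-checking both conditions under the same lock the clone path uses closes the window between the check and the truncate.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the suspected race; not NiFi's actual classes.
public class ClaimRaceSketch {
    final AtomicInteger claimantCount = new AtomicInteger(0);
    volatile boolean truncationCandidate = true;

    // Clone path: taking a new reference must disqualify truncation.
    synchronized void incrementClaimantCount() {
        truncationCandidate = false;
        claimantCount.incrementAndGet();
    }

    // Truncation path: re-checks both conditions under the same lock,
    // so a clone cannot slip in between the check and the truncate.
    synchronized boolean tryTruncate() {
        if (!truncationCandidate || claimantCount.get() > 0) {
            return false;
        }
        // ... truncate the backing file here ...
        return true;
    }

    public static void main(String[] args) {
        ClaimRaceSketch claim = new ClaimRaceSketch();
        System.out.println(claim.tryTruncate()); // true: no claimants yet
        claim.incrementClaimantCount();
        System.out.println(claim.tryTruncate()); // false: a clone took a reference
    }
}
```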
Thanks for reviewing @pvillard31!
In short, no, that should not be possible. The only way we will ever queue up the ContentClaim for truncation is if the FlowFile Repository is synced to disk (typically on checkpoint, but also possible on every commit if the fsync property in nifi.properties is set to true) and the Content Claim has truncationCandidate = true. At that point, the FlowFile Repository owns the Content Claim, no Processor has access to it, and the Repository has determined that there are no longer any references to it. As a result, we only queue up the Content Claim for truncation if there is exactly one referencing FlowFile and that FlowFile is being removed. So there is no concern about the claimant count going back up.
As a side note, the integration test failure was caused by another commit and is now fixed if you rebase on main.
46becb2 to a62dfca
…source Claim can be truncated if it is large. Whenever FlowFile Repository is checkpointed, truncate any large Resource Claims when possible and necessary to avoid having a situation where a small FlowFile in a given Resource Claim prevents a large Content Claim from being cleaned up.
exceptionfactory left a comment
Thanks for the detailed work on this improvement @markap14. The basic concept makes sense, and the overall implementation looks straightforward, with very helpful tests at multiple levels.
I noted a handful of mostly minor suggestions and questions, but this looks close to completion.
| private static final Logger logger = LoggerFactory.getLogger(StandardResourceClaimManager.class);
| private final ConcurrentMap<ResourceClaim, ClaimCount> claimantCounts = new ConcurrentHashMap<>();
| private final BlockingQueue<ResourceClaim> destructableClaims = new LinkedBlockingQueue<>(50000);
| private final BlockingQueue<ContentClaim> truncatableClaims = new LinkedBlockingQueue<>(100000);
Is there any particular reason for selecting 100,000 as the queue size? Is it related to the destructableClaims size, or just a reasonably high limit? If there is any particular reason, it would be helpful to add a comment for future reference.
No, no particular reason. Just wanted a big value that's small enough to not cause heap exhaustion.
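The heap-safety property behind that bounded capacity can be shown with a small generic sketch: a `LinkedBlockingQueue` constructed with a capacity never grows past it, and `offer` simply returns `false` when the queue is full rather than blocking or allocating. The tiny capacity of 2 here is only for demonstration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Demonstrates bounded-queue behavior: offer() fails fast when full,
// so a backlog cannot grow without limit and exhaust the heap.
public class BoundedQueueSketch {
    public static void main(String[] args) {
        BlockingQueue<String> truncatable = new LinkedBlockingQueue<>(2);
        System.out.println(truncatable.offer("claim-1")); // true
        System.out.println(truncatable.offer("claim-2")); // true
        System.out.println(truncatable.offer("claim-3")); // false: full, claim is skipped
    }
}
```

When an offer fails, the claim is simply not tracked for truncation on this pass, which is a safe degradation: the content still gets cleaned up through the normal destruction path.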
| logger.debug("Marking {} as truncatable", contentClaim);
| try {
|     if (!truncatableClaims.offer(contentClaim, 1, TimeUnit.MINUTES)) {
|         logger.debug("Unable to mark {} as truncatable because the queue is full.", contentClaim);
It seems like this would be better as an INFO level, and it would be useful to include the queue size.
| - logger.debug("Unable to mark {} as truncatable because the queue is full.", contentClaim);
| + logger.info("Unable to mark {} as truncatable because maximum queue size [{}] reached", contentClaim, truncatableClaims.size());
| .description("The maximum number of batches to generate. Each batch produces 10 FlowFiles (9 small + 1 large). "
|     + "Once this many batches have been generated, no more FlowFiles will be produced until the processor is stopped and restarted.")
Recommend using a multi-line string
| for (final FlowFileRecord ff : recovered) {
|     if (ff.getContentClaim() != null) {
Is it possible for all of the recovered records to have null Content Claims? It looks like it would be better to get a filtered list of non-null recovered records, assert that the list is not empty, and then check the truncation candidate status.
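A minimal sketch of that suggestion, using a stand-in record type (`FlowFileStub` here is hypothetical, not NiFi's `FlowFileRecord`): filter to records with non-null content claims first, fail loudly if the filtered list is empty, and only then inspect the truncation flag, so the assertion can never be silently skipped.

```java
import java.util.List;
import java.util.Objects;

// Sketch of the reviewer's suggestion; FlowFileStub stands in for FlowFileRecord.
public class RecoveredRecordsSketch {
    record FlowFileStub(String contentClaim, boolean truncationCandidate) {}

    static List<FlowFileStub> withClaims(List<FlowFileStub> recovered) {
        return recovered.stream()
            .filter(ff -> Objects.nonNull(ff.contentClaim()))
            .toList();
    }

    public static void main(String[] args) {
        List<FlowFileStub> recovered = List.of(
            new FlowFileStub(null, false),
            new FlowFileStub("claim-a", true));

        List<FlowFileStub> claimed = withClaims(recovered);
        // Guarantees the per-record checks below actually execute.
        if (claimed.isEmpty()) {
            throw new AssertionError("expected at least one record with a content claim");
        }
        claimed.forEach(ff -> System.out.println(ff.truncationCandidate()));
    }
}
```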
...re/src/test/java/org/apache/nifi/controller/repository/TestWriteAheadFlowFileRepository.java (outdated, resolved)
...framework-core/src/main/java/org/apache/nifi/controller/repository/FileSystemRepository.java (resolved)
| // If able, truncate those claims. Otherwise, save those claims in the Truncation Claim Manager to be truncated on the next run.
| // This prevents us from having a case where we could truncate a big claim but we don't because we're not yet running out of disk space,
| // but then we later start to run out of disk space and lose the opportunity to truncate that big claim.
| while (true) {
The `while (true)` construction always gives me pause, although the return clarifies expected behavior. Since this Runnable is executed on a schedule, is it necessary to have this loop, as opposed to just waiting for the next invocation from the scheduler?
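The trade-off in that question can be sketched generically: a drain-until-empty loop processes the entire backlog in one scheduled invocation and exits via `return` when the queue runs dry, whereas processing one item per run bounds latency by the schedule period times the backlog depth. This is an illustration of the pattern, not the PR's actual code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Drain-until-empty style: the loop terminates via return when poll()
// finds the queue empty, then the scheduler invokes it again later.
public class DrainLoopSketch {
    static int drainAll(BlockingQueue<String> queue) {
        int handled = 0;
        while (true) {
            final String claim = queue.poll();
            if (claim == null) {
                return handled; // queue empty; wait for next scheduled invocation
            }
            handled++; // ... truncate the claim here ...
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("claim-1");
        queue.add("claim-2");
        System.out.println(drainAll(queue)); // 2
        System.out.println(drainAll(queue)); // 0
    }
}
```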
| }
|
| if (!isArchiveClearedOnLastRun(container)) {
|     LOG.debug("Truncation is not active for container {} because the archive was not cleared on the last run.", container);
I generally avoid a trailing `.` at the end of log messages, but it appears in many of these logs; recommend removing it.
| // This is unlikely but can occur if the claim was truncatable and the underlying Resource Claim becomes
| // destructable. In this case, we may archive or delete the entire ResourceClaim. This is safe to ignore,
| // since it means the data is cleaned up anyway.
| LOG.debug("Failed to truncate {} because file does not exist.", claim, nsfe);
| - LOG.debug("Failed to truncate {} because file does not exist.", claim, nsfe);
| + LOG.debug("Failed to truncate {} because file [{}] does not exist", claim, path, nsfe);
| private static final int MAX_THRESHOLD = 100_000;
| private final Map<String, List<ContentClaim>> truncationClaims = new HashMap<>();
|
| public synchronized void addTruncationClaims(final String container, final List<ContentClaim> claim) {
Do these class methods need to be public?
Thanks for the feedback @exceptionfactory. Updated.
exceptionfactory left a comment
Thanks for addressing the feedback @markap14! The changes look good, I plan to merge soon pending successful automated builds.