
NIFI-15681 - Enhance PutElasticsearchJson to support NDJSON, JSON Array, and Single JSON input formats with size-based batching #10981

Open
agturley wants to merge 3 commits into apache:main from agturley:NIFI-15681

Conversation

agturley (Contributor) commented Mar 8, 2026


Summary

NIFI-15681

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull request contains commits signed with a registered key indicating Verified status

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using ./mvnw clean install -P contrib-check
    • JDK 21
    • JDK 25

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

…ay, and Single JSON input formats with size-based batching
pvillard31 (Contributor) left a comment


A few comments after having a quick look through the changes.

    chunkBytes += docBytes;
    totalBytesAccumulated += docBytes;
    if (chunkBytes >= maxBatchBytes) {
        flushChunk(operations, operationFlowFiles, errorFlowFiles, flowFile, pendingBulkErrors, context, session);
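For context, the quoted flush logic follows a standard accumulate-then-flush pattern: documents are gathered until their combined byte size reaches a limit, then the batch is emitted. A minimal self-contained sketch under that assumption (the class and field names here are illustrative, not the actual processor code):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of size-based batching, not the actual processor code.
class SizeBatcher {
    private final long maxBatchBytes;
    private final List<String> chunk = new ArrayList<>();
    private long chunkBytes = 0;
    // Completed batches, in flush order (exposed here only for demonstration).
    final List<List<String>> flushed = new ArrayList<>();

    SizeBatcher(long maxBatchBytes) {
        this.maxBatchBytes = maxBatchBytes;
    }

    void add(String doc) {
        long docBytes = doc.getBytes(StandardCharsets.UTF_8).length;
        chunk.add(doc);
        chunkBytes += docBytes;
        // Flush once the accumulated chunk reaches the configured byte limit.
        if (chunkBytes >= maxBatchBytes) {
            flush();
        }
    }

    void flush() {
        if (!chunk.isEmpty()) {
            flushed.add(new ArrayList<>(chunk));
            chunk.clear();
            chunkBytes = 0;
        }
    }
}
```

A final `flush()` after the input is exhausted emits the last partial batch, mirroring the trailing flush a processor would perform at the end of a FlowFile.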
Contributor commented:

When flushChunk throws an ElasticsearchException, the handler combines operationFlowFiles and allProcessedFlowFiles and sends all of them to retry/failure. The FlowFiles in allProcessedFlowFiles have already been successfully indexed by prior flushChunk calls. Routing them to retry could cause duplicate indexing when re-processed. Not saying this should be done differently but just mentioning that it could cause duplicates.
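To make the risk concrete, here is a small hypothetical simulation (not processor code) of what happens when a FlowFile whose earlier chunks were already indexed is routed back to retry in its entirety:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the duplicate-indexing risk described above.
class DuplicateRiskDemo {
    static List<String> simulate() {
        List<List<String>> chunks =
                List.of(List.of("doc1"), List.of("doc2"), List.of("doc3"));
        List<String> indexed = new ArrayList<>();
        // First attempt: chunks 0 and 1 flush successfully, chunk 2 throws.
        indexed.addAll(chunks.get(0));
        indexed.addAll(chunks.get(1));
        // Retry re-processes every chunk, including the already-indexed ones.
        for (List<String> chunk : chunks) {
            indexed.addAll(chunk);
        }
        return indexed; // "doc1" and "doc2" now appear twice
    }
}
```

One common mitigation (not something this PR necessarily does) is assigning deterministic document IDs, so a re-indexed document overwrites its earlier copy instead of creating a duplicate.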

Contributor commented:

If such behaviour is retained, I'd suggest it needs to be clearly documented, as I think it would be a change from how things were handled previously, and the duplication could cause problems for some systems.

agturley (Contributor, Author) commented Mar 10, 2026

I'm not a huge fan of the potential for duplicates, but this approach seems to work. When a bulk request comes back from Elasticsearch, we look at which individual documents failed. Instead of buffering the raw bytes of every document as we go (which would hold the entire file in memory twice), we record only the index of each failed document: a small integer per error, regardless of document size.

Once all chunks for a FlowFile are processed, we route it:

- No errors: clone the original FlowFile straight to REL_SUCCESSFUL. Zero extra I/O.
- Single JSON input (one doc per FlowFile): if it errored, clone it to REL_ERRORS; otherwise, clone it to REL_SUCCESSFUL. Again, no re-read needed.
- NDJSON or JSON Array input with at least one error: we do a single re-read of the original FlowFile and split it in one pass into two streams: one for the failed records (REL_ERRORS) and one for the successful records (REL_SUCCESSFUL). Both outputs are written as clean NDJSON with no trailing newline.

The re-read only happens for FlowFiles that actually had partial failures in NDJSON/JSON Array format, so the common happy path (all docs succeed) is just a cheap clone with no extra I/O.
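The single-pass split described above can be sketched as follows; `NdjsonSplitter` and its method name are illustrative, assuming the failed document indices were recorded while walking the bulk responses:

```java
import java.util.Set;

// Illustrative sketch of the single-pass NDJSON split, not the actual
// processor code: partition the original content into a "failed" stream
// and a "successful" stream using the recorded failed-document indices.
class NdjsonSplitter {
    // Both outputs are written as clean NDJSON with no trailing newline.
    static String[] split(String ndjson, Set<Integer> failedIndices) {
        StringBuilder failed = new StringBuilder();
        StringBuilder succeeded = new StringBuilder();
        String[] lines = ndjson.split("\n");
        for (int i = 0; i < lines.length; i++) {
            StringBuilder target = failedIndices.contains(i) ? failed : succeeded;
            if (target.length() > 0) {
                target.append('\n'); // separator only between records
            }
            target.append(lines[i]);
        }
        return new String[] { failed.toString(), succeeded.toString() };
    }
}
```

In the real processor the two outputs would be streamed to the REL_ERRORS and REL_SUCCESSFUL FlowFiles rather than buffered in memory; the StringBuilders here just keep the sketch self-contained.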

agturley commented Mar 10, 2026

Finished round 1 of your suggestions; please let me know your thoughts on the error-handling approach I'm trying. I'll be doing high-volume testing tomorrow and will report back.

@agturley agturley requested a review from pvillard31 March 10, 2026 01:48