
LOG-9389: Payload Too Large errors when sending logs to Azure monitor using Log Ingestion API #3277

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:master from Clee2691:LOG-9389
May 15, 2026

Conversation

@Clee2691
Contributor

@Clee2691 Clee2691 commented May 13, 2026

Description

This PR sets the default maxWrite value to 1,000,000 bytes for the AzureLogsIngestion output tuning parameter. This aligns with the maximum size of an API call enforced by Azure’s Logs Ingestion API service limits.

/cc @vparfonov
/assign @jcantrill

Links

Summary by CodeRabbit

  • Documentation

    • Deprecation date for the legacy Azure HTTP Data Collector API set to September 14, 2026; Azure Monitor output flagged for retirement with migration guidance to Azure Logs Ingestion.
    • Examples updated and guidance added about Azure 1MB request body limit.
  • Configuration & Behavior

    • Default batching/max request size for Azure Logs Ingestion capped at 1MB.
  • Tests & Validation

    • Added validation and tests to enforce and verify the 1MB maxWrite limit.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 13, 2026
@openshift-ci-robot

openshift-ci-robot commented May 13, 2026

@Clee2691: This pull request references LOG-9389 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target either version "4.8." or "openshift-4.8.", but it targets "Logging 6.6.0" instead.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested a review from vparfonov May 13, 2026 21:45
@Clee2691
Contributor Author

/hold

@coderabbitai

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c358ec6c-82ec-43e0-a162-fb2355ee4466

📥 Commits

Reviewing files that changed from the base of the PR and between 3056fec and 4198a5b.

📒 Files selected for processing (14)
  • docs/features/logforwarding/outputs/azure/azure-logs-ingestion-forwarding.adoc
  • docs/features/logforwarding/outputs/azure/azure-monitor-log-forwarding.adoc
  • internal/generator/vector/output/azure/azurelogsingestion/azli_common.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azli_timestamp_field.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azli_tls.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azli_token_scope.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azli_tuning.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azli_tuning_under_limit.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azli_workload_identity.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azurelogsingestion.go
  • internal/generator/vector/output/azure/azurelogsingestion/azurelogsingestion_test.go
  • internal/validations/observability/outputs/validate.go
  • internal/validations/observability/outputs/validate_azurelogsingestion.go
  • internal/validations/observability/outputs/validate_azurelogsingestion_test.go
✅ Files skipped from review due to trivial changes (4)
  • internal/validations/observability/outputs/validate.go
  • internal/generator/vector/output/azure/azurelogsingestion/azli_tuning.toml
  • internal/validations/observability/outputs/validate_azurelogsingestion.go
  • docs/features/logforwarding/outputs/azure/azure-logs-ingestion-forwarding.adoc
🚧 Files skipped from review as they are similar to previous changes (6)
  • internal/generator/vector/output/azure/azurelogsingestion/azli_common.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azurelogsingestion.go
  • internal/generator/vector/output/azure/azurelogsingestion/azli_token_scope.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azli_workload_identity.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azli_timestamp_field.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azli_tls.toml

Walkthrough

Updates docs and Vector generator configs to enforce Azure Logs Ingestion 1MB request body limit, add a September 14, 2026 deprecation notice for the legacy Azure Monitor output, clamp/default batch sizes to 1_000_000 bytes in generator code, and add validation and tests for the maxWrite tuning value.

Changes

Azure Logs Ingestion limits, docs, generator, and validation

| Layer / File(s) | Summary |
|---|---|
| Docs: deprecation, examples, and caution<br>docs/features/logforwarding/outputs/azure/azure-monitor-log-forwarding.adoc, docs/features/logforwarding/outputs/azure/azure-logs-ingestion-forwarding.adoc | Add deprecation line for AzureMonitor retiring on Sept 14, 2026; rename example ClusterLogForwarder metadata names to azure-logs-ingestion; change example tuning.maxWrite from 10M → 1M and add CAUTION about Azure 1MB request body limit. |
| Vector sink TOML templates: encoding and batch sections<br>internal/generator/vector/output/azure/azurelogsingestion/azli_*.toml (azli_common.toml, azli_timestamp_field.toml, azli_tls.toml, azli_token_scope.toml, azli_workload_identity.toml, azli_tuning.toml, azli_tuning_under_limit.toml) | Move except_fields = ["_internal"] into [sinks.output_azure_log_ingestion.encoding] and add [sinks.output_azure_log_ingestion.batch] with max_bytes = 1000000 in templates; reduce example batch.max_bytes from 10000000 → 1000000 and add a new under-limit tuning example file. |
| Generator: default and clamping of batch MaxBytes<br>internal/generator/vector/output/azure/azurelogsingestion/azurelogsingestion.go | Add AzureDefaultMaxBytes = 1_000_000; when constructing s.Batch, clamp batch.MaxBytes to AzureDefaultMaxBytes if provided or use a fallback batch with MaxBytes: AzureDefaultMaxBytes when none returned by common.NewApiBatch(o). |
| Generator tests: tuning scenarios<br>internal/generator/vector/output/azure/azurelogsingestion/azurelogsingestion_test.go | Split prior single tuning test into two entries: one using the existing tuning file that exceeds the 1MB cap and one using an inline tuning spec under the limit, referencing the new azli_tuning_under_limit.toml. |
| Validation: enforce maxWrite ≤ 1MB and tests<br>internal/validations/observability/outputs/validate.go, internal/validations/observability/outputs/validate_azurelogsingestion.go, internal/validations/observability/outputs/validate_azurelogsingestion_test.go | Add validateAzureLogsIngestionMaxWrite and call it for obs.OutputTypeAzureLogsIngestion; validator returns an error message when Tuning.MaxWrite exceeds azurelogsingestion.AzureDefaultMaxBytes. Include Ginkgo tests for unset, nil, under/at 1MB, exceeding 1MB, and non-Azure output cases. |
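
The generator change summarized above (default and clamping of batch MaxBytes) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `Batch` type and the `clampBatch` helper are hypothetical stand-ins, and it assumes that values at or under the cap are passed through unchanged, which the under-limit tuning example suggests but the summary does not spell out.

```go
package main

import "fmt"

// AzureDefaultMaxBytes mirrors the constant named in the walkthrough.
const AzureDefaultMaxBytes = 1_000_000

// Batch is a hypothetical stand-in for the generator's batch settings.
type Batch struct {
	MaxBytes int64
}

// clampBatch is a hypothetical helper, not the operator's function.
// A nil batch (no tuning provided) falls back to the 1,000,000 byte default;
// a batch above the cap is clamped down to it; anything else is kept as-is.
func clampBatch(b *Batch) *Batch {
	if b == nil {
		return &Batch{MaxBytes: AzureDefaultMaxBytes}
	}
	if b.MaxBytes > AzureDefaultMaxBytes {
		return &Batch{MaxBytes: AzureDefaultMaxBytes}
	}
	return &Batch{MaxBytes: b.MaxBytes}
}

func main() {
	fmt.Println(clampBatch(nil).MaxBytes)                          // no tuning: default cap
	fmt.Println(clampBatch(&Batch{MaxBytes: 10_000_000}).MaxBytes) // over the cap: clamped
	fmt.Println(clampBatch(&Batch{MaxBytes: 500_000}).MaxBytes)    // under the cap: kept
}
```

Returning a new value instead of mutating the input keeps the sketch free of side effects; whether the real generator mutates in place is not stated in the summary.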

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels: lgtm

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Structure And Quality | ⚠️ Warning | Test assertions lack meaningful failure messages. All Expect() calls missing context. When tests fail, errors won't clearly explain what went wrong. | Add failure messages to assertions. For failure case, verify message content with ContainSubstring() like other validation tests do. Follow patterns in validate_elasticsearch_headers_test.go. |
✅ Passed checks (10 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title clearly identifies the main change: addressing 'Payload Too Large errors' with Azure Monitor's Log Ingestion API, which aligns with the changeset's focus on enforcing a 1MB maxWrite limit. |
| Description check | ✅ Passed | The PR description includes the mandatory Description section explaining the intent and rationale, mandatory reviewer/assignee mentions (/cc and /assign), and a JIRA link. All required template sections are present and adequately filled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All Ginkgo test names are static and deterministic. No dynamic content found - no fmt.Sprintf, variable concatenation, timestamps, UUIDs, or other runtime-specific values in test titles. |
| Microshift Test Compatibility | ✅ Passed | The PR adds/modifies Ginkgo test cases, but these are unit tests in internal/ directories, not e2e tests. They don't use cluster APIs or MicroShift-unavailable resources. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No e2e tests with multi-node assumptions added. Only unit tests in internal/ packages for validation and code generation logic. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | This PR makes configuration and validation changes for Azure log forwarding only. No deployment manifests, controllers, or scheduling constraints are introduced. Check not applicable. |
| Ote Binary Stdout Contract | ✅ Passed | No OTE Binary Stdout Contract violations found. Code is limited to config builders, validation, and tests with no process-level stdout writes or suite-level hooks. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added. Tests added are unit tests in internal/ that test config generation and validation with mock objects, not e2e tests that require cluster connectivity. |


@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 13, 2026
@Clee2691
Contributor Author

Clee2691 commented May 13, 2026

We should consider enforcing a hard limit of 1,000,000 bytes for maxWrite. Since Azure's API rejects anything larger, allowing a higher configuration value offers no functional benefit.

Thoughts? @jcantrill @vparfonov
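
A hard limit of this kind could be enforced with a validator along the following lines. This is only a sketch under stated assumptions: `validateMaxWrite` is a hypothetical, dependency-free simplification, and the `*int64` parameter stands in for the operator's actual `Tuning.MaxWrite` field, whose real type may differ.

```go
package main

import "fmt"

// AzureDefaultMaxBytes mirrors the 1,000,000 byte cap discussed in this PR.
const AzureDefaultMaxBytes int64 = 1_000_000

// validateMaxWrite is a hypothetical, simplified validator.
// maxWrite is nil when the tuning value is unset, which is always valid
// because the generator then applies the default cap.
func validateMaxWrite(maxWrite *int64) error {
	if maxWrite == nil {
		return nil
	}
	if *maxWrite > AzureDefaultMaxBytes {
		return fmt.Errorf("maxWrite %d exceeds the Azure Logs Ingestion limit of %d bytes",
			*maxWrite, AzureDefaultMaxBytes)
	}
	return nil
}

func main() {
	atLimit := int64(1_000_000)
	over := int64(10_000_000)
	fmt.Println(validateMaxWrite(nil))      // unset: accepted
	fmt.Println(validateMaxWrite(&atLimit)) // exactly at the cap: accepted
	fmt.Println(validateMaxWrite(&over))    // above the cap: rejected with an error
}
```

Rejecting the value at validation time surfaces the problem in the resource status instead of as 413 responses at runtime.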


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@docs/features/logforwarding/outputs/azure/azure-logs-ingestion-forwarding.adoc`:
- Line 7: Update the deprecation sentence that currently reads "Since the HTTP
Data Collector API is deprecated, and will be removed in **September 14th
2026**..." to use a clearer absolute date and preposition: change it to "Since
the HTTP Data Collector API is deprecated, and will be removed on **September
14, 2026**..." — update the text containing "AzureMonitor" and the original
sentence to reflect this exact wording.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 23e05b4b-fd1d-4e40-9221-6b3fa379628b

📥 Commits

Reviewing files that changed from the base of the PR and between 30295d6 and 3056fec.

📒 Files selected for processing (8)
  • docs/features/logforwarding/outputs/azure/azure-logs-ingestion-forwarding.adoc
  • docs/features/logforwarding/outputs/azure/azure-monitor-log-forwarding.adoc
  • internal/generator/vector/output/azure/azurelogsingestion/azli_common.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azli_timestamp_field.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azli_tls.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azli_token_scope.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azli_workload_identity.toml
  • internal/generator/vector/output/azure/azurelogsingestion/azurelogsingestion.go

The `azureLogsIngestion` output type sends log events to Azure Monitor using the Logs Ingestion API and a Data Collection Rule (DCR). This is the recommended replacement for the legacy HTTP Data Collector API used by `azureMonitor`.

- Since the HTTP Data Collector API is deprecated, and will be removed in **September** 2026, it is recommended to use the Logs Ingestion API instead. The `AzureMonitor` output type will be removed in the future.
+ Since the HTTP Data Collector API is deprecated, and will be removed in **September 14th 2026**, it is recommended to use the Logs Ingestion API instead. The `AzureMonitor` output type will be removed in the future.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a clearer absolute date format in the deprecation sentence.

Line 7 says “removed in September 14th 2026”, which reads awkwardly. Prefer “removed on September 14, 2026” for clarity and consistency.


@Clee2691
Contributor Author

/retest

@jcantrill
Contributor

> We should consider enforcing a hard limit of 1,000,000 bytes for maxWrite. Since Azure's API rejects anything larger, allowing a higher configuration value offers no functional benefit.

Agree. It should be possible to restrict in the API declaratively so any changes are rejected immediately. The concerning behavior, especially for audit logs, is I believe larger messages will get dropped entirely; we should confirm this. Additionally we should understand if we at a minimum get 'discarded' metrics. This would make it consistent with the other outputs and hopefully we can resolve with @vparfonov truncation work.

@Clee2691
Contributor Author

> > We should consider enforcing a hard limit of 1,000,000 bytes for maxWrite. Since Azure's API rejects anything larger, allowing a higher configuration value offers no functional benefit.
>
> Agree. It should be possible to restrict in the API declaratively so any changes are rejected immediately. The concerning behavior, especially for audit logs, is I believe larger messages will get dropped entirely; we should confirm this. Additionally we should understand if we at a minimum get 'discarded' metrics. This would make it consistent with the other outputs and hopefully we can resolve with @vparfonov truncation work.

We do get vector_component_discarded_events_total metrics for these discarded events. Large single log messages (>1MB) will be dropped. This is confirmed by the hard limit that Azure documents. We already see 413s when the collector starts.

I've tested setting the limit to 1MB and I do not see any 413s with the current logs produced.

This is something the customer has to be aware of when forwarding to Azure using the Logs Ingestion API.
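
To illustrate why a single oversized message cannot be saved by batching, here is a minimal sketch of size-capped batching. It is not Vector's actual implementation: `splitIntoBatches` is a hypothetical helper, the sizes are raw payload lengths, and request framing overhead is ignored. An event larger than the limit fits in no request, so it is counted as discarded, while smaller events are regrouped so that each request stays within 1,000,000 bytes.

```go
package main

import "fmt"

// maxRequestBytes is the request size cap discussed in this PR.
const maxRequestBytes = 1_000_000

// splitIntoBatches is a hypothetical helper that groups events so that the
// total size of each batch stays within limit. Events that exceed the limit
// on their own are skipped and counted as discarded.
func splitIntoBatches(events [][]byte, limit int) (batches [][][]byte, discarded int) {
	var current [][]byte
	size := 0
	for _, e := range events {
		if len(e) > limit {
			discarded++ // too large for any single request
			continue
		}
		if size+len(e) > limit && len(current) > 0 {
			// Adding this event would overflow the batch, so flush first.
			batches = append(batches, current)
			current, size = nil, 0
		}
		current = append(current, e)
		size += len(e)
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches, discarded
}

func main() {
	events := [][]byte{
		make([]byte, 600_000),
		make([]byte, 600_000),
		make([]byte, 1_500_000), // larger than the cap on its own
		make([]byte, 100_000),
	}
	batches, discarded := splitIntoBatches(events, maxRequestBytes)
	fmt.Println(len(batches), discarded) // prints "2 1"
}
```

In this sketch the discard counter plays the role that the vector_component_discarded_events_total metric plays in the collector.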

@Clee2691
Contributor Author

/retest

1 similar comment
@Clee2691
Contributor Author

/retest

@jcantrill
Contributor

/approve

@openshift-ci
Contributor

openshift-ci Bot commented May 14, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Clee2691, jcantrill

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 14, 2026
@Clee2691
Contributor Author

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 14, 2026
@jcantrill
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 15, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 15, 2026

@Clee2691: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit e6cb82d into openshift:master May 15, 2026
8 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. release/6.6


3 participants