fix(ibmcloud): preserve path component in COS backend URL#830
fix(ibmcloud): preserve path component in COS backend URL#830adrianriobo wants to merge 3 commits into
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/integrations/gitlab/snippet-linux.sh`:
- Around line 35-36: The /etc/resolv.conf fallback currently excludes only IPv4
loopback (127.*) so IPv6 loopback (::1) can still be captured into _dns_servers;
update the awk filter in the _dns_servers assignment (the line using awk
'/^nameserver/ && $2 !~ /^127\./ {print $2}') to also exclude the IPv6 loopback
(e.g. add a condition like $2 != "::1" or $2 !~ /^::1/), ensuring nameserver
entries equal to ::1 are not included.
- Around line 48-54: The script currently checks for "dns_servers" globally
which can match other sections; change the detection/update logic so it only
looks inside the [containers] section: when checking for existence use a
section-scoped search (e.g., search between the line matching "^\[containers\]"
and the next section header "^\[.*\]") to decide whether to replace dns_servers
or append it after the [containers] header; update the two sed/grep branches
that reference dns_servers and /etc/containers/containers.conf and ensure the
inserted value uses the same ${_toml_list} variable and formatting as before.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 466885cf-1897-4a3c-8cf6-9db3d3fcc8aa
📒 Files selected for processing (1)
pkg/integrations/gitlab/snippet-linux.sh
deekay2310
left a comment
There was a problem hiding this comment.
Apart from my one comment regarding context.go, we should also address CodeRabbit's comments and I think this PR will be ready to merge.
|
|
||
| func manageIntegration(c *Context, ca *ContextArgs) error { | ||
| if ca.GHRunnerArgs != nil { | ||
| ca.GHRunnerArgs.Name = c.RunID() |
There was a problem hiding this comment.
Should this also be changed to c.ProjectName() just like GLRunnerArgs.name?
There was a problem hiding this comment.
yeah I just found out using the RunID as name is not really helpful so make it with the project name clarifies it, but as this change is GH related can you change it on your PR?
There was a problem hiding this comment.
Sounds good! I'll look into it
16330a5 to
08f1b26
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/provider/ibmcloud/ibmcloud.go (1)
227-231: ⚡ Quick winStabilize tag serialization order to avoid nondeterministic resource diffs.
At Line 229, iterating a Go map directly makes
pulumi.StringArrayorder nondeterministic across runs. Since this helper feeds many IBM Cloud resources, this can cause noisy previews/updates if ordering is treated as significant. Sort keys before append.Proposed fix
+import "sort" ... func TagsAsStringArray(tags map[string]string) pulumi.StringArray { result := make(pulumi.StringArray, 0, len(tags)) - for k, v := range tags { + keys := make([]string, 0, len(tags)) + for k := range tags { + keys = append(keys, k) + } + sort.Strings(keys) + for _, k := range keys { + v := tags[k] result = append(result, pulumi.String(fmt.Sprintf("%s:%s", k, v))) } return result }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/provider/ibmcloud/ibmcloud.go` around lines 227 - 231, The TagsAsStringArray function iterates directly over the tags map, which produces nondeterministic ordering in Go since map iteration order is randomized. To fix this, extract all the keys from the tags map into a slice, sort that slice of keys, and then iterate over the sorted keys in the for loop instead of directly iterating over the map. This ensures the resulting pulumi.StringArray has consistent ordering across multiple runs, eliminating noisy diffs in resource previews and updates.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/integrations/gitlab/snippet-linux.sh`:
- Around line 84-89: The grep check on line 84 searches for dns_options globally
in the file without scoping it to the [containers] section, and the subsequent
sed replacement on line 85 also operates globally. This can cause unintended
modifications to dns_options in other sections or leave [containers].dns_options
unset. Modify the grep pattern to specifically check for dns_options within the
[containers] section (for example, by using a pattern that matches dns_options
after the [containers] header), and update the sed command on line 85 to scope
the replacement to only the [containers] section to ensure consistent Podman DNS
configuration.
In `@pkg/provider/ibmcloud/services/power/power.go`:
- Around line 84-96: The loop that calls vc.Get(volumeId) currently retries on
all errors for the full duration, but should fail fast on permanent failures.
Modify the error handling block (the else clause where err is not nil) to
inspect the underlying power-go-client error for its HTTP status code. If the
error is a non-retriable 4xx response, return the error immediately to fail
fast. If the error is transient (5xx or 429), continue with the existing retry
logic that sleeps and retries. This same fix should be applied to the similar
error handling pattern in the section that also handles vc.Get errors (lines
97-104 and beyond).
---
Nitpick comments:
In `@pkg/provider/ibmcloud/ibmcloud.go`:
- Around line 227-231: The TagsAsStringArray function iterates directly over the
tags map, which produces nondeterministic ordering in Go since map iteration
order is randomized. To fix this, extract all the keys from the tags map into a
slice, sort that slice of keys, and then iterate over the sorted keys in the for
loop instead of directly iterating over the map. This ensures the resulting
pulumi.StringArray has consistent ordering across multiple runs, eliminating
noisy diffs in resource previews and updates.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 996c0259-0ea3-4fe6-9953-7253897ca76e
📒 Files selected for processing (11)
pkg/integrations/gitlab/glrunner.gopkg/integrations/gitlab/snippet-linux.shpkg/integrations/gitlab/types.gopkg/integrations/integrations.gopkg/integrations/otelcol/snippet-linux.shpkg/provider/ibmcloud/action/ibm-power/cloud-configpkg/provider/ibmcloud/action/ibm-power/ibm-power.gopkg/provider/ibmcloud/action/ibm-z/ibm-z.gopkg/provider/ibmcloud/ibmcloud.gopkg/provider/ibmcloud/modules/network/network.gopkg/provider/ibmcloud/services/power/power.go
🚧 Files skipped from review as they are similar to previous changes (4)
- pkg/integrations/integrations.go
- pkg/integrations/gitlab/types.go
- pkg/integrations/gitlab/glrunner.go
- pkg/integrations/otelcol/snippet-linux.sh
| if grep -q '^dns_options' /etc/containers/containers.conf; then | ||
| sudo sed -i 's|^dns_options.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|' \ | ||
| /etc/containers/containers.conf | ||
| else | ||
| sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \ | ||
| /etc/containers/containers.conf |
There was a problem hiding this comment.
Scope dns_options edits to the [containers] section.
Line 84 checks dns_options globally, and Line 85 replaces it globally. If another section defines dns_options, [containers].dns_options may remain unset (or unrelated sections get modified), causing inconsistent Podman DNS behavior.
Proposed fix
- # Add or update dns_options within [containers]
- if grep -q '^dns_options' /etc/containers/containers.conf; then
- sudo sed -i 's|^dns_options.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|' \
- /etc/containers/containers.conf
+ # Add or update dns_options within [containers]
+ if awk '/^\[containers\]/{f=1;next} /^\[/{f=0} f && /^[[:space:]]*dns_options[[:space:]]*=/{found=1} END{exit !found}' \
+ /etc/containers/containers.conf; then
+ sudo sed -i '/^\[containers\]/,/^\[/{s|^[[:space:]]*dns_options[[:space:]]*=.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|}' \
+ /etc/containers/containers.conf
else
sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \
/etc/containers/containers.conf
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if grep -q '^dns_options' /etc/containers/containers.conf; then | |
| sudo sed -i 's|^dns_options.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|' \ | |
| /etc/containers/containers.conf | |
| else | |
| sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \ | |
| /etc/containers/containers.conf | |
| # Add or update dns_options within [containers] | |
| if awk '/^\[containers\]/{f=1;next} /^\[/{f=0} f && /^[[:space:]]*dns_options[[:space:]]*=/{found=1} END{exit !found}' \ | |
| /etc/containers/containers.conf; then | |
| sudo sed -i '/^\[containers\]/,/^\[/{s|^[[:space:]]*dns_options[[:space:]]*=.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|}' \ | |
| /etc/containers/containers.conf | |
| else | |
| sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \ | |
| /etc/containers/containers.conf | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/integrations/gitlab/snippet-linux.sh` around lines 84 - 89, The grep
check on line 84 searches for dns_options globally in the file without scoping
it to the [containers] section, and the subsequent sed replacement on line 85
also operates globally. This can cause unintended modifications to dns_options
in other sections or leave [containers].dns_options unset. Modify the grep
pattern to specifically check for dns_options within the [containers] section
(for example, by using a pattern that matches dns_options after the [containers]
header), and update the sed command on line 85 to scope the replacement to only
the [containers] section to ensure consistent Podman DNS configuration.
| for i := 0; i < 60; i++ { // up to ~10 minutes | ||
| vol, err := vc.Get(volumeId) | ||
| if err == nil { | ||
| switch vol.State { | ||
| case "available", "in-use": | ||
| logging.Infof("volume %s is in state %q, proceeding with attachment", volumeId, vol.State) | ||
| return volumeId, nil | ||
| default: | ||
| logging.Infof("volume %s state: %q, retrying in 10s...", volumeId, vol.State) | ||
| } | ||
| } else { | ||
| logging.Infof("volume %s GET returned error (retrying in 10s): %v", volumeId, err) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the actual code in the file
cat -n pkg/provider/ibmcloud/services/power/power.go | sed -n '79,104p'Repository: redhat-developer/mapt
Length of output: 1114
🏁 Script executed:
# Let's also check the imports and understand what vc (volumeClient) is
head -50 pkg/provider/ibmcloud/services/power/power.goRepository: redhat-developer/mapt
Length of output: 1428
🏁 Script executed:
# Search for volumeClient definition and initialization
rg -n "volumeClient\|vc\.Get\|IBMPIVolumeClient" pkg/provider/ibmcloud/services/power/power.go -A 3 -B 1Repository: redhat-developer/mapt
Length of output: 47
🏁 Script executed:
# Check if there are any error type definitions or status codes being handled
rg -n "error\|Error\|status" pkg/provider/ibmcloud/services/power/power.go | head -30Repository: redhat-developer/mapt
Length of output: 47
🏁 Script executed:
# Let's find the dependency on power-go-client
rg "github.com/IBM-Cloud/power-go-client" go.modRepository: redhat-developer/mapt
Length of output: 111
🏁 Script executed:
# Find the volumeClient function definition
rg -n "func volumeClient" pkg/provider/ibmcloud/services/power/power.go -A 15Repository: redhat-developer/mapt
Length of output: 689
🏁 Script executed:
# Check for any error handling patterns in the file
rg -n "http\|status\|StatusCode\|Type\|errors\." pkg/provider/ibmcloud/services/power/power.go -iRepository: redhat-developer/mapt
Length of output: 47
🏁 Script executed:
# Search for similar retry patterns in other files for context
rg -n "for.*i.*<.*60\|time\.After.*10" pkg/provider/ibmcloud/ -lRepository: redhat-developer/mapt
Length of output: 47
🏁 Script executed:
# Look for error classification/handling in other power-related files
find pkg/provider/ibmcloud -name "*.go" -type f | head -10Repository: redhat-developer/mapt
Length of output: 520
🏁 Script executed:
# Check if there's any existing error classification in the codebase
rg -n "IsRetryable\|Retryable\|Transient\|Permanent" pkg/provider/ibmcloud/ -iRepository: redhat-developer/mapt
Length of output: 47
🌐 Web query:
power-go-client IBMPIVolumeClient error handling retriable
💡 Result:
In IBM-Cloud/power-go-client, errors returned by the lower-level API calls (including those from the IBMPIVolumeClient volume-related methods) are wrapped via ibmpisession.SDKFailWithAPIError when the SDK call returns a non-nil error (example shown in ibm-pi-instance.go: it returns SDKFailWithAPIError(err, …) on err!= nil). [1] What this means for “retriable” handling: 1) power-go-client itself does not expose (in the reviewed code) a per-method “IsRetriable” hook specifically for IBMPIVolumeClient errors; instead, retry behavior is governed by the shared IBM Go SDK HTTP/retry mechanisms (ibm-cloud-sdk-common / ibm/go-sdk-core) and its generic error handling. The project is built on IBM SDK core and supports “automatic retries” as described in IBM’s SDK common documentation. [2] 2) Therefore, whether a particular IBMPIVolumeClient error is retried depends on how the IBM SDK core classifies the underlying HTTP/API error as retryable (e.g., throttling/rate-limit, transient network/5xx patterns), not on any IBMPIVolumeClient-specific “retriable error” implementation. Practical guidance (how to handle retriable errors in your code): - Enable/adjust SDK automatic retries at the IBM SDK core/client level (documented as “Automatic retries” in ibm-cloud-sdk-common). [2] - If you need explicit application-level retry decisions for power-go-client errors, you will need to inspect the error type/content you get from SDKFailWithAPIError (e.g., underlying HTTP status / error payload) and implement your own retry policy in your caller. If you can share the exact IBMPIVolumeClient method name and the error you’re seeing (HTTP status code, message), I can pinpoint whether that specific class of failure is typically treated as retryable by the IBM SDK core layer and give a concrete snippet for retry decision logic.
Citations:
- 1: https://github.com/IBM-Cloud/power-go-client/blob/master/clients/instance/ibm-pi-instance.go
- 2: https://github.com/IBM/ibm-cloud-sdk-common/blob/master/README.md
🌐 Web query:
IBM Cloud Power API error codes status codes
💡 Result:
IBM Cloud Power (PowerVS/Power Cloud) API “error codes / status codes” are primarily the standard HTTP status codes returned by the API, plus a service-specific error payload (for Power Cloud endpoints you’ll commonly see values like pcloudPvminstancesActionPostBadRequest with a description/message). IBM’s API handbook states that HTTP responses use standard 2xx/4xx/5xx categories, and when a 4xx/5xx status code is returned the response body uses an error container model. [1][2][3] 1) HTTP status codes (what you should key off of) - 4xx client errors: these indicate the request is invalid or cannot be fulfilled due to client-side issues; IBM documents the general meaning of 4xx codes and notes common ones like 404/405/409/etc. [1] - 5xx server errors: these indicate server-side failure; IBM documents general meaning of 5xx codes like 500/504/505/etc. [1] - IBM specifically says that if the service returns a 400-series or 500-series status, the response body must be an error container model. [2][3] 2) Service-specific PowerVS/Power Cloud error codes you’ll see in practice (examples) In PowerVS API calls under the /pcloud/... namespace, common failures surface as a combination of HTTP status + an error “title”/reason string (e.g., pcloudPvminstancesActionPostBadRequest) and a description message. - 400 Bad Request (example: action not allowed due to current state) - Starting a Power Virtual Server instance that is already in ACTIVE state can return HTTP 400 with a message like “is already in an ACTIVE state and cannot be started.” [4] - 400 Bad Request (example: invalid/required parameter missing) - Creating a PVM instance can return HTTP 400 where the error description mentions that a calculated parameter failed (for example, “storageType is required when not using a storage affinity policy”). [5] - 400 Bad Request (example: operation not supported in environment) - Updating a PVM instance can return HTTP 400 with “cloudInitialisation is not supported in this environment.” [6] - 409 Conflict (example: cannot attach volume when instance health/status not OK) - Volume attachment can return HTTP 409 conflict, with a message indicating the volume can’t be attached due to the instance health not being “OK” (e.g., health=WARNING). [7] - 409 Conflict (example: attach fails due to instance/VM state constraints) - The Terraform-provider issue for attachment shows an HTTP 409 conflict response for volume attachment to a PVM instance. [7] - 400 Bad Request (example: clone cancel only allowed in certain states) - Cancelling a volume clone request that is already in “completed” state returns HTTP 400, with a message indicating it must be in “prepared or available” to cancel. [8] 3) How to interpret “status_code” vs “error code” - IBM’s API handbook distinguishes HTTP status codes (what the HTTP layer returns) from the structured error payload that accompanies 4xx/5xx responses. The error container model includes an errors field with objects that include a code and message, and it may include status_code and trace. [2][9] - For Power Cloud/PowerVS, the examples you’ll encounter in client output show both: a) The HTTP status (400/409, etc.) b) A service-generated error key/name and a human-readable Description/Message. [8][4][5][7] Practical takeaway - If you’re mapping “Power API error codes” to a decision system, start with the HTTP status code (400 vs 409 vs 5xx) as the primary discriminator, then inspect the Power Cloud error payload fields (code/message/description) for the specific root cause (e.g., “already in ACTIVE”, “cannot cancel because state is completed”, “cloudInitialisation not supported”, “cannot attach because health not OK”). [1][2][4][8][7]
Citations:
- 1: https://cloud.ibm.com/docs/api-handbook?topic=api-handbook-status-codes
- 2: https://cloud.ibm.com/docs/api-handbook?topic=api-handbook-errors
- 3: https://github.com/ibm-cloud-docs/api-handbook/blob/086d28c9357b39612ceb816130d0f6ad92f82859/design/errors.md
- 4: PowerVS start and possibly stop actions failing when resource already in the desired state IBM-Cloud/terraform-provider-ibm#4400
- 5: PowerVS terraform deployment failing IBM-Cloud/terraform-provider-ibm#2787
- 6: High Priority: ibm_pi_instance fails to create instance. IBM-Cloud/terraform-provider-ibm#4905
- 7: Volume attachment to PVS Instances not working IBM-Cloud/terraform-provider-ibm#2810
- 8: https://community.ibm.com/community/user/blogs/amritanshu-verma/2026/06/11/ibm-power-virtual-server-volume-clone-lifecycle
- 9: https://cloud.ibm.com/docs/vpc?topic=vpc-instance-metadata-error-codes
🏁 Script executed:
# Check if there are examples or tests showing how errors are handled
rg -n "Get\|error" pkg/provider/ibmcloud/action/ibm-power/ -A 2 -B 1 | head -40Repository: redhat-developer/mapt
Length of output: 47
🏁 Script executed:
# Look for any existing patterns of error classification in the codebase
rg -n "switch.*err\|type.*error" pkg/provider/ibmcloud/ -lRepository: redhat-developer/mapt
Length of output: 47
🏁 Script executed:
# Check go.mod for error handling related dependencies
cat go.mod | grep -i "error\|retry\|backoff"Repository: redhat-developer/mapt
Length of output: 321
Fail fast on non-retriable volume lookup failures.
Lines 84-96 currently retry all vc.Get(volumeId) errors for ~10 minutes. Permanent failures (e.g., invalid volume ID, permission errors, volume not found) should return immediately; otherwise this adds avoidable timeout delays and masks root causes during incident triage.
Classify errors by HTTP status code and retry only transient ones (5xx, 429), while failing fast on non-retriable 4xx responses. The underlying power-go-client error can be inspected for the HTTP status to make this determination.
Also applies to: 97-104
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/provider/ibmcloud/services/power/power.go` around lines 84 - 96, The loop
that calls vc.Get(volumeId) currently retries on all errors for the full
duration, but should fail fast on permanent failures. Modify the error handling
block (the else clause where err is not nil) to inspect the underlying
power-go-client error for its HTTP status code. If the error is a non-retriable
4xx response, return the error immediately to fail fast. If the error is
transient (5xx or 429), continue with the existing retry logic that sleeps and
retries. This same fix should be applied to the similar error handling pattern
in the section that also handles vc.Get errors (lines 97-104 and beyond).
- fix: preserve path component in COS backend URL - fix: add timeout to multipath device wait in PowerVS cloud-config - feat: propagate --tags CLI flag to all ppc64 and s390x resources - refactor: restructure executor container setup to fix GitLab runner DNS Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- fix: use project name as runner name instead of hostname - feat: correlate CI job logs with OTel traces via journald; when both GitLab runner and OTel are configured, Podman uses the journald log driver so CI job output is captured by systemd journal; otelcol gains a journald/gitlab-jobs receiver that parses job_id, project_id, and runner_token for cross-stream correlation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Resolves DNS failures in nested container builds inside GitLab CI executor containers on ppc64le, s390x, Ubuntu, and RHEL9: - detect real upstream DNS servers via resolvectl/nmcli/resolv.conf and write them to containers.conf dns_servers so inner Podman containers get working nameservers (resolv.conf stubs like 127.0.0.53 are unreachable from inside nested containers) - replace netns=host workaround with ip_forward=1 so Netavark bridge NAT works correctly, giving each inner container an isolated namespace while still routing egress through the executor container's interface - persist ip_forward sysctl and firewalld masquerade rules across reboots - use non-conflicting Netavark subnet (10.88.x) to avoid IP space conflicts - use a separate executor containers.conf to fix nested DNS on ppc64/RHEL9 - fix Ubuntu executor DNS by passing --docker-dns at runner registration - increase runner output_limit to 64 MB to prevent log truncation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a backed URL includes a path (e.g. s3://bucket/some/path), the previous code discarded the path and stored Pulumi state at the bucket root. Refactor extractBucket into extractBucketAndPath so the path is carried through to PULUMI_BACKEND_URL.
Fixes #829
Fixes #832