Skip to content

fix(ibmcloud): preserve path component in COS backend URL#830

Open
adrianriobo wants to merge 3 commits into
redhat-developer:mainfrom
adrianriobo:fix-829
Open

fix(ibmcloud): preserve path component in COS backend URL#830
adrianriobo wants to merge 3 commits into
redhat-developer:mainfrom
adrianriobo:fix-829

Conversation

@adrianriobo

@adrianriobo adrianriobo commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

When a backed URL includes a path (e.g. s3://bucket/some/path), the previous code discarded the path and stored Pulumi state at the bucket root. Refactor extractBucket into extractBucketAndPath so the path is carried through to PULUMI_BACKEND_URL.

Fixes #829
Fixes #832

@coderabbitai

This comment was marked as outdated.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/integrations/gitlab/snippet-linux.sh`:
- Around line 35-36: The /etc/resolv.conf fallback currently excludes only IPv4
loopback (127.*) so IPv6 loopback (::1) can still be captured into _dns_servers;
update the awk filter in the _dns_servers assignment (the line using awk
'/^nameserver/ && $2 !~ /^127\./ {print $2}') to also exclude the IPv6 loopback
(e.g. add a condition like $2 != "::1" or $2 !~ /^::1/), ensuring nameserver
entries equal to ::1 are not included.
- Around line 48-54: The script currently checks for "dns_servers" globally
which can match other sections; change the detection/update logic so it only
looks inside the [containers] section: when checking for existence use a
section-scoped search (e.g., search between the line matching "^\[containers\]"
and the next section header "^\[.*\]") to decide whether to replace dns_servers
or append it after the [containers] header; update the two sed/grep branches
that reference dns_servers and /etc/containers/containers.conf and ensure the
inserted value uses the same ${_toml_list} variable and formatting as before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 466885cf-1897-4a3c-8cf6-9db3d3fcc8aa

📥 Commits

Reviewing files that changed from the base of the PR and between 2a1f1fc and c0d0b15.

📒 Files selected for processing (1)
  • pkg/integrations/gitlab/snippet-linux.sh

Comment thread pkg/integrations/gitlab/snippet-linux.sh Outdated
Comment thread pkg/integrations/gitlab/snippet-linux.sh
coderabbitai[bot]

This comment was marked as resolved.

@deekay2310 deekay2310 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from my one comment regarding context.go, we should also address CodeRabbit's comments and I think this PR will be ready to merge.


func manageIntegration(c *Context, ca *ContextArgs) error {
if ca.GHRunnerArgs != nil {
ca.GHRunnerArgs.Name = c.RunID()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be changed to c.ProjectName() just like GLRunnerArgs.name?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I just found out using the RunID as name is not really helpful so make it with the project name clarifies it, but as this change is GH related can you change it on your PR?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I'll look into it

@adrianriobo adrianriobo force-pushed the fix-829 branch 4 times, most recently from 16330a5 to 08f1b26 Compare June 18, 2026 19:54

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/provider/ibmcloud/ibmcloud.go (1)

227-231: ⚡ Quick win

Stabilize tag serialization order to avoid nondeterministic resource diffs.

At Line 229, iterating a Go map directly makes pulumi.StringArray order nondeterministic across runs. Since this helper feeds many IBM Cloud resources, this can cause noisy previews/updates if ordering is treated as significant. Sort keys before append.

Proposed fix
+import "sort"
...
 func TagsAsStringArray(tags map[string]string) pulumi.StringArray {
 	result := make(pulumi.StringArray, 0, len(tags))
-	for k, v := range tags {
+	keys := make([]string, 0, len(tags))
+	for k := range tags {
+		keys = append(keys, k)
+	}
+	sort.Strings(keys)
+	for _, k := range keys {
+		v := tags[k]
 		result = append(result, pulumi.String(fmt.Sprintf("%s:%s", k, v)))
 	}
 	return result
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/ibmcloud/ibmcloud.go` around lines 227 - 231, The
TagsAsStringArray function iterates directly over the tags map, which produces
nondeterministic ordering in Go since map iteration order is randomized. To fix
this, extract all the keys from the tags map into a slice, sort that slice of
keys, and then iterate over the sorted keys in the for loop instead of directly
iterating over the map. This ensures the resulting pulumi.StringArray has
consistent ordering across multiple runs, eliminating noisy diffs in resource
previews and updates.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/integrations/gitlab/snippet-linux.sh`:
- Around line 84-89: The grep check on line 84 searches for dns_options globally
in the file without scoping it to the [containers] section, and the subsequent
sed replacement on line 85 also operates globally. This can cause unintended
modifications to dns_options in other sections or leave [containers].dns_options
unset. Modify the grep pattern to specifically check for dns_options within the
[containers] section (for example, by using a pattern that matches dns_options
after the [containers] header), and update the sed command on line 85 to scope
the replacement to only the [containers] section to ensure consistent Podman DNS
configuration.

In `@pkg/provider/ibmcloud/services/power/power.go`:
- Around line 84-96: The loop that calls vc.Get(volumeId) currently retries on
all errors for the full duration, but should fail fast on permanent failures.
Modify the error handling block (the else clause where err is not nil) to
inspect the underlying power-go-client error for its HTTP status code. If the
error is a non-retriable 4xx response, return the error immediately to fail
fast. If the error is transient (5xx or 429), continue with the existing retry
logic that sleeps and retries. This same fix should be applied to the similar
error handling pattern in the section that also handles vc.Get errors (lines
97-104 and beyond).

---

Nitpick comments:
In `@pkg/provider/ibmcloud/ibmcloud.go`:
- Around line 227-231: The TagsAsStringArray function iterates directly over the
tags map, which produces nondeterministic ordering in Go since map iteration
order is randomized. To fix this, extract all the keys from the tags map into a
slice, sort that slice of keys, and then iterate over the sorted keys in the for
loop instead of directly iterating over the map. This ensures the resulting
pulumi.StringArray has consistent ordering across multiple runs, eliminating
noisy diffs in resource previews and updates.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 996c0259-0ea3-4fe6-9953-7253897ca76e

📥 Commits

Reviewing files that changed from the base of the PR and between b30e0ee and c22d461.

📒 Files selected for processing (11)
  • pkg/integrations/gitlab/glrunner.go
  • pkg/integrations/gitlab/snippet-linux.sh
  • pkg/integrations/gitlab/types.go
  • pkg/integrations/integrations.go
  • pkg/integrations/otelcol/snippet-linux.sh
  • pkg/provider/ibmcloud/action/ibm-power/cloud-config
  • pkg/provider/ibmcloud/action/ibm-power/ibm-power.go
  • pkg/provider/ibmcloud/action/ibm-z/ibm-z.go
  • pkg/provider/ibmcloud/ibmcloud.go
  • pkg/provider/ibmcloud/modules/network/network.go
  • pkg/provider/ibmcloud/services/power/power.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/integrations/integrations.go
  • pkg/integrations/gitlab/types.go
  • pkg/integrations/gitlab/glrunner.go
  • pkg/integrations/otelcol/snippet-linux.sh

Comment on lines +84 to +89
if grep -q '^dns_options' /etc/containers/containers.conf; then
sudo sed -i 's|^dns_options.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|' \
/etc/containers/containers.conf
else
sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \
/etc/containers/containers.conf

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope dns_options edits to the [containers] section.

Line 84 checks dns_options globally, and Line 85 replaces it globally. If another section defines dns_options, [containers].dns_options may remain unset (or unrelated sections get modified), causing inconsistent Podman DNS behavior.

Proposed fix
-        # Add or update dns_options within [containers]
-        if grep -q '^dns_options' /etc/containers/containers.conf; then
-            sudo sed -i 's|^dns_options.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|' \
-                /etc/containers/containers.conf
+        # Add or update dns_options within [containers]
+        if awk '/^\[containers\]/{f=1;next} /^\[/{f=0} f && /^[[:space:]]*dns_options[[:space:]]*=/{found=1} END{exit !found}' \
+                /etc/containers/containers.conf; then
+            sudo sed -i '/^\[containers\]/,/^\[/{s|^[[:space:]]*dns_options[[:space:]]*=.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|}' \
+                /etc/containers/containers.conf
         else
             sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \
                 /etc/containers/containers.conf
         fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if grep -q '^dns_options' /etc/containers/containers.conf; then
sudo sed -i 's|^dns_options.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|' \
/etc/containers/containers.conf
else
sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \
/etc/containers/containers.conf
# Add or update dns_options within [containers]
if awk '/^\[containers\]/{f=1;next} /^\[/{f=0} f && /^[[:space:]]*dns_options[[:space:]]*=/{found=1} END{exit !found}' \
/etc/containers/containers.conf; then
sudo sed -i '/^\[containers\]/,/^\[/{s|^[[:space:]]*dns_options[[:space:]]*=.*|dns_options = ["timeout:2", "attempts:5", "single-request"]|}' \
/etc/containers/containers.conf
else
sudo sed -i '/^\[containers\]/a dns_options = ["timeout:2", "attempts:5", "single-request"]' \
/etc/containers/containers.conf
fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/integrations/gitlab/snippet-linux.sh` around lines 84 - 89, The grep
check on line 84 searches for dns_options globally in the file without scoping
it to the [containers] section, and the subsequent sed replacement on line 85
also operates globally. This can cause unintended modifications to dns_options
in other sections or leave [containers].dns_options unset. Modify the grep
pattern to specifically check for dns_options within the [containers] section
(for example, by using a pattern that matches dns_options after the [containers]
header), and update the sed command on line 85 to scope the replacement to only
the [containers] section to ensure consistent Podman DNS configuration.

Comment on lines +84 to +96
for i := 0; i < 60; i++ { // up to ~10 minutes
vol, err := vc.Get(volumeId)
if err == nil {
switch vol.State {
case "available", "in-use":
logging.Infof("volume %s is in state %q, proceeding with attachment", volumeId, vol.State)
return volumeId, nil
default:
logging.Infof("volume %s state: %q, retrying in 10s...", volumeId, vol.State)
}
} else {
logging.Infof("volume %s GET returned error (retrying in 10s): %v", volumeId, err)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the actual code in the file
cat -n pkg/provider/ibmcloud/services/power/power.go | sed -n '79,104p'

Repository: redhat-developer/mapt

Length of output: 1114


🏁 Script executed:

# Let's also check the imports and understand what vc (volumeClient) is
head -50 pkg/provider/ibmcloud/services/power/power.go

Repository: redhat-developer/mapt

Length of output: 1428


🏁 Script executed:

# Search for volumeClient definition and initialization
rg -n "volumeClient\|vc\.Get\|IBMPIVolumeClient" pkg/provider/ibmcloud/services/power/power.go -A 3 -B 1

Repository: redhat-developer/mapt

Length of output: 47


🏁 Script executed:

# Check if there are any error type definitions or status codes being handled
rg -n "error\|Error\|status" pkg/provider/ibmcloud/services/power/power.go | head -30

Repository: redhat-developer/mapt

Length of output: 47


🏁 Script executed:

# Let's find the dependency on power-go-client
rg "github.com/IBM-Cloud/power-go-client" go.mod

Repository: redhat-developer/mapt

Length of output: 111


🏁 Script executed:

# Find the volumeClient function definition
rg -n "func volumeClient" pkg/provider/ibmcloud/services/power/power.go -A 15

Repository: redhat-developer/mapt

Length of output: 689


🏁 Script executed:

# Check for any error handling patterns in the file
rg -n "http\|status\|StatusCode\|Type\|errors\." pkg/provider/ibmcloud/services/power/power.go -i

Repository: redhat-developer/mapt

Length of output: 47


🏁 Script executed:

# Search for similar retry patterns in other files for context
rg -n "for.*i.*<.*60\|time\.After.*10" pkg/provider/ibmcloud/ -l

Repository: redhat-developer/mapt

Length of output: 47


🏁 Script executed:

# Look for error classification/handling in other power-related files
find pkg/provider/ibmcloud -name "*.go" -type f | head -10

Repository: redhat-developer/mapt

Length of output: 520


🏁 Script executed:

# Check if there's any existing error classification in the codebase
rg -n "IsRetryable\|Retryable\|Transient\|Permanent" pkg/provider/ibmcloud/ -i

Repository: redhat-developer/mapt

Length of output: 47


🌐 Web query:

power-go-client IBMPIVolumeClient error handling retriable

💡 Result:

In IBM-Cloud/power-go-client, errors returned by the lower-level API calls (including those from the IBMPIVolumeClient volume-related methods) are wrapped via ibmpisession.SDKFailWithAPIError when the SDK call returns a non-nil error (example shown in ibm-pi-instance.go: it returns SDKFailWithAPIError(err, …) on err!= nil). [1] What this means for “retriable” handling: 1) power-go-client itself does not expose (in the reviewed code) a per-method “IsRetriable” hook specifically for IBMPIVolumeClient errors; instead, retry behavior is governed by the shared IBM Go SDK HTTP/retry mechanisms (ibm-cloud-sdk-common / ibm/go-sdk-core) and its generic error handling. The project is built on IBM SDK core and supports “automatic retries” as described in IBM’s SDK common documentation. [2] 2) Therefore, whether a particular IBMPIVolumeClient error is retried depends on how the IBM SDK core classifies the underlying HTTP/API error as retryable (e.g., throttling/rate-limit, transient network/5xx patterns), not on any IBMPIVolumeClient-specific “retriable error” implementation. Practical guidance (how to handle retriable errors in your code): - Enable/adjust SDK automatic retries at the IBM SDK core/client level (documented as “Automatic retries” in ibm-cloud-sdk-common). [2] - If you need explicit application-level retry decisions for power-go-client errors, you will need to inspect the error type/content you get from SDKFailWithAPIError (e.g., underlying HTTP status / error payload) and implement your own retry policy in your caller. If you can share the exact IBMPIVolumeClient method name and the error you’re seeing (HTTP status code, message), I can pinpoint whether that specific class of failure is typically treated as retryable by the IBM SDK core layer and give a concrete snippet for retry decision logic.

Citations:


🌐 Web query:

IBM Cloud Power API error codes status codes

💡 Result:

IBM Cloud Power (PowerVS/Power Cloud) API “error codes / status codes” are primarily the standard HTTP status codes returned by the API, plus a service-specific error payload (for Power Cloud endpoints you’ll commonly see values like pcloudPvminstancesActionPostBadRequest with a description/message). IBM’s API handbook states that HTTP responses use standard 2xx/4xx/5xx categories, and when a 4xx/5xx status code is returned the response body uses an error container model. [1][2][3] 1) HTTP status codes (what you should key off of) - 4xx client errors: these indicate the request is invalid or cannot be fulfilled due to client-side issues; IBM documents the general meaning of 4xx codes and notes common ones like 404/405/409/etc. [1] - 5xx server errors: these indicate server-side failure; IBM documents general meaning of 5xx codes like 500/504/505/etc. [1] - IBM specifically says that if the service returns a 400-series or 500-series status, the response body must be an error container model. [2][3] 2) Service-specific PowerVS/Power Cloud error codes you’ll see in practice (examples) In PowerVS API calls under the /pcloud/... namespace, common failures surface as a combination of HTTP status + an error “title”/reason string (e.g., pcloudPvminstancesActionPostBadRequest) and a description message. - 400 Bad Request (example: action not allowed due to current state) - Starting a Power Virtual Server instance that is already in ACTIVE state can return HTTP 400 with a message like “is already in an ACTIVE state and cannot be started.” [4] - 400 Bad Request (example: invalid/required parameter missing) - Creating a PVM instance can return HTTP 400 where the error description mentions that a calculated parameter failed (for example, “storageType is required when not using a storage affinity policy”). [5] - 400 Bad Request (example: operation not supported in environment) - Updating a PVM instance can return HTTP 400 with “cloudInitialisation is not supported in this environment.” [6] - 409 Conflict (example: cannot attach volume when instance health/status not OK) - Volume attachment can return HTTP 409 conflict, with a message indicating the volume can’t be attached due to the instance health not being “OK” (e.g., health=WARNING). [7] - 409 Conflict (example: attach fails due to instance/VM state constraints) - The Terraform-provider issue for attachment shows an HTTP 409 conflict response for volume attachment to a PVM instance. [7] - 400 Bad Request (example: clone cancel only allowed in certain states) - Cancelling a volume clone request that is already in “completed” state returns HTTP 400, with a message indicating it must be in “prepared or available” to cancel. [8] 3) How to interpret “status_code” vs “error code” - IBM’s API handbook distinguishes HTTP status codes (what the HTTP layer returns) from the structured error payload that accompanies 4xx/5xx responses. The error container model includes an errors field with objects that include a code and message, and it may include status_code and trace. [2][9] - For Power Cloud/PowerVS, the examples you’ll encounter in client output show both: a) The HTTP status (400/409, etc.) b) A service-generated error key/name and a human-readable Description/Message. [8][4][5][7] Practical takeaway - If you’re mapping “Power API error codes” to a decision system, start with the HTTP status code (400 vs 409 vs 5xx) as the primary discriminator, then inspect the Power Cloud error payload fields (code/message/description) for the specific root cause (e.g., “already in ACTIVE”, “cannot cancel because state is completed”, “cloudInitialisation not supported”, “cannot attach because health not OK”). [1][2][4][8][7]

Citations:


🏁 Script executed:

# Check if there are examples or tests showing how errors are handled
rg -n "Get\|error" pkg/provider/ibmcloud/action/ibm-power/ -A 2 -B 1 | head -40

Repository: redhat-developer/mapt

Length of output: 47


🏁 Script executed:

# Look for any existing patterns of error classification in the codebase
rg -n "switch.*err\|type.*error" pkg/provider/ibmcloud/ -l

Repository: redhat-developer/mapt

Length of output: 47


🏁 Script executed:

# Check go.mod for error handling related dependencies
cat go.mod | grep -i "error\|retry\|backoff"

Repository: redhat-developer/mapt

Length of output: 321


Fail fast on non-retriable volume lookup failures.

Lines 84-96 currently retry all vc.Get(volumeId) errors for ~10 minutes. Permanent failures (e.g., invalid volume ID, permission errors, volume not found) should return immediately; otherwise this adds avoidable timeout delays and masks root causes during incident triage.

Classify errors by HTTP status code and retry only transient ones (5xx, 429), while failing fast on non-retriable 4xx responses. The underlying power-go-client error can be inspected for the HTTP status to make this determination.

Also applies to: 97-104

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/ibmcloud/services/power/power.go` around lines 84 - 96, The loop
that calls vc.Get(volumeId) currently retries on all errors for the full
duration, but should fail fast on permanent failures. Modify the error handling
block (the else clause where err is not nil) to inspect the underlying
power-go-client error for its HTTP status code. If the error is a non-retriable
4xx response, return the error immediately to fail fast. If the error is
transient (5xx or 429), continue with the existing retry logic that sleeps and
retries. This same fix should be applied to the similar error handling pattern
in the section that also handles vc.Get errors (lines 97-104 and beyond).

adrianriobo and others added 3 commits June 23, 2026 12:20
- fix: preserve path component in COS backend URL
- fix: add timeout to multipath device wait in PowerVS cloud-config
- feat: propagate --tags CLI flag to all ppc64 and s390x resources
- refactor: restructure executor container setup to fix GitLab runner DNS

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- fix: use project name as runner name instead of hostname
- feat: correlate CI job logs with OTel traces via journald; when both
  GitLab runner and OTel are configured, Podman uses the journald log
  driver so CI job output is captured by systemd journal; otelcol gains
  a journald/gitlab-jobs receiver that parses job_id, project_id, and
  runner_token for cross-stream correlation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Resolves DNS failures in nested container builds inside GitLab CI executor
containers on ppc64le, s390x, Ubuntu, and RHEL9:

- detect real upstream DNS servers via resolvectl/nmcli/resolv.conf and
  write them to containers.conf dns_servers so inner Podman containers
  get working nameservers (resolv.conf stubs like 127.0.0.53 are
  unreachable from inside nested containers)
- replace netns=host workaround with ip_forward=1 so Netavark bridge NAT
  works correctly, giving each inner container an isolated namespace while
  still routing egress through the executor container's interface
- persist ip_forward sysctl and firewalld masquerade rules across reboots
- use non-conflicting Netavark subnet (10.88.x) to avoid IP space conflicts
- use a separate executor containers.conf to fix nested DNS on ppc64/RHEL9
- fix Ubuntu executor DNS by passing --docker-dns at runner registration
- increase runner output_limit to 64 MB to prevent log truncation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants