VER-105 Add generic v1 Harbor Dockerfile environment setup support #1407

Closed
xeophon wants to merge 1 commit into codex/remove-v1-config-classes from feat/ver-105-harbor-dockerfile-support

@xeophon (Member) commented May 18, 2026

Summary

  • add v1 Harbor Dockerfile replay for tasks with environment/Dockerfile and no explicit environment.docker_image
  • upload the Harbor environment/ build context and translate FROM, WORKDIR, ENV, RUN, and COPY/ADD into sandbox/program setup
  • export the small parser helper from the v1 taskset package

Stacked on #1392.

Testing

  • uv run pytest tests/test_v1_harbor_cli.py -q
  • uv run pytest tests/test_v1_mini_swe_agent.py -q
  • uv run ruff check --fix verifiers/v1/packages/tasksets/harbor.py tests/test_v1_harbor_cli.py verifiers/v1/packages/tasksets/__init__.py verifiers/v1/__init__.py
  • uv run ruff format --check verifiers/v1/packages/tasksets/harbor.py tests/test_v1_harbor_cli.py verifiers/v1/packages/tasksets/__init__.py verifiers/v1/__init__.py
  • uv run ty check verifiers/v1/packages/tasksets/harbor.py
  • OpenThoughts row smoke for broken-python, jq-data-processing, log-summary

Note

Medium Risk
Adds new Dockerfile parsing and command-generation that affects how Harbor tasks provision their sandbox (image/workdir/env/setup), which could break task execution if parsing or path handling is wrong.

Overview
Harbor v1 tasks can now derive their runtime environment from an environment/Dockerfile when [environment].docker_image is not set. HarborTaskset.task_row parses the Dockerfile to set sandbox.image/sandbox.workdir, uploads the environment/ directory as a build context (HARBOR_BUILD_CONTEXT), and injects translated ENV/RUN/COPY/ADD steps into program.env and program.setup.

Introduces a public parse_harbor_dockerfile helper (re-exported via verifiers.v1 / tasksets) with guardrails like rejecting multi-stage builds, and adds focused tests covering Dockerfile replay behavior and edge cases.

Reviewed by Cursor Bugbot for commit 3d73cc3.

Note

Add parse_harbor_dockerfile to replay Dockerfile environment setup in Harbor tasks

  • Adds parse_harbor_dockerfile in harbor.py that converts a single-stage Dockerfile into a replayable config: base image, workdir, ENV variables, and shell setup steps (RUN, COPY, ADD, WORKDIR).
  • Updates HarborTaskset.task_row to detect an environment/Dockerfile when no docker_image is configured and apply the parsed image, workdir, env, and setup steps to the sandbox and program.
  • COPY/ADD instructions are resolved relative to HARBOR_BUILD_CONTEXT (/tmp/harbor_environment), with program.dirs mapping the build context to the environment directory.
  • Multi-stage Dockerfiles (more than one FROM) are rejected with a ValueError.
  • Exports parse_harbor_dockerfile from verifiers.v1 and verifiers.v1.packages.tasksets.

Macroscope summarized 3d73cc3.
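For readers who want a concrete picture of the replay described in the two summaries above, here is a minimal sketch. It is not the PR's `parse_harbor_dockerfile`: the function name `replay_sketch`, the returned dict shape, and the decision to handle only `FROM`, `WORKDIR`, `ENV`, and `RUN` are assumptions made for illustration, and `COPY`/`ADD`, line continuations, and `FROM` flags are deliberately left out.

```python
import shlex


def replay_sketch(dockerfile_text: str) -> dict:
    """Hypothetical illustration of single-stage Dockerfile replay.

    Returns the base image, the last WORKDIR, ENV pairs, and shell setup steps.
    """
    image = None
    workdir = None
    env = {}
    setup = []
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        kind, _, value = line.partition(" ")
        kind = kind.upper()
        value = value.strip()
        if kind == "FROM":
            if image is not None:
                # Mirrors the guardrail in the summary: one FROM only.
                raise ValueError("multi-stage Dockerfiles are not supported")
            image = value.split()[0]
        elif kind == "WORKDIR":
            workdir = value
            setup.append(f"mkdir -p {shlex.quote(value)}")
        elif kind == "ENV":
            # Only the KEY=value form is handled in this sketch.
            for token in shlex.split(value):
                key, _, val = token.partition("=")
                env[key] = val
        elif kind == "RUN":
            setup.append(value)
    return {"image": image, "workdir": workdir, "env": env, "setup": setup}
```

Under these assumptions, a Dockerfile with `FROM python:3.11-slim`, `WORKDIR /app`, `ENV FOO=bar`, and `RUN pip install jq` would yield the image and workdir for the sandbox, and the env mapping and setup list for the program.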

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 585948ccca

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +287 to +289
setup.append(
f"mkdir -p {shlex.quote(mkdir_path)} && "
f"cp -R {shlex.quote(source_path)} {shlex.quote(target)}"
[P2] Copy directory contents instead of nesting the build context

For Harbor tasks that rely on environment/Dockerfile without environment.docker_image, common Dockerfile patterns like WORKDIR /app followed by COPY . . are replayed as cp -R /tmp/harbor_environment /app. Because the WORKDIR setup already created /app, this nests the whole context under /app/harbor_environment instead of copying its contents into /app, so later RUN commands and tests cannot find files at the Dockerfile paths.

Member Author

Addressed: COPY sources that are the build context, end with a slash, or resolve to a local directory now copy contents via source/. so COPY . . no longer nests /tmp/harbor_environment under the workdir. Added a regression test for COPY . .
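
The `source/.` behavior described in this reply can be sketched as follows. This is an illustration under stated assumptions, not the code in `harbor.py`: `copy_command` is a hypothetical helper, `BUILD_CONTEXT` uses the `/tmp/harbor_environment` path named in the PR summary, and the third case from the reply (a source that resolves to a local directory) is omitted because it needs a filesystem check.

```python
import posixpath
import shlex

BUILD_CONTEXT = "/tmp/harbor_environment"  # path named in the PR summary


def copy_command(source: str, target: str, workdir: str = "/") -> str:
    """Hypothetical helper: build the shell command for one COPY source."""
    source_path = posixpath.normpath(posixpath.join(BUILD_CONTEXT, source))
    target_path = posixpath.normpath(posixpath.join(workdir, target))
    # Copy contents when the source is the whole context or ends with a slash.
    copies_contents = source_path == BUILD_CONTEXT or source.endswith("/")
    if copies_contents:
        source_arg = source_path + "/."  # "dir/." copies contents, not the dir
        mkdir_path = target_path
    else:
        source_arg = source_path
        mkdir_path = posixpath.dirname(target_path) or "/"
    return (
        f"mkdir -p {shlex.quote(mkdir_path)} && "
        f"cp -R {shlex.quote(source_arg)} {shlex.quote(target_path)}"
    )
```

With `WORKDIR /app` and `COPY . .`, this sketch produces `cp -R /tmp/harbor_environment/. /app`, so files land directly in `/app` rather than under `/app/harbor_environment`.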

Comment on lines +244 to +249
if kind == "FROM":
    image = next(
        token
        for token in value.split()
        if not token.startswith("--") and token.upper() != "AS"
    )
[P2] Reject or isolate multi-stage Dockerfile stages

When a Dockerfile has multiple FROM stages, this keeps all earlier RUN/WORKDIR/COPY setup commands while replacing image with the last stage. In a valid multi-stage file, builder-stage commands may require the builder image and COPY --from artifacts, but the replay will run those commands in the final image and skip the cross-stage copy, causing otherwise valid Harbor Dockerfile tasks to fail or start without required files.

Member Author

Addressed: multi-stage Dockerfiles now fail during HarborTaskset Dockerfile replay with a clear ValueError telling users to provide a prebuilt [environment].docker_image instead. Added a regression test for this path.
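
A guard of the kind described in this reply might look like the sketch below. The helper name `ensure_single_stage` and the exact error text are assumptions; the source only states that a clear `ValueError` points users to a prebuilt `[environment].docker_image`.

```python
def ensure_single_stage(dockerfile_text: str) -> None:
    """Hypothetical guard: raise if the Dockerfile has more than one FROM."""
    from_count = 0
    for raw in dockerfile_text.splitlines():
        parts = raw.split()
        # Count instruction lines whose first word is FROM (case-insensitive).
        if parts and parts[0].upper() == "FROM":
            from_count += 1
    if from_count > 1:
        raise ValueError(
            "Multi-stage Dockerfiles cannot be replayed; "
            "provide a prebuilt [environment].docker_image instead."
        )
```

Failing early here keeps builder-stage `RUN` commands from being replayed against the final image, which is the failure mode the review describes.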

Comment thread verifiers/v1/packages/tasksets/harbor.py Outdated
Comment thread verifiers/v1/packages/tasksets/harbor.py
Comment thread verifiers/v1/packages/tasksets/harbor.py
@xeophon force-pushed the feat/ver-105-harbor-dockerfile-support branch from 585948c to 714e022 (May 18, 2026 11:32)
macroscopeapp Bot commented May 18, 2026

Approvability

Verdict: Needs human review

Unable to check for correctness in 3d73cc3. This PR introduces new Dockerfile parsing functionality with multiple unresolved review comments identifying potential bugs in the parsing logic (directory copying behavior, ENV expansion, line continuation handling). These substantive issues require human review.

@xeophon force-pushed the feat/ver-105-harbor-dockerfile-support branch from 714e022 to 3d73cc3 (May 18, 2026 11:35)
@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d73cc35ba

Comment on lines +267 to +268
key, _, val = token.partition("=")
env[key] = val
[P2] Expand Docker ENV values before replaying setup

When a Dockerfile uses variable references in ENV (for example the common ENV PATH="/root/.local/bin:$PATH"), this stores the literal $PATH in program.env. The setup runner passes that value unchanged as the environment for later RUN commands, so PATH no longer contains the base image paths and commands like python/apt-get can disappear even though the Docker build would expand the value successfully.
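
One possible way to address this is sketched below: expand `$NAME` and `${NAME}` references against a mapping of already-known values before storing the result. `expand_env_value` is a hypothetical helper, not part of the PR. The sketch assumes the caller can supply the known values (earlier `ENV` entries, or the base image's environment if available), leaves unknown references untouched, and does not handle escaped `\$` or default-value forms such as `${NAME:-default}`.

```python
import re

# Matches $NAME or ${NAME}; escaped dollars and default-value forms are ignored.
_REFERENCE = re.compile(r"\$(?:\{(\w+)\}|(\w+))")


def expand_env_value(value: str, known: dict[str, str]) -> str:
    """Hypothetical helper: substitute references that appear in `known`."""

    def substitute(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        if name in known:
            return known[name]
        return match.group(0)  # leave unknown references as written

    return _REFERENCE.sub(substitute, value)
```

For the `PATH` case above, calling it with the base image's `PATH` in `known` keeps the original entries after the prepended directory. How the replay would obtain the base image's environment is outside the scope of this sketch.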

Comment on lines +275 to +276
if any(str(token).startswith("--from") for token in tokens):
    continue
[P2] Reject external COPY --from instead of dropping it

For single-stage Dockerfiles that use Docker's supported external-image form such as COPY --from=nginx:alpine /etc/nginx/nginx.conf /tmp/, this branch silently skips the copy because there is no second FROM to trigger the multi-stage error. The replay then continues with required files missing, so the task fails later with confusing setup/test errors; unsupported --from copies should be handled or rejected explicitly.
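
The explicit rejection suggested here could be sketched as below. `copy_operands` is a hypothetical helper and the error message is an assumption; the sketch raises on any `--from` flag and otherwise returns the non-flag operands of a `COPY`/`ADD` instruction.

```python
import shlex


def copy_operands(value: str) -> list[str]:
    """Hypothetical helper: return COPY/ADD operands, rejecting --from."""
    tokens = shlex.split(value)
    for token in tokens:
        if token.startswith("--from"):
            # Fail loudly instead of silently skipping the copy.
            raise ValueError(
                f"COPY {token} is not supported by Dockerfile replay; "
                "provide a prebuilt [environment].docker_image instead."
            )
    # Drop other flags such as --chown or --chmod in this sketch.
    return [token for token in tokens if not token.startswith("--")]
```

Raising at parse time surfaces the unsupported instruction directly, rather than leaving a missing file to be discovered by a later setup or test failure.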

@cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

    continue
if line.endswith("\\"):
    pending += line[:-1].rstrip() + " "
    continue
Line continuation inserts space, corrupting split tokens

Low Severity

The continuation-line handler unconditionally appends a space (pending += line[:-1].rstrip() + " ") when joining backslash-continued lines. Docker joins continuation lines without inserting whitespace. For instructions like RUN or COPY where arguments are already space-delimited this is harmless, but for a FROM image name or an ENV value split across lines the space corrupts the token — e.g. FROM python:3.11-\ / slim-bookworm becomes two tokens python:3.11- and slim-bookworm, so the parsed image is truncated to python:3.11-.
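
A join that avoids the inserted space could look like the following sketch. `join_continuations` is a hypothetical name, and the sketch assumes the default backslash escape character; it does not handle comment lines inside a continuation or the `# escape=` parser directive.

```python
def join_continuations(text: str) -> list[str]:
    """Hypothetical helper: merge backslash-continued lines into logical lines."""
    logical = []
    pending = ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            # Drop only the backslash; keep existing whitespace, add none.
            pending += line[:-1]
            continue
        logical.append(pending + line)
        pending = ""
    if pending:
        logical.append(pending)  # trailing continuation with no final line
    return logical
```

Because whitespace that was already in the source is preserved, a `RUN` split as `update && \` followed by an indented line still tokenizes correctly, while `FROM python:3.11-\` followed by `slim-bookworm` joins into a single image name.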

Comment thread verifiers/v1/__init__.py
from .packages.tasksets import (
    HarborTaskset,
    HarborTasksetConfig,
    parse_harbor_dockerfile,
New public API and Dockerfile replay feature lack documentation

Low Severity

parse_harbor_dockerfile is exported as a new public symbol in verifiers.v1.__all__, and HarborTaskset.task_row now auto-detects environment/Dockerfile to configure the sandbox image, workdir, env, and setup commands. HarborTaskset is described in docs/byo-harness.md and docs/reference.md, but neither the new Dockerfile replay behavior nor the parse_harbor_dockerfile helper is mentioned anywhere in docs/. The project rules require documentation updates when core user-facing functionality described in docs changes.

Triggered by project rule: BugBot Instructions
