perf(docker): improve pd/store/server image build cache efficiency#3025
perf(docker): improve pd/store/server image build cache efficiency#3025bitflicker64 wants to merge 2 commits into
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests.
Additional details and impacted files@@ Coverage Diff @@
## master #3025 +/- ##
============================================
- Coverage 35.85% 1.56% -34.30%
+ Complexity 338 43 -295
============================================
Files 802 780 -22
Lines 67995 65480 -2515
Branches 8902 8454 -448
============================================
- Hits 24381 1026 -23355
- Misses 41008 64368 +23360
+ Partials 2606 86 -2520 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR aims to improve Docker build cache efficiency for HugeGraph PD, Store, and Server images by adding Docker context filtering and BuildKit/Maven cache-related Dockerfile changes.
Changes:
- Adds a new root
.dockerignore. - Updates four Dockerfiles to use a pom-only copy layer and Maven cache mounts.
- Removes some runtime packages and cleanup/startup commands from image build stages.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
.dockerignore |
Adds Docker build-context exclusions. |
hugegraph-pd/Dockerfile |
Refactors PD image build layers and runtime dependency install. |
hugegraph-store/Dockerfile |
Refactors Store image build layers and runtime dependency install. |
hugegraph-server/Dockerfile |
Refactors Server image build layers and runtime dependency install. |
hugegraph-server/Dockerfile-hstore |
Refactors hstore Server image build layers and runtime dependency install. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Copy pom files first — layer is only invalidated when pom files change, | ||
| # keeping the Maven cache mount warm across source-only changes. | ||
| COPY --parents **/pom.xml ./ | ||
| ARG MAVEN_ARGS | ||
|
|
- Add .dockerignore to reduce build context noise (defer to .gitignore, add **/target/ and Docker-specific extras explicitly) - Refactor all 4 Dockerfiles: pom-first COPY layer + BuildKit .m2 cache mount - Add # syntax=docker/dockerfile:1 for compatibility with older Docker versions - Restore cron in runtime stage to preserve optional monitor support - Remove dead code: vim, service cron start, rm libc-bin hack, set -x, pwd&&cd - Update server-ci.yml ubuntu-22.04 -> ubuntu-latest (resolves existing TODO)
imbajin
left a comment
There was a problem hiding this comment.
Direction is right but a few blockers remain. See inline for net-new findings; +1'd Copilot's threads on the labs syntax / pom-only layer / root-pom glob (those are correct). Body-level: README/BUILDING should also note the new BuildKit ≥ 0.20 requirement, and removing vim is a behavioral change worth calling out in the commit body for ops folks who exec into containers.
| # NOTE: This file intentionally stays minimal. | ||
| **/target/ | ||
| # Most patterns (IDE files, build artifacts, logs, OS files) are already | ||
| # covered by .gitignore. Only Docker-specific extras are listed here. |
There was a problem hiding this comment.
.gitignore. As a result, locally-extracted release dirs (apache-hugegraph-*/, measured at ~790 MB on a normal dev checkout), .idea/, node_modules/, logs/, *.iml, *.class, gen-java/, upload-files/, .vscode/, *.tar.gz, dist/, build/ all leak into every build's context, defeating the cache work in this PR (and risking IDE / secret leakage into image layers).
| # covered by .gitignore. Only Docker-specific extras are listed here. | |
| **/target/ | |
| # IMPORTANT: .dockerignore does NOT inherit .gitignore — patterns must be restated. | |
| # Pre-extracted release dirs / archives | |
| apache-hugegraph-*/ | |
| **/*.tar | |
| **/*.tar.gz* | |
| **/*.zip | |
| **/*.war | |
| # IDE / OS | |
| .idea/ | |
| .vscode/ | |
| **/*.iml | |
| **/*.iws | |
| **/.DS_Store | |
| # Build / runtime artifacts | |
| **/logs/ | |
| **/*.log | |
| **/*.class | |
| **/gen-java/ | |
| **/upload-files/ | |
| **/dist/ | |
| **/build/ | |
| **/node_modules/ | |
| # Env files | |
| .env.local | |
| .env.*.local | |
| # Git internals | |
| .git | |
| .gitignore | |
| .gitattributes | |
| .github | |
| # Compose / docs not needed in build context | |
| **/docker-compose*.yml | |
| **/docker-compose*.yaml | |
| **/*.md | |
| docs/ |
| build-server: | ||
| # TODO: we need test & replace it to ubuntu-24.04 or ubuntu-latest | ||
| runs-on: ubuntu-22.04 | ||
| runs-on: ubuntu-latest |
There was a problem hiding this comment.
- Scope creep: bumping
ubuntu-22.04 → ubuntu-latestis unrelated to the Docker cache work in this PR. The original# TODOwas a deliberate guard (Ubuntu 24.04 changed default Java, removed legacy packages). Suggest reverting in this PR and splitting into a focused follow-up. ubuntu-latestdrifts when GitHub flips the alias — pin explicitly:
| runs-on: ubuntu-latest | |
| runs-on: ubuntu-24.04 |
Flagging here since it's the only CI file touched: there is no CI workflow that exercises any Dockerfile (verified — zero docker build references under .github/workflows/). PR's verification is "run docker build locally". Future regressions will ship silently. Consider adding a build-only smoke job:
docker-build:
runs-on: ubuntu-24.04
strategy:
matrix:
dockerfile:
- hugegraph-pd/Dockerfile
- hugegraph-store/Dockerfile
- hugegraph-server/Dockerfile
- hugegraph-server/Dockerfile-hstore
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- run: docker buildx build -f ${{ matrix.dockerfile }} --load .| dumb-init \ | ||
| procps \ | ||
| curl \ | ||
| lsof \ |
There was a problem hiding this comment.
🧹 Pre-existing issue (not a regression from this PR, but worth fixing while in the area): the cron package stays installed in the runtime image (~3 MB), but no cron daemon is ever started — dumb-init only execs docker-entrypoint.sh. This means start-hugegraph.sh -m true silently fails to fire its scheduled monitor job, even though crontab_append returns success. Either:
- Drop
cronentirely (saves ~3 MB and removes the false impression that monitor mode works), or - Start cron in
docker-entrypoint.sh(cron &before the server start) so the feature actually works.
Same applies to Dockerfile-hstore and the PD / Store Dockerfiles.
Purpose of the PR
close #2977
Main Changes
.dockerignore(new file).git,.github, compose files, docs).gitignore, not duplicated per maintainer guidanceAll 4 Dockerfiles
COPY --parents **/pom.xml ./beforeCOPY . .— pom layer only invalidates on pom changes, not source changes--mount=type=cache,target=/root/.m2onmvn package— deps cached across repeated buildsvim,cron,service cron start,rm /var/lib/dpkg/info/libc-bin.*, dead code from runtime stageVerifying these changes
Does this PR potentially affect the following parts?
Documentation Status
Doc - No Need