This document outlines the optimization strategies to reduce Docker build times for NumaFlow Python UDFs from 2+ minutes to under 30 seconds for subsequent builds.
- Redundant dependency installation: Each UDF rebuilds the entire pynumaflow package
- No layer caching: Dependencies are reinstalled every time
- Copying entire project: `COPY ./ ./` copies everything, including unnecessary files
- No shared base layers: Each UDF builds its own base environment
As suggested by @kohlisid, we implement a three-stage build approach:
- Common Python environment and tools
- System dependencies (curl, wget, build-essential, git)
- Poetry installation
- dumb-init binary
- pynumaflow package installation
- Shared virtual environment creation
- This layer is cached unless `pyproject.toml` or `poetry.lock` changes
- UDF-specific code and dependencies
- Reuses the pynumaflow installation from Stage 2
- Minimal additional dependencies
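The three stages listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual `Dockerfile.optimized`: the `python:3.11-slim` tag, the stage names, the `/app` layout, the `pynumaflow/` source path, and the use of `entry.sh` as the start command are all assumptions. The helper `write_three_stage_sketch` is hypothetical and only writes the sketch to a scratch file so it can be inspected or adapted.

```shell
#!/bin/sh
# Hypothetical helper: writes an illustrative three-stage Dockerfile.
# Every tag, path, and stage name inside it is an assumption.
write_three_stage_sketch() {
    out="${1:-Dockerfile.sketch}"
    cat > "$out" <<'EOF'
# Stage 1: common Python environment, system tools, Poetry, dumb-init
FROM python:3.11-slim AS base
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
       curl wget build-essential git dumb-init \
    && rm -rf /var/lib/apt/lists/* \
    && pip install --no-cache-dir poetry

# Stage 2: pynumaflow installed into a shared virtual environment
FROM base AS builder
WORKDIR /app
ENV POETRY_VIRTUALENVS_IN_PROJECT=true
# Only the manifests are copied first, so this layer stays cached
# until pyproject.toml or poetry.lock changes.
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root --no-interaction
# Assumed source location; the package metadata may require more files.
COPY pynumaflow/ ./pynumaflow/
RUN poetry install --no-interaction

# Stage 3: UDF-specific code, reusing the environment from stage 2
FROM base AS udf
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"
COPY examples/map/even_odd/ ./
ENTRYPOINT ["dumb-init", "--"]
CMD ["./entry.sh"]
EOF
}

write_three_stage_sketch Dockerfile.sketch
```

The ordering is the point of the sketch: files that change rarely are copied before files that change often, so an edit to the UDF code invalidates only the last stage.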
File: examples/map/even_odd/Dockerfile.optimized
Benefits:
- Better layer caching
- Reduced build time by ~60-70%
- No external dependencies
Usage:
cd examples/map/even_odd
make -f Makefile.optimized image

Files:
- Dockerfile.base (shared base image)
- examples/map/even_odd/Dockerfile.shared-base (UDF-specific)
Benefits:
- Maximum caching efficiency
- Build time reduced by ~80-90% for subsequent builds
- Perfect for CI/CD pipelines
Usage:
# Build base image once
docker build -f Dockerfile.base -t numaflow-python-base .
# Build UDF images (very fast)
cd examples/map/even_odd
make -f Makefile.optimized image-fast

| Approach | First Build | Subsequent Builds | Cache Efficiency |
|---|---|---|---|
| Current | ~2-3 minutes | ~2-3 minutes | Poor |
| Optimized Multi-Stage | ~2-3 minutes | ~45-60 seconds | Good |
| Shared Base Image | ~2-3 minutes | ~15-30 seconds | Excellent |
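
As a sanity check, the percentage reductions quoted earlier can be reproduced from the table. The sketch below takes the midpoint of each range (150 s for the current build, 52.5 s and 22.5 s for subsequent builds of the two optimized approaches); using midpoints is an assumption made for illustration, and `reduction_pct` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: percentage reduction from a "before" to an "after" time (seconds).
reduction_pct() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.0f\n", (before - after) * 100 / before }'
}

reduction_pct 150 52.5   # optimized multi-stage, prints 65
reduction_pct 150 22.5   # shared base image, prints 85
```

Both results fall inside the quoted ranges of ~60-70% and ~80-90% respectively.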
# From project root
docker build -f Dockerfile.base -t numaflow-python-base .

Replace the current Dockerfile with the optimized version:
# For each UDF directory
cp Dockerfile.optimized Dockerfile
# or
cp Dockerfile.shared-base Dockerfile

Use the optimized Makefile:
# For each UDF directory
cp Makefile.optimized Makefile

For CI/CD pipelines, add the base image build step:
# Example GitHub Actions step
- name: Build base image
  run: docker build -f Dockerfile.base -t numaflow-python-base .
- name: Build UDF images
  run: |
    cd examples/map/even_odd
    make image-fast

The optimized Dockerfiles implement smart dependency caching:
- `pyproject.toml` and `poetry.lock` are copied first
- pynumaflow installation is cached separately
- UDF-specific dependencies are installed last
- Minimal system dependencies in runtime image
- Separate build and runtime stages
- Efficient file copying with specific paths
- Copy only necessary files
- Use `.dockerignore` to exclude unnecessary files
- Minimize build context size
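
A starting point for the build-context items above might look like the following. The ignore patterns are assumptions about what a typical Python project does not need inside the image, so check them against what your Dockerfile actually copies. Both `write_dockerignore` and `context_size_kb` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: writes a starter .dockerignore into the given directory.
# The patterns are assumptions; remove any that exclude files your build needs.
write_dockerignore() {
    target_dir="${1:-.}"
    cat > "$target_dir/.dockerignore" <<'EOF'
# Version control and editor state
.git
.idea
.vscode
# Python caches and local environments
**/__pycache__
**/*.pyc
.venv
.pytest_cache
EOF
}

# Hypothetical helper: rough on-disk size of a directory in KiB,
# as an upper bound on the build context before ignore rules apply.
context_size_kb() {
    du -sk "${1:-.}" | cut -f1
}
```

Comparing `context_size_kb .` with the context size that `docker build` reports when it sends the context gives a quick indication of how much the ignore file is saving.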
- Backup current Dockerfile:
   cp Dockerfile Dockerfile.backup
- Choose optimization approach:
   - For single UDF: Use `Dockerfile.optimized`
   - For multiple UDFs: Use `Dockerfile.shared-base`
- Update Makefile:
   cp Makefile.optimized Makefile
- Test the build:
   make image # or make image-fast
- Use the optimized template:
   cp examples/map/even_odd/Dockerfile.optimized your-udf/Dockerfile
   cp examples/map/even_odd/Makefile.optimized your-udf/Makefile
- Update paths in Dockerfile:
   - Change `EXAMPLE_PATH` to your UDF path
   - Update `COPY` commands accordingly
- Base image not found:
   docker build -f Dockerfile.base -t numaflow-python-base .
- Permission issues:
   chmod +x entry.sh
- Poetry cache issues:
   poetry cache clear --all pypi
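
The "base image not found" case can be avoided by checking for the image before building a UDF. The sketch below is one possible approach, assuming it is run from the project root where `Dockerfile.base` lives. `ensure_base_image` is a hypothetical helper, and the `DOCKER` variable is an assumption added so the docker binary can be swapped out, for example in a dry run.

```shell
#!/bin/sh
# Hypothetical helper: builds the shared base image only if it is not present locally.
# DOCKER is an assumed override for the docker binary (defaults to "docker").
ensure_base_image() {
    image="${1:-numaflow-python-base}"
    docker_bin="${DOCKER:-docker}"
    # "docker image inspect" exits non-zero when the image does not exist locally.
    if "$docker_bin" image inspect "$image" >/dev/null 2>&1; then
        echo "base image $image already present"
    else
        echo "base image $image missing, building it"
        "$docker_bin" build -f Dockerfile.base -t "$image" .
    fi
}
```

Calling `ensure_base_image` before `make image-fast` keeps the fast path fast while still recovering on a fresh machine.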
Monitor build times:
time make image
time make image-fast

- Registry-based base images: Push base image to registry for team sharing
- BuildKit optimizations: Enable BuildKit for parallel layer building
- Multi-platform builds: Optimize for ARM64 and AMD64
- Dependency analysis: Automate dependency optimization
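
To try the BuildKit item and measure its effect, a build can be wrapped in a small timer. This is a sketch only: `timed_build` is a hypothetical helper, the `DOCKER` override is an assumption, and `DOCKER_BUILDKIT=1` is the environment variable Docker uses to enable BuildKit on versions where it is not already the default.

```shell
#!/bin/sh
# Hypothetical helper: runs "docker build" with BuildKit enabled and reports
# wall-clock time on stderr. Arguments are passed through unchanged.
timed_build() {
    start=$(date +%s)
    DOCKER_BUILDKIT=1 "${DOCKER:-docker}" build "$@"
    status=$?
    end=$(date +%s)
    echo "build took $((end - start))s (exit status $status)" >&2
    return "$status"
}

# Example usage (requires Docker):
#   timed_build -f Dockerfile.base -t numaflow-python-base .
```

Running the same command twice in a row shows the cached build time, which is the number the comparison table is concerned with.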
When adding new UDFs or modifying existing ones:
- Use the optimized Dockerfile templates
- Follow the three-stage approach
- Test build times before and after changes
- Update this documentation if needed