
Commit 6f319a5

Author: Vitaly Korolev
Update the build pipeline to handle ARM builds.
Add new Docker image types: ubi9-arm and ubi9-rootless-arm. Restructure the pipeline to run tests on an external ARM agent.
1 parent 575dde7 commit 6f319a5

36 files changed

Lines changed: 2128 additions & 266 deletions

.github/copilot-instructions.md

Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
1+
# MarkLogic Docker Build System (Copilot + Contributor Guidance)
2+
3+
This file is optimized for day-to-day contributor workflow and for GitHub Copilot context.
4+
If you are using Copilot/AI to modify this repo, follow the **How Copilot Should Work** rules first.
5+
6+
## How Copilot Should Work
7+
8+
- Prefer the `Makefile` targets over ad-hoc commands (build/test/lint/scan).
9+
- Keep changes minimal and aligned with existing patterns; avoid refactors unless requested.
10+
- Treat these as **sources of truth**:
11+
- Build behavior: `Makefile` and `dockerFiles/*`
12+
- Runtime behavior: `src/scripts/start-marklogic*.sh`
13+
- Test expectations: `test/docker-tests.robot`, `test/structure-test.yaml`, `test/keywords.resource`
14+
- User documentation: `README.md`, `docker-compose/*.yaml`
15+
- When changing behavior (env vars, logs, endpoints, defaults), update the relevant tests and docs in the same PR.
16+
- Do not introduce new build systems, new base images, or extra “nice-to-have” tooling without an explicit request.
17+
- Security rules:
18+
- Never print secrets/credentials to logs.
19+
- Prefer Docker secrets (`/run/secrets/*`) over env vars for credentials.
20+
- Avoid adding packages unless required; vulnerability surface is tightly managed.
21+
22+
## Change Checklist (Common Work)
23+
24+
- **Startup scripts (`src/scripts/*.sh`)**
25+
- Preserve existing log phrasing unless you also update Robot tests that match logs.
26+
- Preserve the root vs rootless behavioral differences (sudo usage, config write mode, converter install).
27+
- **Dockerfile templates (`dockerFiles/*`)**
28+
- Keep the multi-stage + flattened final stage pattern (`COPY --from=builder / /`).
29+
- Keep ownership/permissions correct for rootless (`marklogic_user:users`, UID 1000).
30+
- If you add/remove files, update `test/structure-test.yaml` accordingly.
31+
- **Env vars / secrets**
32+
- Keep naming consistent across `README.md`, `docker-compose/*.yaml`, Dockerfiles, and tests.
33+
- Canonical secret targets: `mldb_admin_username`, `mldb_admin_password`, `mldb_wallet_password`.
34+
- **Tests**
35+
- Add/adjust Robot assertions when behavior or logs change.
36+
- Use the `long_running` tag for slow/integration-heavy tests.
37+
38+
## Local Development Notes
39+
40+
- Use `make lint` to run ShellCheck + Hadolint.
41+
- Use `make test` for `structure-test` + Robot tests.
42+
- This repo builds images for `linux/amd64` by default.
43+
- macOS note: `make structure-test` relies on GNU `sed -i` syntax, which the BSD `sed` shipped with macOS does not accept; install GNU sed (`gsed`) or run the build/test in a Linux container/VM.
44+
45+
## Project Overview
46+
47+
This repository builds and maintains Docker images for **MarkLogic Server**, a multi-model NoSQL database. The project supports multiple base images (UBI8/UBI9) with both root and rootless variants, includes security hardening via OpenSCAP, and supports FIPS-enabled configurations.
48+
49+
**Key directories:**
50+
- `dockerFiles/` - Dockerfile templates for different image variants
51+
- `src/scripts/` - Container initialization scripts (`start-marklogic.sh`, `start-marklogic-rootless.sh`)
52+
- `test/` - Robot Framework test suite for container validation
53+
- `docker-compose/` - Example cluster configurations
54+
55+
## Build Architecture
56+
57+
### Multi-Stage Build Process
58+
59+
Images are built in two stages to reduce final image size:
60+
1. **Builder stage**: Installs MarkLogic RPM, creates system user, adds TINI init system
61+
2. **Final stage**: Copies from builder, flattens layers, removes unnecessary packages
62+
63+
**Image variants** (controlled by `docker_image_type` parameter):
64+
- `ubi` / `ubi9` - Root images on Red Hat Universal Base Image
65+
- `ubi-rootless` / `ubi9-rootless` - Hardened rootless images (user `marklogic_user`, UID 1000)
66+
67+
### Build Commands
68+
69+
Use the `Makefile` for all build operations:
70+
71+
```bash
72+
# Build image (specify RPM package and image type)
73+
make build docker_image_type=ubi9-rootless package=MarkLogic-11.3.nightly-rhel9.x86_64.rpm dockerTag=my-tag
74+
75+
# Run structure tests
76+
make structure-test docker_image_type=ubi9-rootless dockerTag=my-tag
77+
78+
# Run Robot Framework tests
79+
make docker-tests docker_image_type=ubi9-rootless dockerTag=my-tag
80+
81+
# Run specific tests only
82+
make docker-tests DOCKER_TEST_LIST="Smoke Test,Initialized MarkLogic container"
83+
84+
# Security scanning with Grype
85+
make scan docker_image_type=ubi9-rootless dockerTag=my-tag
86+
87+
# SCAP hardening validation
88+
make scap-scan docker_image_type=ubi9-rootless dockerTag=my-tag
89+
90+
# Lint Dockerfiles and shell scripts
91+
make lint
92+
```
93+
94+
**Important:** Rootless images automatically apply OpenSCAP CIS hardening scripts during build. The build downloads `scap-security-guide-${open_scap_version}.zip` and extracts the appropriate remediation script for the OS version.
95+
96+
## Container Initialization Logic
97+
98+
The entrypoint scripts (`start-marklogic.sh` for root, `start-marklogic-rootless.sh` for rootless) handle:
99+
100+
1. **Configuration management**: Writes environment variables to `/etc/marklogic.conf`
101+
2. **Credential extraction**: Reads admin credentials from Docker secrets or env vars
102+
3. **Server initialization**: Calls MarkLogic REST APIs to initialize security database
103+
4. **Cluster joining**: Uses bootstrap host to join existing clusters (HTTP or HTTPS)
104+
5. **Health checks**: Polls the `/LATEST/healthcheck` endpoint on port 7997 until the server is ready
105+
106+
### Key Environment Variables
107+
108+
| Variable | Purpose | Notes |
109+
|----------|---------|-------|
110+
| `MARKLOGIC_INIT` | Initialize server with admin credentials | Must be `true` for automated setup |
111+
| `MARKLOGIC_JOIN_CLUSTER` | Join existing cluster via bootstrap host | Requires `MARKLOGIC_BOOTSTRAP_HOST` |
112+
| `MARKLOGIC_BOOTSTRAP_HOST` | Hostname of cluster bootstrap node | Defaults to `bootstrap` |
113+
| `MARKLOGIC_JOIN_TLS_ENABLED` | Use HTTPS for cluster join | Requires `MARKLOGIC_JOIN_CACERT_FILE` secret |
114+
| `OVERWRITE_ML_CONF` | Rewrite `/etc/marklogic.conf` | Always `true` for rootless images |
115+
| `INSTALL_CONVERTERS` | Install MarkLogic Converters package | Uses `/converters.rpm` |
116+
117+
**Secrets precedence**: Docker secrets (files in `/run/secrets/`) are preferred over environment variables for credentials.
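
The precedence rule can be sketched in shell as follows. This is a minimal illustration, not the code in `src/scripts/`: the helper name `read_credential` and the overridable `SECRETS_DIR` variable are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve a credential, preferring a Docker secret file.
# SECRETS_DIR defaults to /run/secrets; it is overridable only to ease testing.
SECRETS_DIR="${SECRETS_DIR:-/run/secrets}"

read_credential() {
    local secret_name="$1"   # e.g. mldb_admin_username
    local env_fallback="$2"  # value taken from an env var, may be empty
    local secret_file="${SECRETS_DIR}/${secret_name}"

    if [ -r "$secret_file" ]; then
        # The secret file wins; $(...) strips the trailing newline.
        printf '%s' "$(cat "$secret_file")"
    else
        printf '%s' "$env_fallback"
    fi
}
```

A caller might use it as `admin_user="$(read_credential mldb_admin_username "${MARKLOGIC_ADMIN_USERNAME:-}")"`. The value is captured in a variable rather than echoed, in line with the rule against printing credentials to logs.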
118+
119+
## Testing Strategy
120+
121+
### Robot Framework Tests (`test/docker-tests.robot`)
122+
123+
Tests use Robot Framework with Docker and HTTP libraries. Each test case creates containers, validates behavior, and tears down.
124+
125+
**Test execution patterns:**
126+
- Tests tagged `long_running` are excluded by default (use `DOCKER_TEST_LIST` to include)
127+
- All tests create uniquely named containers based on test case name (spaces removed)
128+
- Verification uses Docker logs pattern matching and HTTP endpoint checks
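
The naming rule (test case name with spaces removed) is implemented in the Robot keywords; the shell snippet below only illustrates the transformation. The variable names are assumptions for this example.

```bash
#!/usr/bin/env bash
# Illustration only: derive a container name by removing spaces
# from a test case name (bash pattern substitution).
test_case="Initialized MarkLogic container"
container_name="${test_case// /}"
echo "$container_name"
```

This can help when looking for a leftover container from a failed run with `docker ps -a`.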
129+
130+
**Common test patterns:**
131+
```robotframework
132+
Create container with -e MARKLOGIC_INIT=true -e MARKLOGIC_ADMIN_USERNAME=admin
133+
Docker log should contain *MARKLOGIC_INIT is true, initializing*
134+
Verify response for authenticated request with 8001 *No license key*
135+
[Teardown] Delete container
136+
```
137+
138+
### Structure Tests
139+
140+
Container Structure Tests validate:
141+
- File existence and permissions
142+
- Metadata labels (version, build branch)
143+
- Command availability
144+
- Environment variables
145+
146+
Template: `test/structure-test.yaml` (placeholders replaced during `make structure-test`)
147+
148+
## Clustering Patterns
149+
150+
**Bootstrap node** initialization:
151+
```yaml
152+
environment:
153+
- MARKLOGIC_INIT=true
154+
- MARKLOGIC_ADMIN_USERNAME_FILE=mldb_admin_username
155+
```
156+
157+
**Additional nodes** joining cluster:
158+
```yaml
159+
environment:
160+
- MARKLOGIC_INIT=true
161+
- MARKLOGIC_ADMIN_USERNAME_FILE=mldb_admin_username
162+
- MARKLOGIC_JOIN_CLUSTER=true
163+
- MARKLOGIC_BOOTSTRAP_HOST=bootstrap_3n
164+
- MARKLOGIC_GROUP=dnode # Optional: join specific group
165+
```
166+
167+
**TLS-enabled joining** (requires CA certificate as secret):
168+
```yaml
169+
environment:
170+
- MARKLOGIC_JOIN_TLS_ENABLED=true
171+
- MARKLOGIC_JOIN_CACERT_FILE=certificate.cer
172+
secrets:
173+
- source: certificate.cer
174+
target: certificate.cer
175+
```
176+
177+
## Critical Implementation Details
178+
179+
### Rootless vs Root Differences
180+
181+
| Aspect | Root Image | Rootless Image |
182+
|--------|-----------|----------------|
183+
| User | `marklogic_user` (UID 1000) | Same |
184+
| PID file | `/var/run/MarkLogic.pid` | `/home/marklogic_user/MarkLogic.pid` |
185+
| Config overwrite | Controlled by `OVERWRITE_ML_CONF` | Always appends to config |
186+
| Hardening | None | OpenSCAP CIS remediation applied |
187+
| Privileges | Requires `sudo` for service start | Uses `start-marklogic.sh` directly |
188+
189+
### Startup Script Retry Logic
190+
191+
- `N_RETRY=15` attempts with `RETRY_INTERVAL=10` seconds for critical operations
192+
- `CURL_TIMEOUT=300` seconds for individual HTTP requests
193+
- **Non-idempotent endpoint**: `/admin/v1/instance-admin` called exactly once (no retries)
194+
- `restart_check()` function polls `/admin/v1/timestamp` to detect server restarts
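
A retry loop with these settings could look roughly like the sketch below. The `retry` helper is hypothetical and is not copied from the startup scripts; only the `N_RETRY` and `RETRY_INTERVAL` names and defaults come from the notes above.

```bash
#!/usr/bin/env bash
N_RETRY="${N_RETRY:-15}"               # maximum number of attempts
RETRY_INTERVAL="${RETRY_INTERVAL:-10}" # seconds to wait between attempts

# Hypothetical helper: run a command until it succeeds or attempts run out.
retry() {
    local attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$N_RETRY" ]; then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$RETRY_INTERVAL"
    done
}
```

A wrapper like this suits idempotent calls only. The `/admin/v1/instance-admin` request should stay outside it, because that endpoint is called exactly once.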
195+
196+
### Dockerfile Conventions
197+
198+
- Base images always use `ARG BASE_IMAGE` with defaults
199+
- Multi-stage builds flatten layers using `COPY --from=builder / /`
200+
- Security hardening: Removes packages with known vulnerabilities in final stage
201+
- TINI init system (`/tini`) serves as PID 1 to handle zombie processes
202+
- Volume mounted at `/var/opt/MarkLogic` for persistent data
203+
204+
## Common Pitfalls
205+
206+
1. **Rejoining clusters**: Nodes that previously left a cluster may fail to rejoin (known limitation)
207+
2. **Leave button**: Admin UI "leave" may not work; use Management API instead
208+
3. **Timezone**: Containers default to UTC unless `TZ` environment variable is set
209+
4. **HugePages**: Container allocates up to 3/8 of memory limit as HugePages by default (override with `ML_HUGEPAGES_TOTAL`)
210+
5. **Upgrade process**: Must update file ownership to `1000:100` (`marklogic_user:users`) when upgrading to rootless images
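
To get a feel for the 3/8 figure in item 4, the arithmetic can be sketched as below. The helper name and the 2 MiB huge page size are assumptions for this example; the actual computation in the startup scripts, and the unit expected by `ML_HUGEPAGES_TOTAL`, may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of huge pages covering 3/8 of a memory limit.
hugepages_for_limit() {
    local limit_bytes="$1"
    local page_bytes=$((2 * 1024 * 1024))  # assumption: 2 MiB huge pages
    echo $(( limit_bytes * 3 / 8 / page_bytes ))
}

# An 8 GiB limit gives 3 GiB of huge pages, i.e. 1536 pages of 2 MiB.
hugepages_for_limit $((8 * 1024 * 1024 * 1024))
```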
211+
212+
## CI/CD Pipeline (Jenkinsfile)
213+
214+
The pipeline supports:
215+
- Pull request validation (draft checks, review state validation)
216+
- Scheduled vulnerability scans (emails to security team)
217+
- Multi-architecture builds (currently `linux/amd64` only)
218+
- Jira ticket extraction from branch names (pattern: `MLE-\d{3,6}`)
219+
- Image publishing to Artifactory and Azure Container Registry
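
The Jira ID extraction can be approximated in shell as shown below. This is a sketch, not the Jenkinsfile logic: the function name is hypothetical, and `[0-9]` replaces `\d` because POSIX extended regular expressions do not define `\d`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first Jira ID found in a branch name.
# Prints nothing when the branch name contains no match.
extract_jira_id() {
    printf '%s\n' "$1" | grep -oE 'MLE-[0-9]{3,6}' | head -n 1
}

extract_jira_id "feature/MLE-12345-arm-builds"
```

The pattern is unanchored, so a longer digit run is truncated to its first six digits rather than rejected.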
220+
221+
**Pipeline stages:** Checkout → Lint → Build → Structure Test → Docker Tests → Scan → Publish
222+
223+
## Contributing Notes
224+
225+
- PRs are used for inspiration but not merged directly (see `CONTRIBUTING.md`)
226+
- Always create an issue before starting significant work
227+
- Tests must be added/updated for new features
228+
- Linting must pass: `hadolint` for Dockerfiles, `shellcheck` for scripts
229+
- Security scan reports must be reviewed before merging

.github/workflows/pr-workflow.yaml

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
name: 🏷️ JIRA ID Validator
2+
3+
on:
4+
# Using pull_request_target instead of pull_request to handle PRs from forks
5+
pull_request_target:
6+
types: [opened, edited, reopened, synchronize]
7+
# No branch filtering - will run on all PRs
8+
9+
jobs:
10+
jira-pr-check:
11+
name: 🏷️ Validate JIRA ticket ID
12+
# Use the reusable workflow from the central repository
13+
uses: marklogic/pr-workflows/.github/workflows/jira-id-check.yml@main
14+
with:
15+
# Pass the PR title from the event context
16+
pr-title: ${{ github.event.pull_request.title }}
