fix(infra): Task processor tasks killed during startup by health check by khvn26 · Pull Request #7887 · Flagsmith/flagsmith

khvn26 · 2026-06-26T12:01:23Z

Thanks for submitting a PR! Please check the boxes below:

I have read the Contributing Guide.
I have added information to docs/ if required so people know about the feature.
I have filled in the "Changes" section below.
I have filled in the "How did you test this code" section below.

Changes

Raises the task-processor container health check startPeriod from 5s to 120s, in both the production and staging ECS task definitions.
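For orientation, the health check block being changed might look roughly like the sketch below, written as a Python dict that serialises to the JSON shape used in ECS task definitions. The `interval`, `timeout`, `retries` and `startPeriod` values are the ones quoted in this PR. The `command` array form is an assumption, since the PR only names the command `flagsmith healthcheck tcp`, and the real task definition files are not shown here.

```python
import json

# Sketch of an ECS container healthCheck block after this change.
# Assumption: the "CMD" array form of the command; the PR only states
# that the check runs `flagsmith healthcheck tcp`.
health_check = {
    "command": ["CMD", "flagsmith", "healthcheck", "tcp"],
    "interval": 10,      # seconds between checks
    "timeout": 2,        # seconds before a single check counts as failed
    "retries": 5,        # consecutive failures before "unhealthy"
    "startPeriod": 120,  # grace window in seconds; was 5 before this PR
}

rendered = json.dumps({"healthCheck": health_check}, indent=2)
print(rendered)
```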

With startPeriod: 5 along with interval 10 / retries 5, ECS tolerated only ~55s (a 5s grace window plus 5 failed checks 10s apart) before marking the container unhealthy, often killing tasks just as they were about to come up. This has been recycling task-processor ECS tasks several times a day and making deploys flap.

startPeriod: 120 covers the observed cold start and its variance, while interval 10 / timeout 2 / retries 5 still flags a genuinely hung task ~50s after the grace window closes.
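The timing argument above can be checked with a little arithmetic. This is a simplified model that mirrors the PR's own reasoning (grace window plus `retries` failed checks spaced `interval` apart); it ignores the per-check timeout and any scheduling delay in ECS. The function name is hypothetical, and the 80s figure is the approximate boot time quoted in this PR.

```python
def seconds_until_unhealthy(start_period: int, interval: int, retries: int) -> int:
    """Approximate time from container start until it is marked unhealthy,
    assuming every health check fails (simplified model)."""
    return start_period + interval * retries


BOOT_SECONDS = 80  # approximate time to bind port 8000, as quoted in this PR

old_window = seconds_until_unhealthy(start_period=5, interval=10, retries=5)
new_window = seconds_until_unhealthy(start_period=120, interval=10, retries=5)

print(old_window)  # 55
print(new_window)  # 170

# The old window closes before the container finishes booting; the new one does not.
print(old_window < BOOT_SECONDS)  # True
print(new_window > BOOT_SECONDS)  # True

# A genuinely hung task is still flagged this long after the grace window closes.
print(new_window - 120)  # 50
```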

This is a temporary change until we figure out boot time improvements.

How did you test this code?

N/A

The task-processor health check (`flagsmith healthcheck tcp`) passes only once the container binds port 8000, but the entrypoint runs the full bootstrap first (migrations, createcachetable, ClickHouse migrate, waitfordb), so it takes ~80s to bind. With startPeriod 5s the container was marked unhealthy and sent SIGTERM after ~55s, often just before it came up, which recycled tasks several times a day and made deploys flap. Raise startPeriod to 120s in the prod and staging ECS task definitions to cover the observed cold start and its variance.
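To illustrate why such a check fails throughout bootstrap, here is a minimal sketch of a TCP probe of this general kind. It is not the implementation of `flagsmith healthcheck tcp`, which is not shown in this PR. The function name and defaults are hypothetical, with the port and timeout taken from the values mentioned above.

```python
import socket


def tcp_port_is_open(host: str = "127.0.0.1", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: until the server process binds the port, the
    connection is refused and the probe reports failure, however far
    the bootstrap steps have progressed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

A health check command built on a probe like this would exit non-zero while `tcp_port_is_open()` returns False, which is what happens for the whole ~80s bootstrap described above.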

vercel · 2026-06-26T12:01:25Z

The latest updates on your projects. Learn more about Vercel for GitHub.

3 Skipped Deployments

Project	Deployment	Updated (UTC)
docs	Ignored	Jun 26, 2026 12:01pm
flagsmith-frontend-preview	Ignored	Jun 26, 2026 12:01pm
flagsmith-frontend-staging	Ignored	Jun 26, 2026 12:01pm

germangarces

LGTM

khvn26 requested a review from a team as a code owner June 26, 2026 12:01

khvn26 requested review from germangarces and removed request for a team June 26, 2026 12:01

flagsmith-engineering Bot assigned germangarces Jun 26, 2026

github-actions Bot added the infrastructure label Jun 26, 2026

germangarces approved these changes Jun 26, 2026

This was referenced Jun 26, 2026

Proxy run-docker.sh to flagsmith entrypoint #7891

Open

Replace Core's run-docker.sh Flagsmith/flagsmith-common#239

Open

khvn26 wants to merge 1 commit into main from fix/task-processor-healthcheck-start-period
