3 changes: 2 additions & 1 deletion README.md
@@ -119,9 +119,10 @@ Join our discord community via [this invite link](https://discord.gg/bxgXW8jJGh)
| <a name="input_create_service_linked_role_spot"></a> [create\_service\_linked\_role\_spot](#input\_create\_service\_linked\_role\_spot) | (optional) create the service linked role for spot instances that is required by the scale-up lambda. | `bool` | `false` | no |
| <a name="input_delay_webhook_event"></a> [delay\_webhook\_event](#input\_delay\_webhook\_event) | The number of seconds the event accepted by the webhook is invisible on the queue before the scale up lambda will receive the event. | `number` | `30` | no |
| <a name="input_disable_runner_autoupdate"></a> [disable\_runner\_autoupdate](#input\_disable\_runner\_autoupdate) | Disable the auto update of the github runner agent. Be aware there is a grace period of 30 days, see also the [GitHub article](https://github.blog/changelog/2022-02-01-github-actions-self-hosted-runners-can-now-disable-automatic-updates/) | `bool` | `false` | no |
| <a name="input_ec2_dynamic_labels_policy"></a> [ec2\_dynamic\_labels\_policy](#input\_ec2\_dynamic\_labels\_policy) | Experimental! Can be removed / changed without triggering a major release.<br/>Optional policy for dynamic EC2 override labels evaluated by the webhook<br/>dispatcher. Only effective when `enable_dynamic_labels = true`.<br/><br/>Jobs whose EC2 dynamic labels violate the policy are rejected with a 202 and a<br/>warning is logged.<br/><br/>Evaluation:<br/> 1. Keys in `blocked_keys` are always rejected.<br/> 2. Keys in `restricted_keys` are allowed only when their value passes the rule.<br/> 3. Keys not listed in `blocked_keys` or `restricted_keys` are allowed.<br/><br/>Schema:<br/> - `blocked_keys`: keys to reject outright.<br/> - `restricted_keys`: map of key to value rule:<br/> `{ allowed = [globs], denied = [globs], max = number|string }`.<br/><br/>Keys use the part of the dynamic label that follows the `ghr-ec2-` prefix, not the full label. For example, use<br/>`instance-type` for `ghr-ec2-instance-type`. | `any` | `null` | no |
| <a name="input_enable_ami_housekeeper"></a> [enable\_ami\_housekeeper](#input\_enable\_ami\_housekeeper) | Option to enable the lambda that cleans up old AMIs. | `bool` | `false` | no |
| <a name="input_enable_cloudwatch_agent"></a> [enable\_cloudwatch\_agent](#input\_enable\_cloudwatch\_agent) | Enables the cloudwatch agent on the ec2 runner instances. The runner uses a default config that can be overridden via `cloudwatch_config`. | `bool` | `true` | no |
| <a name="input_enable_dynamic_labels"></a> [enable\_dynamic\_labels](#input\_enable\_dynamic\_labels) | Experimental! Can be removed / changed without trigger a major release. Enable dynamic EC2 configs based on workflow job labels. When enabled, jobs can request specific configs via the 'gh-ec2-<config type key>:<config type value>' label (e.g., 'gh-ec2-instance-type:t3.large'). When enabled, labels starting with `ghr-` are ignored during webhook label matching. | `bool` | `false` | no |
| <a name="input_enable_dynamic_labels"></a> [enable\_dynamic\_labels](#input\_enable\_dynamic\_labels) | Experimental! Can be removed / changed without triggering a major release. Enable dynamic EC2 configs based on workflow job labels. When enabled, jobs can request specific configs via the 'ghr-ec2-<config type key>:<config type value>' label (e.g., 'ghr-ec2-instance-type:t3.large'). When enabled, labels starting with `ghr-` are ignored during webhook label matching. | `bool` | `false` | no |
| <a name="input_enable_ephemeral_runners"></a> [enable\_ephemeral\_runners](#input\_enable\_ephemeral\_runners) | Enable ephemeral runners, runners will only be used once. | `bool` | `false` | no |
| <a name="input_enable_jit_config"></a> [enable\_jit\_config](#input\_enable\_jit\_config) | Overwrite the default behavior for JIT configuration. By default JIT configuration is enabled for ephemeral runners and disabled for non-ephemeral runners. In case of GHES check first if the JIT config API is available. In case you are upgrading from 3.x to 4.x you can set `enable_jit_config` to `false` to avoid a breaking change when having your own AMI. | `bool` | `null` | no |
| <a name="input_enable_job_queued_check"></a> [enable\_job\_queued\_check](#input\_enable\_job\_queued\_check) | Only scale if the job event received by the scale up lambda is in the queued state. By default enabled for non ephemeral runners and disabled for ephemeral. Set this variable to overwrite the default behavior. | `bool` | `null` | no |
25 changes: 23 additions & 2 deletions docs/configuration.md
@@ -336,8 +336,7 @@ Below is an example of the log messages created.

[!WARNING]
**Security implication:** Dynamic labels are extracted from the `runs-on` labels in incoming `workflow_job` webhook events. These labels originate from what
users define in their workflow files. Any user with permission to create or modify workflows can inject arbitrary EC2 configuration values — including instance types, AMI IDs, subnet IDs, EBS volumes, placement settings, and more. **These values are not sanitized or validated** against an allowlist before being passed to the EC2 CreateFleet API. This means a malicious or careless workflow author could, for example:
-
users define in their workflow files. Any user with permission to create or modify workflows can inject arbitrary EC2 configuration values — including instance types, AMI IDs, subnet IDs, EBS volumes, placement settings, and more. Unless constrained with `ec2_dynamic_labels_policy`, these values are not validated against label-specific rules before being passed to the EC2 CreateFleet API. This means a malicious or careless workflow author could, for example:

- Launch expensive instance types (e.g., `p5.48xlarge`) to inflate costs
- Override the AMI (`ghr-ec2-image-id`) to boot a compromised image
@@ -368,10 +367,32 @@ module "runners" {

...
enable_dynamic_labels = true
ec2_dynamic_labels_policy = {
  blocked_keys = ["image-id", "subnet-id"]

  restricted_keys = {
    "instance-type" = {
      allowed = ["m5.*", "c5.*"]
      denied  = ["m5.metal"]
    }

    "ebs-volume-size" = {
      max = 200
    }
  }
}
...
}
```

The policy is evaluated by dynamic label key:

1. Keys in `blocked_keys` are always rejected.
2. Keys in `restricted_keys` are allowed only when their value passes the rule.
3. Keys not listed in `blocked_keys` or `restricted_keys` are allowed.

Policy keys use the dynamic label suffix, not the full label. For example, use `instance-type` for `ghr-ec2-instance-type`.
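
The three evaluation steps above can be sketched in TypeScript as follows. This is a minimal illustration, not the dispatcher's actual implementation. The type names and the helpers `globToRegExp`, `parseEc2Label`, and `isLabelAllowed` are hypothetical. The rule semantics are also assumptions: a `denied` match wins over `allowed`, a non-empty `allowed` list requires a match, `max` is compared numerically, and `*` is the only glob wildcard.

```typescript
// Hypothetical types mirroring the documented schema; the names are illustrative.
type ValueRule = { allowed?: string[]; denied?: string[]; max?: number | string };
type DynamicLabelsPolicy = {
  blocked_keys?: string[];
  restricted_keys?: Record<string, ValueRule>;
};

const EC2_PREFIX = 'ghr-ec2-';

// Assumption: globs support only "*" (any run of characters); everything else is literal.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`);
}

// "ghr-ec2-instance-type:m5.large" -> { key: "instance-type", value: "m5.large" }.
// Returns undefined for labels that are not EC2 dynamic labels.
function parseEc2Label(label: string): { key: string; value: string } | undefined {
  const trimmed = label.trim();
  if (!trimmed.startsWith(EC2_PREFIX)) return undefined;
  const rest = trimmed.slice(EC2_PREFIX.length);
  const sep = rest.indexOf(':');
  return sep === -1
    ? { key: rest, value: '' }
    : { key: rest.slice(0, sep), value: rest.slice(sep + 1) };
}

function isLabelAllowed(label: string, policy: DynamicLabelsPolicy | null): boolean {
  const parsed = parseEc2Label(label);
  if (!parsed || !policy) return true; // not an EC2 dynamic label, or no policy configured

  // 1. Keys in blocked_keys are always rejected.
  if (policy.blocked_keys?.includes(parsed.key)) return false;

  // 2. Keys in restricted_keys are allowed only when the value passes the rule.
  const restricted = policy.restricted_keys ?? {};
  if (Object.prototype.hasOwnProperty.call(restricted, parsed.key)) {
    const rule = restricted[parsed.key];
    const matches = (globs: string[]) => globs.some((g) => globToRegExp(g).test(parsed.value));
    if (rule.denied && matches(rule.denied)) return false;
    if (rule.allowed && rule.allowed.length > 0 && !matches(rule.allowed)) return false;
    if (rule.max !== undefined) {
      const value = Number(parsed.value);
      if (parsed.value.trim() === '' || !Number.isFinite(value) || value > Number(rule.max)) {
        return false;
      }
    }
    return true;
  }

  // 3. Keys not listed in blocked_keys or restricted_keys are allowed.
  return true;
}

// The same policy as the Terraform example above.
const examplePolicy: DynamicLabelsPolicy = {
  blocked_keys: ['image-id', 'subnet-id'],
  restricted_keys: {
    'instance-type': { allowed: ['m5.*', 'c5.*'], denied: ['m5.metal'] },
    'ebs-volume-size': { max: 200 },
  },
};

console.log(isLabelAllowed('ghr-ec2-instance-type:m5.large', examplePolicy));
console.log(isLabelAllowed('ghr-ec2-instance-type:m5.metal', examplePolicy));
```

Under these assumptions, `m5.large` passes because it matches the `m5.*` glob, while `m5.metal` is rejected by the `denied` list even though it also matches `m5.*`.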

#### Custom identity labels

Any label matching `ghr-<key>:<value>` (where `<key>` does **not** start with `ec2-`) is a custom identity label. These labels have no effect on EC2 instance configuration but are included in the runner matching hash. Use them to:
Original file line number Diff line number Diff line change
@@ -595,7 +595,6 @@ describe('scaleUp with GHES', () => {
describe('Dynamic EC2 Configuration', () => {
beforeEach(() => {
process.env.ENABLE_ORGANIZATION_RUNNERS = 'true';
process.env.ENABLE_DYNAMIC_LABELS = 'true';
process.env.ENABLE_EPHEMERAL_RUNNERS = 'true';
process.env.ENABLE_JOB_QUEUED_CHECK = 'false';
process.env.RUNNER_LABELS = 'base-label';
@@ -690,29 +689,6 @@ describe('scaleUp with GHES', () => {
);
});

it('does not process EC2 labels when ENABLE_DYNAMIC_LABELS is disabled', async () => {
process.env.ENABLE_DYNAMIC_LABELS = 'false';

const testDataWithEc2Labels = [
{
...TEST_DATA_SINGLE,
labels: ['ghr-ec2-instance-type:c5.4xlarge'],
messageId: 'test-7',
},
];

await scaleUpModule.scaleUp(testDataWithEc2Labels);

// Should ignore EC2 labels and use default instance types
expect(createRunner).toBeCalledWith(
expect.objectContaining({
ec2instanceCriteria: expect.objectContaining({
instanceTypes: ['t3.medium', 't3.large'],
}),
}),
);
});

it('handles multiple EC2 labels correctly', async () => {
const testDataWithMultipleEc2Labels = [
{
51 changes: 21 additions & 30 deletions lambdas/functions/control-plane/src/scale-runners/scale-up.ts
@@ -336,7 +336,6 @@ export async function scaleUp(payloads: ActionRequestMessageSQS[]): Promise<stri
const instanceTypes = process.env.INSTANCE_TYPES.split(',');
const instanceTargetCapacityType = process.env.INSTANCE_TARGET_CAPACITY_TYPE;
const ephemeralEnabled = yn(process.env.ENABLE_EPHEMERAL_RUNNERS, { default: false });
const dynamicLabelsEnabled = yn(process.env.ENABLE_DYNAMIC_LABELS, { default: false });
const enableJitConfig = yn(process.env.ENABLE_JIT_CONFIG, { default: ephemeralEnabled });
const disableAutoUpdate = yn(process.env.DISABLE_RUNNER_AUTOUPDATE, { default: false });
const launchTemplateName = process.env.LAUNCH_TEMPLATE_NAME;
@@ -407,13 +406,9 @@ export async function scaleUp(payloads: ActionRequestMessageSQS[]): Promise<stri
: `${payload.repositoryOwner}/${payload.repositoryName}`;

let key = runnerOwner;
if (dynamicLabelsEnabled && labels?.length) {
const dynamicLabels = labels.find((l) => l.startsWith('ghr-'))?.slice('ghr-'.length);

if (dynamicLabels) {
const dynamicLabelsHash = labelsHash(labels);
key = `${key}/${dynamicLabelsHash}`;
}
if (labels?.some((l) => l.startsWith('ghr-'))) {
const dynamicLabelsHash = labelsHash(labels);
key = `${key}/${dynamicLabelsHash}`;
}

let entry = validMessages.get(key);
@@ -457,27 +452,23 @@ export async function scaleUp(payloads: ActionRequestMessageSQS[]): Promise<stri

let ec2OverrideConfig: Ec2OverrideConfig | undefined = undefined;

if (messages.length > 0 && dynamicLabelsEnabled) {
logger.debug('Dynamic EC2 config enabled, processing labels', { labels: messages[0].labels });

const dynamicEC2Labels = messages[0].labels?.map((l) => l.trim()).filter((l) => l.startsWith('ghr-ec2-')) ?? [];
const allDynamicLabels = messages[0].labels?.map((l) => l.trim()).filter((l) => l.startsWith('ghr-')) ?? [];

if (allDynamicLabels.length > 0) {
runnerLabels = runnerLabels ? `${runnerLabels},${allDynamicLabels.join(',')}` : allDynamicLabels.join(',');

logger.debug('Updated runner labels', { runnerLabels });

if (dynamicEC2Labels.length > 0) {
ec2OverrideConfig = parseEc2OverrideConfig(dynamicEC2Labels);
if (ec2OverrideConfig) {
logger.debug('EC2 override config parsed from labels', {
ec2OverrideConfig,
});
}
const messageLabels = messages.length > 0 ? (messages[0].labels ?? []) : [];
const dynamicEC2Labels = messageLabels.map((l) => l.trim()).filter((l) => l.startsWith('ghr-ec2-'));
const nonEc2DynamicLabels = messageLabels
.map((l) => l.trim())
.filter((l) => l.startsWith('ghr-') && !l.startsWith('ghr-ec2-'));
const allDynamicLabels = [...nonEc2DynamicLabels, ...dynamicEC2Labels];

if (allDynamicLabels.length > 0) {
logger.debug('Dynamic labels present on message', { labels: allDynamicLabels });
runnerLabels = runnerLabels ? `${runnerLabels},${allDynamicLabels.join(',')}` : allDynamicLabels.join(',');
logger.debug('Updated runner labels', { runnerLabels });

if (dynamicEC2Labels.length > 0) {
ec2OverrideConfig = parseEc2OverrideConfig(dynamicEC2Labels);
if (ec2OverrideConfig) {
logger.debug('EC2 override config parsed from labels', { ec2OverrideConfig });
}
} else {
logger.debug('No dynamic labels found on message');
}
}

@@ -822,8 +813,8 @@ async function createJitConfig(
* - ghr-ec2-accelerator-count-max:<num> - Set maximum accelerator count
* - ghr-ec2-accelerator-manufacturers:<list> - Accelerator manufacturers (comma-separated: nvidia,amd,amazon-web-services,xilinx)
* - ghr-ec2-accelerator-names:<list> - Specific accelerator names (comma-separated)
* - ghr-ec2-accelerator-memory-mib-min:<num> - Min accelerator total memory in MiB
* - ghr-ec2-accelerator-memory-mib-max:<num> - Max accelerator total memory in MiB
* - ghr-ec2-accelerator-total-memory-mib-min:<num> - Min accelerator total memory in MiB
* - ghr-ec2-accelerator-total-memory-mib-max:<num> - Max accelerator total memory in MiB
*
* Instance Requirements (Network & Storage):
* - ghr-ec2-network-interface-count-min:<num> - Min network interfaces
4 changes: 0 additions & 4 deletions lambdas/functions/webhook/src/ConfigLoader.ts
@@ -130,11 +130,9 @@ export class ConfigWebhook extends MatcherAwareConfig {
repositoryAllowList: string[] = [];
webhookSecret: string = '';
workflowJobEventSecondaryQueue: string = '';
enableDynamicLabels: boolean = false;

async loadConfig(): Promise<void> {
this.loadEnvVar(process.env.REPOSITORY_ALLOW_LIST, 'repositoryAllowList', []);
this.loadEnvVar(process.env.ENABLE_DYNAMIC_LABELS, 'enableDynamicLabels', false);

await Promise.all([
this.loadMatcherConfig(process.env.PARAMETER_RUNNER_MATCHER_CONFIG_PATH),
@@ -164,11 +162,9 @@ export class ConfigWebhookEventBridge extends BaseConfig {
export class ConfigDispatcher extends MatcherAwareConfig {
repositoryAllowList: string[] = [];
workflowJobEventSecondaryQueue: string = ''; // Deprecated
enableDynamicLabels: boolean = false;

async loadConfig(): Promise<void> {
this.loadEnvVar(process.env.REPOSITORY_ALLOW_LIST, 'repositoryAllowList', []);
this.loadEnvVar(process.env.ENABLE_DYNAMIC_LABELS, 'enableDynamicLabels', false);
await this.loadMatcherConfig(process.env.PARAMETER_RUNNER_MATCHER_CONFIG_PATH);

validateRunnerMatcherConfig(this);
1 change: 0 additions & 1 deletion lambdas/functions/webhook/src/modules.d.ts
@@ -5,7 +5,6 @@ declare namespace NodeJS {
PARAMETER_GITHUB_APP_WEBHOOK_SECRET: string;
PARAMETER_RUNNER_MATCHER_CONFIG_PATH: string;
REPOSITORY_ALLOW_LIST: string;
ENABLE_DYNAMIC_LABELS: string;
RUNNER_LABELS: string;
ACCEPT_EVENTS: string;
}