Management_Benchmark#6690
Conversation
…neja/PerfKitBenchmarker into Management_Plane_Ashish
- Rename k8s_management_benchmark.py to kubernetes_management_benchmark.py - elastic_kubernetes_service.py: add capacityReservationSpecification with preference 'open' to target EC2 capacity reservations in CreateNodePoolAsync — ensures reserved t3.medium instances are used by EKS nodegroups instead of competing for on-demand capacity
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
| sorted_lats = sorted(latencies) | ||
| meta = {'sample_count': str(n)} | ||
|
|
||
| def _Percentile(p): |
There was a problem hiding this comment.
can we use statistics.quantiles() instead of using custom percentile implementation here?
There was a problem hiding this comment.
Hi @mahesh8842 , Thank you for pointing this. Updated code with your suggestion.
|
|
||
| def __init__(self): | ||
| self._lock = threading.Lock() | ||
| self.entries: list[tuple[str, float, float, Exception | None]] = [] |
There was a problem hiding this comment.
use dataclass or named_tuple to improve readability
this allows access to fields by name instead of index
@dataclasses.dataclass
class entry:
name: str
init_dur: float
e2e_dur: float
error: Exception | None = None
There was a problem hiding this comment.
| # /19 is narrowest CIDR range GKE supports | ||
| return min(cidr_size, 19) | ||
| # /17 is narrowest CIDR range GKE supports | ||
| return min(cidr_size, 16) |
There was a problem hiding this comment.
according to line 58, 17 is the narrowest supported range
but in line 59, 16 is mentioned.
Please check and keep the comment and implementation in sync
There was a problem hiding this comment.
@mahesh8842 Updated the value in comment
| try: | ||
| yield importlib.import_module(modname) | ||
| except Exception as e: # pylint: disable=broad-except | ||
| logging.warning('Skipping module %s due to import error: %s', modname, e) |
There was a problem hiding this comment.
Please prefer logging.exception instead of logging.warning
logging.exception would capture the full stack trace
There was a problem hiding this comment.
Hi @mahesh8842 , Please ignore , reverted the changes
| e2e_lat = total wall time including wait. On kickoff failure both are set | ||
| to elapsed time at failure point. | ||
| """ | ||
| init_start = time.time() |
There was a problem hiding this comment.
prefer time.monotonic() instead of time.time()
at line 613
init_dur = time.monotonic() - init_start
There was a problem hiding this comment.
| def _OpSamples( | ||
| metric_prefix: str, | ||
| results: list[tuple[str, float, float, Exception | None]], | ||
| attempted_ops: int = None, |
There was a problem hiding this comment.
| def add(self, name: str, init_dur: float, e2e_dur: float, | ||
| err: Exception | None) -> None: | ||
| with self._lock: | ||
| self.entries.append((name, init_dur, e2e_dur, err)) |
There was a problem hiding this comment.
You can use the same dataclass here
result = entry(name,init_dur,e2e_dur,err)
with self._lock:
self.entries.append(result)
There was a problem hiding this comment.
Hi @mahesh8842 , Updated the changes.
| ) | ||
| if rc: | ||
| logging.warning( | ||
| 'Sleep workload deploy returned rc=%d (non-fatal; continuing)', rc) |
There was a problem hiding this comment.
Generally we should just fail rather than silently continue & potentially have bad results. There are perhaps some errors/warning which are truly ignorable, but that should be the minority. If there are some that are worth ignoring, I'd also expect those to have something like "out, err, code = IssueCommand..
if code:
if 'expected but ignorable warning message' in err:
log "this is fine"
return
else:
raise
There was a problem hiding this comment.
Hi @hubatish , Updated code to fail instead of warning
There was a problem hiding this comment.
Great. Expand to pretty much all uses of logging.warning in providers as well.
| self.event_poller.StartPolling() | ||
| try: | ||
| self.event_poller.StartPolling() | ||
| except Exception as exc: # pylint: disable=broad-except |
There was a problem hiding this comment.
Are you encountering this error when using eg run_stage=provision,prepare --run_uri=foo & then later --run_stage=run --run_uri=foo on the same cluster, or just every time when running any benchmark in py 3.14? Either way please separate into a different PR & raise the issue in chat as we will likely need a good solution here.
There was a problem hiding this comment.
Hi @hubatish , Reverted the changes. We will review again and raise in separate PR
| """Initiates node-pool delete; returns opaque op handle. Does NOT wait.""" | ||
| raise NotImplementedError | ||
|
|
||
| def UpdateClusterAsync(self) -> str: |
There was a problem hiding this comment.
Why do we have sync vs async variants? Is this due to cloud variance? We shouldn't have both - stick to one pattern or the other. If eg all clouds don't support async (you were talking about a GKE bug here), then perhaps just go with everything using a sync variant.
There was a problem hiding this comment.
Hi @hubatish , The sync/async split is intentional for concurrency and overlapping scenarios. the benchmark's core scenarios require concurrency (Scenario A fires N creates simultaneously, Scenario B overlaps a create with a cluster update), which isn't possible with blocking calls. The sync variants (CreateNodePool, DeleteNodePool) are kept only for non-timed setup/teardown paths (Prepare, _CleanStartSweep) where blocking is correct.
And UpdateCluster (sync) seems is unused.
There was a problem hiding this comment.
Ok - sort of makes sense. However, I took a look at the provider specific implementations & had a hard time telling what exactly was different. What is different about them currently?
/ actual comment/change: the sync & async versions should share most of their code. A simple way to do this might be to always call the async version, but then wait for the result in the sync version. With a nice, sufficiently abstract presentation, I bet you could do this waiting for the async version from the sync in the parent class. Cleanly that might look like:
def UpdateClusterSync(self):
op = self.UpdateClusterAsync()
self.WaitForOperation(op)
Also consider moving _RunAsync & _TimedAsync to kubernetes_cluster (unsure of this one).
| ): | ||
| if not is_pkg: | ||
| yield importlib.import_module(modname) | ||
| try: |
There was a problem hiding this comment.
yeah you just shouldn't be causing errors in nor editing this file. ie I'm not sure why you need this but you should fix in some other fashion.
There was a problem hiding this comment.
Hi @hubatish , Reverted the changes.
| """Adds an additional nodepool with the given name to the cluster.""" | ||
| pass | ||
|
|
||
| def CreateNodePool( |
There was a problem hiding this comment.
We already have AddNodepool which is used by kubernetes nodepool provisioning. Why do we need this one too? This quite possibly might be the better implementation, but please refactor the old one to use the new one as well. I can send a sample provision node pools benchmark run command for you to test with.
There was a problem hiding this comment.
Hi @hubatish, AddNodepool delegate to CreateNodePool so CreateNodePool will be single source of truth for standard clustersc. Please let me know if this is expected.
There was a problem hiding this comment.
There are client specific implementations for Azure / EKS. Please remove those in favor of just callin ghte CreateNodepool implementation as done here & give it a test. + maybe update the provision node pool benchmark. It looks like the naming wrapper here should work, but tbh that benchmark is the only caller & using it should work. Also the "BaseNodePoolConfig" used here with only a name is certainly wrong - again look at the code in provision_node_pools_benchmark.py & update it to work with this logic instead.
Here are some sample args to try with:
--cloud=AWS --benchmarks=provision_node_pools --config_override='provision_node_pools.container_cluster.type='"'"'Karpenter'"'"'' --config_override='provision_node_pools.container_cluster.vm_spec.AWS.machine_type='"'"'m6i.xlarge'"'"'' --config_override='provision_node_pools.container_cluster.nodepools.fibpool.vm_spec.AWS.machine_type='"'"'m6i.xlarge'"'"'' --metadata=cloud:AWS --provision_node_pools_init_batch=1 --provision_node_pools_test_batch=2 --zone=us-east-1a --timeout_minutes=236
--cloud=Azure --benchmarks=provision_node_pools --config_override='provision_node_pools.container_cluster.type='"'"'Kubernetes'"'"'' --config_override='provision_node_pools.container_cluster.vm_spec.Azure.machine_type='"'"'Standard_D4s_v3'"'"'' --config_override='provision_node_pools.container_cluster.nodepools.fibpool.vm_spec.Azure.machine_type='"'"'Standard_D4s_v3'"'"'' --metadata=cloud:Azure --provision_node_pools_init_batch=1 --provision_node_pools_test_batch=2 --zone=eastus2-1 --timeout_minutes=236
--cloud=GCP --benchmarks=provision_node_pools --config_override='provision_node_pools.container_cluster.type='"'"'Kubernetes'"'"'' --config_override='provision_node_pools.container_cluster.vm_spec.GCP.machine_type='"'"'c4-standard-4'"'"'' --config_override='provision_node_pools.container_cluster.nodepools.fibpool.vm_spec.GCP.machine_type='"'"'c4-standard-4'"'"'' --metadata=cloud:GCP --project=p3rf-gke --provision_node_pools_init_batch=1 --provision_node_pools_test_batch=2 --zone=europe-west4-a --timeout_minutes=236
bc904ab to
8338677
Compare
| ], | ||
| raise_on_failure=False, | ||
| ) | ||
| self.cluster_version = ver_out.strip() if ver_rc == 0 and ver_out.strip() else '1.34' |
There was a problem hiding this comment.
Hardcoding 1.34 will quickly age out once 1.34 is EOL. Maybe throw an exception instead of silently setting 1.34?
There was a problem hiding this comment.
fixed. Now raises a CreationError if the cluster version can't be determined from describe-cluster instead of silently falling back to '1.34'
| # Reserve enough capacity per AZ for 100 pools: | ||
| # ~67 pools per AZ × 2 nodes = 134 instances max per AZ (Scenario A) | ||
| # Plus default nodegroup (2) + buffer = 80 minimum for 10 pools, 150 for 100 pools | ||
| concurrent = getattr(FLAGS, 'k8s_mgmt_concurrent_nodepools', 10) |
There was a problem hiding this comment.
This is in EksCluster._Create, which is the default EKS class. Every benchmark that runs on AWS with a Kubernetes cluster goes through here, not just k8s_management. So all the new behavior (capacity reservations per AZ, launch templates, the forced 3-AZ subnet layout, and the default nodegroup getting pinned to control_plane_zones[0] over in _RenderNodeGroupJson) gets applied to every EKS run.
A few things that make this concretely bad for unrelated benchmarks:
The reservations are hardcoded to t3.medium. Something like kubernetes_nginx uses m6i.xlarge, so its nodegroups won't consume the reservation at all, and the user just gets billed for reserved capacity they can't use, plus the on-demand they actually wanted. Same story for most of the other AWS k8s benchmarks (kubernetes_redis_memtier, kubernetes_mongodb_ycsb, container_netperf, etc.). I count ~14 of them that hit this path.
The getattr(FLAGS, 'k8s_mgmt_concurrent_nodepools', 10) on this line is a tell that this code knows it's in the wrong place. That flag only exists when your benchmark module is loaded; for any other benchmark you silently fall back to 10 and still create 80 reservations × 3 AZs per run.
Pinning the default nodegroup to a single AZ is also a quiet behavior change. Anything that relied on multi-AZ default placement (HA, anti-affinity, latency tests) loses it without warning.
Could you gate this? Either an eks_reserve_capacity_per_az flag in providers/aws/flags.py defaulted off, or move the reservation/launch-template setup into your benchmark's Prepare() and expose a small helper on EksCluster for it. The bar I'd want to hit before this lands: running provision_node_pools or kubernetes_nginx on EKS produces the same AWS resources as on master, with no extra reservations, no extra launch templates, and default nodegroup placement unchanged.
There was a problem hiding this comment.
I've addressed this by:
-
By Adding a new --eks_reserve_capacity_per_az flag in providers/aws/flags.py (default: False)
-
Gating all capacity reservation creation/cleanup and launch template lookup in _Create, _Delete, CreateNodePoolAsync, and UpgradeNodePoolAsync behind this flag.
-
When the flag is False (default), the code path is identical to master — no reservations created, no launch templates, no extra API calls, default nodegroup placement unchanged
The kubernetes_management benchmark explicitly passes --eks_reserve_capacity_per_az=true to opt in
e1f248c to
d993325
Compare
…Management_Plane_Combined
…suneja/PerfKitBenchmarker into Management_Plane_Combined
79fc26d to
73ec550
Compare
…suneja/PerfKitBenchmarker into Management_Plane_Combined
| """Adds an additional nodepool with the given name to the cluster.""" | ||
| pass | ||
|
|
||
| def CreateNodePool( |
There was a problem hiding this comment.
There are client specific implementations for Azure / EKS. Please remove those in favor of just callin ghte CreateNodepool implementation as done here & give it a test. + maybe update the provision node pool benchmark. It looks like the naming wrapper here should work, but tbh that benchmark is the only caller & using it should work. Also the "BaseNodePoolConfig" used here with only a name is certainly wrong - again look at the code in provision_node_pools_benchmark.py & update it to work with this logic instead.
Here are some sample args to try with:
--cloud=AWS --benchmarks=provision_node_pools --config_override='provision_node_pools.container_cluster.type='"'"'Karpenter'"'"'' --config_override='provision_node_pools.container_cluster.vm_spec.AWS.machine_type='"'"'m6i.xlarge'"'"'' --config_override='provision_node_pools.container_cluster.nodepools.fibpool.vm_spec.AWS.machine_type='"'"'m6i.xlarge'"'"'' --metadata=cloud:AWS --provision_node_pools_init_batch=1 --provision_node_pools_test_batch=2 --zone=us-east-1a --timeout_minutes=236
--cloud=Azure --benchmarks=provision_node_pools --config_override='provision_node_pools.container_cluster.type='"'"'Kubernetes'"'"'' --config_override='provision_node_pools.container_cluster.vm_spec.Azure.machine_type='"'"'Standard_D4s_v3'"'"'' --config_override='provision_node_pools.container_cluster.nodepools.fibpool.vm_spec.Azure.machine_type='"'"'Standard_D4s_v3'"'"'' --metadata=cloud:Azure --provision_node_pools_init_batch=1 --provision_node_pools_test_batch=2 --zone=eastus2-1 --timeout_minutes=236
--cloud=GCP --benchmarks=provision_node_pools --config_override='provision_node_pools.container_cluster.type='"'"'Kubernetes'"'"'' --config_override='provision_node_pools.container_cluster.vm_spec.GCP.machine_type='"'"'c4-standard-4'"'"'' --config_override='provision_node_pools.container_cluster.nodepools.fibpool.vm_spec.GCP.machine_type='"'"'c4-standard-4'"'"'' --metadata=cloud:GCP --project=p3rf-gke --provision_node_pools_init_batch=1 --provision_node_pools_test_batch=2 --zone=europe-west4-a --timeout_minutes=236
| ) | ||
| _SCENARIOS = flags.DEFINE_list( | ||
| 'k8s_mgmt_scenarios', | ||
| ['A', 'B', 'C'], |
There was a problem hiding this comment.
these should have helpful names & probably use define enum
| result['gce_local_ssd_count'] = self.default_nodepool.max_local_disks | ||
| result['gce_local_ssd_interface'] = self.default_nodepool.ssd_interface | ||
| result['gke_nccl_fast_socket'] = self.enable_nccl_fast_socket | ||
| if 'nccl' in self.nodepools: |
There was a problem hiding this comment.
I think these got synced out (sync to head & they're no longer there).
| ) | ||
| if rc: | ||
| logging.warning( | ||
| 'Sleep workload deploy returned rc=%d (non-fatal; continuing)', rc) |
There was a problem hiding this comment.
Great. Expand to pretty much all uses of logging.warning in providers as well.
| # The kubernetes_management benchmark does not use persistent volumes, so | ||
| # EBS CSI setup (OIDC + IAM role + addon install) is unnecessary and adds | ||
| # ~3 minutes to every run. Set to True to skip it and save time. | ||
| # Defined before FLAGS = flags.FLAGS so it is registered at import time |
There was a problem hiding this comment.
this sounds overly complicated & weird. your code should not care about import order etc.
| concurrent = getattr(FLAGS, 'k8s_mgmt_concurrent_nodepools', 10) | ||
| nodes_per_az = max(80, concurrent * 2 + 20) | ||
| # Fetch cluster CA and endpoint for bootstrap user data | ||
| import json as _json |
There was a problem hiding this comment.
all imports should go at top of files
| """Initiates node-pool delete; returns opaque op handle. Does NOT wait.""" | ||
| raise NotImplementedError | ||
|
|
||
| def UpdateClusterAsync(self) -> str: |
There was a problem hiding this comment.
Ok - sort of makes sense. However, I took a look at the provider specific implementations & had a hard time telling what exactly was different. What is different about them currently?
/ actual comment/change: the sync & async versions should share most of their code. A simple way to do this might be to always call the async version, but then wait for the result in the sync version. With a nice, sufficiently abstract presentation, I bet you could do this waiting for the async version from the sync in the parent class. Cleanly that might look like:
def UpdateClusterSync(self):
op = self.UpdateClusterAsync()
self.WaitForOperation(op)
Also consider moving _RunAsync & _TimedAsync to kubernetes_cluster (unsure of this one).
Implements the GKE Management Plane Operations benchmark across EKS, GKE, and AKS. Three scenarios are covered: Scenario A — 100 pools created, upgraded (N-1→N minor version), and deleted concurrently across 3 AZs; Scenario B — a NodePool Create fired 3 seconds into an ongoing ClusterUpdate to measure control-plane serialization behavior; Scenario C — 100 pools streamed continuously (max 50 at a time) to measure large-scale provisioning throughput.