Refactor GKE to Support 3-tier Nginx architecture with Upstream nodepool (#6458)
jimmycgz wants to merge 3 commits into GoogleCloudPlatform:master
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up-to-date status, view the checks section at the bottom of the pull request.
Removed the tmp helper container_service.py

…mmands
- Refactored benchmark to use Client -> Proxy -> Upstream architecture
- Migrated from generic vm_util to kubernetes_commands for native K8s support
- Added nginx_proxy.yaml.j2 and nginx_upstream.yaml.j2 templates
- Fixed connectivity check to target /random_content (avoids 403 Forbidden)
- Added new flags for upstream machine type and count configuration
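For context, a proxy template along these lines typically renders to an nginx config that forwards requests to the upstream tier. The sketch below is illustrative only; the `nginx-upstream` service name and port are assumptions, not taken from the actual `nginx_proxy.yaml.j2` in this PR:

```nginx
# Hypothetical sketch of the proxy-tier nginx config the template might render.
# The service name "nginx-upstream" and port 80 are assumptions.
upstream backend {
    server nginx-upstream:80;
    keepalive 32;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        # HTTP/1.1 with an empty Connection header is required for keepalive
        # to the upstream pool to take effect.
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
```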
36298bc to
efa2978
Compare
    (Client -> Nginx Proxy -> Upstream Backend) on Kubernetes.

    container_specs:
      kubernetes_nginx:
        image: k8s_nginx
Can you update to use the official nginx image instead of the ubuntu-based one that's currently defined? Please pin it to a specific version (current latest is fine).
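Following that suggestion, the spec would look something like this (`nginx:1.29.6` is the version constant already introduced elsewhere in this PR; treat the exact pin as an example):

```yaml
# Sketch of the requested change: official nginx image, pinned to a version.
container_specs:
  kubernetes_nginx:
    image: nginx:1.29.6
```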
Please remove this clients section since it's unused.
This file should probably be deleted?
    """Prepares a cluster to run the Nginx benchmark."""
    """Prepares the GKE cluster with proxy and upstream nginx deployments."""
    # 1. Create ConfigMap with merged proxy + upstream configs
    with _CreateNginxConfigMapDir() as nginx_config_map_dirname:
Could you please log the contents of the computed ConfigMap? Are the contents part of the final artifact stored after the test runs? (maybe data.ResourcePath(...) is automatically kept after the test runs)
| ) | ||
|
|
||
| BENCHMARK_NAME = 'kubernetes_nginx' | ||
| NGINX_IMAGE = 'nginx:1.29.6' |
Is this const needed here or should it be in container_specs.kubernetes_nginx.image?
    machine_type: Standard_D4s_v5
    clients:
      vm_count: 1
    upstream:
Looking at the GCE test I see 3 groups: clients, server, upstream_servers.
In this setup I see nginx (I guess it's the server), upstream (I guess upstream_servers), and vm_groups.clients for the client. I believe that in this setup the client is running outside the GKE cluster; cc @hankfreund to validate this setup.
Yeah, your understanding is correct. And we decided to keep the client VMs outside of the cluster, instead of having the traffic entirely within it.
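The agreed topology can be summarized as follows (an illustrative sketch of the config shape, using the group names and counts visible in this diff, not the exact file):

```yaml
# Client VMs stay outside the cluster; proxy and upstream tiers run as
# GKE nodepools.
kubernetes_nginx:
  vm_groups:
    clients:           # wrk2 load generators, external to the cluster
      vm_count: 1
  container_cluster:
    nodepools:
      nginx:           # proxy tier (the "server" group in the GCE test)
        vm_count: 1
      upstream:        # backend tier ("upstream_servers" in the GCE test)
        vm_count: 2
```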
    """Run a benchmark against the Nginx server."""
    return nginx_benchmark.Run(benchmark_spec)
    if FLAGS.nginx_throttle:
      return nginx_benchmark._RunMultiClient(
Great work, good to see that we're reusing functions from https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/908706425bef535da7a3ff1708701ab56d6c6952/perfkitbenchmarker/linux_benchmarks/nginx_benchmark.py
    hostip = benchmark_spec.nginx_endpoint_ip
    hoststr = (
        f'[{hostip}]'
        if isinstance(ipaddress.ip_address(hostip), ipaddress.IPv6Address)
nit: I don't remember the status of IPv6 support on competitor clouds.
    nodepools:
      nginx:
    -   vm_count: 3
    +   vm_count: 1
nit: I don't have the context on why this was 3 in the first place. 1 sounds more correct.
    clients:
      vm_count: 1
    upstream:
      vm_count: 2
nit: In the original nginx benchmark, we had 6 upstream servers. Is 2 sufficient to load the nginx servers?
    except errors.VirtualMachine.RemoteCommandError:
      pass
    logging.info('Still waiting for connectivity...')
    time.sleep(10)
Instead of a bare sleep loop, prefer retrying on failures.
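A retry-based version of the connectivity check could look like this generic polling sketch (in the actual code, PerfKitBenchmarker's own retry helper would be the idiomatic choice; `check` here stands in for the curl against `/random_content` that swallows `RemoteCommandError`):

```python
import time


def wait_for(check, timeout_s: float = 120.0, interval_s: float = 1.0) -> bool:
  """Polls `check` until it returns True or the timeout elapses.

  Generic sketch of retry-on-failure instead of an unconditional sleep:
  the loop exits as soon as connectivity succeeds, and bounds the total
  wait with a deadline rather than looping forever.
  """
  deadline = time.monotonic() + timeout_s
  while time.monotonic() < deadline:
    if check():
      return True
    time.sleep(interval_s)
  return False
```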
Description
This PR refactors the `kubernetes_nginx` benchmark to implement a true 3-tier architecture (Client -> Nginx Proxy -> Upstream Backend), matching the standard GCE `nginx_benchmark` topology. It replaces generic `vm_util` calls with robust `kubernetes_commands` for better resource management and reliability.

Key Changes

- `wrk2` load generator (external VM or pod).
- `kubernetes_commands` for applying manifests, creating ConfigMaps (`nginx-configs`), and waiting for resources.
- `nginx_proxy.yaml.j2` and `nginx_upstream.yaml.j2` templates for flexible configuration.
- `/random_content` instead of `/` (resolving 403 errors).

Supported Platforms & Limitations
- `GetLoadBalancerIP` (expects IP, AWS returns Hostname). Future work required.

Test Commands
Example 1: HTTP Baseline (Default)
    ./pkb.py --benchmarks=kubernetes_nginx --cloud=GCP --zone=us-central1-a \
      --project=$PROJECT_ID --run_uri=test_http \
      --nginx_server_machine_type=c4-standard-8 \
      --config_override=kubernetes_nginx.vm_groups.clients.vm_spec.GCP.machine_type=c4-standard-32 \
      --config_override=kubernetes_nginx.container_cluster.nodepools.upstream.vm_spec.GCP.machine_type=c4-standard-16 \
      --config_override=kubernetes_nginx.container_cluster.nodepools.upstream.vm_count=2 \
      --nginx_content_size=1024 \
      --nginx_use_ssl=False

Example 2: HTTPS Validation
    ./pkb.py --benchmarks=kubernetes_nginx --cloud=GCP --zone=us-central1-a \
      --project=$PROJECT_ID --run_uri=test_https \
      --nginx_server_machine_type=c4-standard-8 \
      --config_override=kubernetes_nginx.vm_groups.clients.vm_spec.GCP.machine_type=c4-standard-32 \
      --config_override=kubernetes_nginx.container_cluster.nodepools.upstream.vm_spec.GCP.machine_type=c4-standard-16 \
      --config_override=kubernetes_nginx.container_cluster.nodepools.upstream.vm_count=2 \
      --nginx_content_size=1024 \
      --nginx_use_ssl=True