fix(Monitoring): Serve /metrics on a dedicated port #158
Merged
Commits (10 in total; this view shows changes from 5 of them):
- a7d4016 Use child_exit hook for Prometheus multiprocess cleanup (emyller)
- a3c1bb1 Serve /metrics on port 9100 (emyller)
- 5b99032 Add /burn endpoint for CPU load simulation (emyller)
- da24f69 Delete the temporary /burn endpoint (emyller)
- b1e22c2 Merge remote-tracking branch 'github/main' into fix/metrics-availability (emyller)
- 13aba21 Delete unnecessary test (emyller)
- 9f4c407 Fix coverage (emyller)
- 1ecfc8d Merge remote-tracking branch 'github/main' into fix/metrics-availability (emyller)
- ebaa535 Explain metrics endpoint availability on port 9100 (emyller)
- 85ef049 Less slop (emyller)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,65 @@ | ||
| """ | ||
| Standalone Prometheus metrics HTTP server. | ||
| | ||
| This module provides a separate HTTP server for Prometheus metrics, | ||
| independent of the main Gunicorn application server. This improves | ||
| metrics reliability under high API load. | ||
| | ||
| The server runs in a daemon thread and serves metrics from the shared | ||
| PROMETHEUS_MULTIPROC_DIR directory. | ||
| """ | ||
| | ||
| import logging | ||
| import os | ||
| import threading | ||
| | ||
| from prometheus_client import CollectorRegistry, start_http_server | ||
| from prometheus_client.multiprocess import MultiProcessCollector | ||
| | ||
| logger = logging.getLogger(__name__) | ||
| | ||
| METRICS_SERVER_PORT = 9100 | ||
| | ||
| _server_started = False | ||
| _server_lock = threading.Lock() | ||
| | ||
| | ||
| def get_multiprocess_registry() -> CollectorRegistry: | ||
| """Create a registry configured for multiprocess metric collection.""" | ||
| registry = CollectorRegistry() | ||
| MultiProcessCollector(registry) # type: ignore[no-untyped-call] | ||
| return registry | ||
| | ||
| | ||
| def start_metrics_server( | ||
| port: int = METRICS_SERVER_PORT, | ||
| ) -> None: | ||
| """ | ||
| Start the standalone Prometheus metrics HTTP server. | ||
| | ||
| This function is idempotent - calling it multiple times will only | ||
| start one server. The server runs in a daemon thread. | ||
| | ||
| Args: | ||
| port: The port to serve metrics on. Defaults to 9100. | ||
| """ | ||
| global _server_started | ||
| | ||
| with _server_lock: | ||
| if _server_started: | ||
| logger.debug("Metrics server already started") | ||
| return | ||
| | ||
| prometheus_multiproc_dir = os.environ.get("PROMETHEUS_MULTIPROC_DIR") | ||
| if not prometheus_multiproc_dir: | ||
| logger.warning("PROMETHEUS_MULTIPROC_DIR not set, skipping metrics server") | ||
| return | ||
| | ||
| registry = get_multiprocess_registry() | ||
| | ||
| try: | ||
| start_http_server(port=port, registry=registry) | ||
| _server_started = True | ||
| logger.info("Prometheus metrics server started on port %d", port) | ||
| except OSError as e: | ||
| logger.error("Failed to start metrics server on port %d: %s", port, e) | ||
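
The start-once pattern in the module above (a module-level flag guarded by a lock, a server in a daemon thread, and an `OSError` that is handled rather than raised) can be tried out without `prometheus_client`. The following is a minimal sketch using only the standard library; `start_once`, `scrape`, `_Handler` and the `demo_up` payload are hypothetical names for illustration and are not part of this PR.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional

# Hypothetical names throughout: this mirrors the pattern, not the PR's code.
_started = False
_lock = threading.Lock()

# Placeholder payload in the Prometheus text exposition format.
_BODY = b"# TYPE demo_up gauge\ndemo_up 1\n"


class _Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Serve the same placeholder payload for every path.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(_BODY)))
        self.end_headers()
        self.wfile.write(_BODY)

    def log_message(self, format: str, *args: object) -> None:
        """Silence per-request logging for the demo."""


def start_once(port: int = 0) -> Optional[int]:
    """Start the server at most once.

    Returns the bound port on the first successful call, otherwise None.
    """
    global _started
    with _lock:
        if _started:
            return None
        try:
            server = ThreadingHTTPServer(("127.0.0.1", port), _Handler)
        except OSError:
            # Port unavailable: leave the flag unset so a later call can retry.
            return None
        threading.Thread(target=server.serve_forever, daemon=True).start()
        _started = True
        return server.server_address[1]


def scrape(port: int) -> str:
    """Fetch the payload the way a scraper would."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/metrics") as response:
        return response.read().decode()
```

Calling `start_once()` a second time returns `None` without starting another server, which is the behaviour the docstring above describes as idempotent. Passing `port=0` lets the operating system pick a free port; the PR itself defaults to 9100.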
Empty file.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| from typing import Generator | ||
| | ||
| import pytest | ||
| | ||
| | ||
| @pytest.fixture(autouse=True) | ||
| def reset_metrics_server_state() -> Generator[None, None, None]: | ||
| """Reset the metrics server global state between tests.""" | ||
| from common.gunicorn import metrics_server | ||
| | ||
| metrics_server._server_started = False | ||
| | ||
| yield | ||
| | ||
| metrics_server._server_started = False | ||
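
The fixture above clears the flag before and after each test. Note that it only resets the flag: it does not stop a server thread that an earlier test already started, which may be why the tests in this PR request a fresh port each time (that reading is an assumption, not something the diff states). The same reset-around pattern, sketched outside pytest with a hypothetical `fake_server` stand-in object:

```python
import types
from contextlib import contextmanager
from typing import Iterator

# Hypothetical stand-in for a module that keeps a "started" flag.
fake_server = types.SimpleNamespace(_server_started=False)


@contextmanager
def reset_started_flag(module: types.SimpleNamespace) -> Iterator[None]:
    """Clear the flag before and after the wrapped block."""
    module._server_started = False
    try:
        yield
    finally:
        # Runs even if the wrapped block raises.
        module._server_started = False
```

Inside `with reset_started_flag(fake_server):` the flag starts out `False`, and it is `False` again on exit regardless of what the block did to it.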
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,119 @@ | ||
| import socket | ||
| import urllib.request | ||
| | ||
| import prometheus_client | ||
| import pytest | ||
| from rest_framework.test import APIClient | ||
| | ||
| from common.gunicorn.metrics_server import start_metrics_server | ||
| from tests import GetLogsFixture | ||
| | ||
| | ||
| @pytest.mark.prometheus_multiprocess_mode | ||
| def test_start_metrics_server__multiprocess_mode__serves_metrics( | ||
| unused_tcp_port: int, | ||
| test_metric: prometheus_client.Counter, | ||
| ) -> None: | ||
| # Given | ||
| test_metric.labels(test_name="standalone_server_test").inc() | ||
| | ||
| # When | ||
| start_metrics_server(port=unused_tcp_port) | ||
| | ||
| # Then | ||
| with urllib.request.urlopen( | ||
| f"http://localhost:{unused_tcp_port}/metrics" | ||
| ) as response: | ||
| content = response.read().decode() | ||
| | ||
| assert response.status == 200 | ||
| assert "pytest_tests_run_total" in content | ||
| assert 'test_name="standalone_server_test"' in content | ||
| | ||
| | ||
| # NOTE: This test is temporary. Remove once the Django /metrics endpoint is | ||
| # deprecated in favour of the standalone metrics server. | ||
| @pytest.mark.prometheus_multiprocess_mode | ||
| def test_start_metrics_server__multiprocess_mode__output_matches_django_view( | ||
| unused_tcp_port: int, | ||
| test_metric: prometheus_client.Counter, | ||
| client: APIClient, | ||
| ) -> None: | ||
| # Given | ||
| test_metric.labels(test_name="equivalence_test").inc() | ||
| start_metrics_server(port=unused_tcp_port) | ||
| | ||
| # When | ||
| with urllib.request.urlopen( | ||
| f"http://localhost:{unused_tcp_port}/metrics" | ||
| ) as response: | ||
| standalone_content = response.read().decode() | ||
| | ||
| django_response = client.get("/metrics", follow=True) | ||
| django_content = django_response.content.decode() | ||
| | ||
| # Then | ||
| assert "pytest_tests_run_total" in standalone_content | ||
| assert "pytest_tests_run_total" in django_content | ||
| assert 'test_name="equivalence_test"' in standalone_content | ||
| assert 'test_name="equivalence_test"' in django_content | ||
| | ||
| | ||
| def test_start_metrics_server__multiproc_dir_unset__logs_warning_and_skips( | ||
| get_logs: GetLogsFixture, | ||
| ) -> None: | ||
| # Given | ||
| # PROMETHEUS_MULTIPROC_DIR is not set (default state) | ||
| | ||
| # When | ||
| start_metrics_server() | ||
| | ||
| # Then | ||
| logs = get_logs("common.gunicorn.metrics_server") | ||
| assert ( | ||
| "WARNING", | ||
| "PROMETHEUS_MULTIPROC_DIR not set, skipping metrics server", | ||
| ) in logs | ||
| | ||
| | ||
| @pytest.mark.prometheus_multiprocess_mode | ||
| def test_start_metrics_server__called_multiple_times__remains_idempotent( | ||
| unused_tcp_port: int, | ||
| ) -> None: | ||
| # Given | ||
| start_metrics_server(port=unused_tcp_port) | ||
| | ||
| # When | ||
| start_metrics_server(port=unused_tcp_port) | ||
| start_metrics_server(port=unused_tcp_port) | ||
| | ||
| # Then | ||
| with urllib.request.urlopen( | ||
| f"http://localhost:{unused_tcp_port}/metrics" | ||
| ) as response: | ||
| assert response.status == 200 | ||
| | ||
| | ||
| @pytest.mark.prometheus_multiprocess_mode | ||
| def test_start_metrics_server__port_unavailable__logs_error( | ||
| unused_tcp_port: int, | ||
| get_logs: GetLogsFixture, | ||
| ) -> None: | ||
| # Given | ||
| # Bind to 0.0.0.0 to match prometheus_client's default address | ||
| blocker = socket.socket(socket.AF_INET, socket.SOCK_STREAM) | ||
| blocker.bind(("0.0.0.0", unused_tcp_port)) | ||
| blocker.listen(1) | ||
| | ||
| try: | ||
| # When | ||
| start_metrics_server(port=unused_tcp_port) | ||
| | ||
| # Then | ||
| logs = get_logs("common.gunicorn.metrics_server") | ||
| assert any( | ||
| level == "ERROR" and "Failed to start metrics server" in msg | ||
| for level, msg in logs | ||
| ) | ||
| finally: | ||
| blocker.close() | ||
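
The port-unavailable test above relies on a second bind to an already-listening address failing with `OSError`. A minimal standard-library sketch of that mechanism follows; `can_bind` and `occupy_port` are hypothetical helper names, and the sketch binds to `127.0.0.1` rather than the `0.0.0.0` used in the test.

```python
import socket


def can_bind(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a fresh TCP socket can bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use".
            return False
        return True


def occupy_port(host: str = "127.0.0.1") -> socket.socket:
    """Bind and listen on an OS-assigned port; the caller must close the socket."""
    blocker = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    blocker.bind((host, 0))
    blocker.listen(1)
    return blocker
```

While the socket returned by `occupy_port()` is open, `can_bind(blocker.getsockname()[1])` returns `False`. This is the condition under which `start_metrics_server` is expected to log an error instead of raising.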
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| from unittest.mock import Mock | ||
| | ||
| from pytest_mock import MockerFixture | ||
| | ||
| from common.gunicorn.conf import child_exit | ||
| | ||
| | ||
| def test_child_exit__calls_mark_process_dead_with_worker_pid( | ||
| mocker: MockerFixture, | ||
| ) -> None: | ||
| # Given | ||
| mark_process_dead_mock = mocker.patch("common.gunicorn.conf.mark_process_dead") | ||
| server = Mock() | ||
| worker = Mock() | ||
| worker.pid = 12345 | ||
| | ||
| # When | ||
| child_exit(server, worker) | ||
| | ||
| # Then | ||
| mark_process_dead_mock.assert_called_once_with(12345) | ||