Add standalone metrics service to Payjoin-service#1296
Add standalone metrics service to Payjoin-service#1296spacebear21 merged 1 commit intopayjoin:masterfrom
Conversation
Pull Request Test Coverage Report for Build 21691748627Details
💛 - Coveralls |
52a17f3 to
192c5b5
Compare
spacebear21
left a comment
There was a problem hiding this comment.
Thanks for the quick PR! Besides the review comments below I have two other requests:
- Let's squash the two commits so that the first one doesn't fail CI
- IMO there are two additional metrics described in #1284 that would be really valuable to add here. Both of these are give us a much better overview of actual usage:
- number of entries in DB (v1 vs v2)
- unique shortid counters for a given time period. Extra points for the ability to select a time frame.
| .layer(axum::middleware::from_fn_with_state(metrics.clone(), track_metrics)) | ||
| .layer(axum::middleware::from_fn_with_state(metrics.clone(), track_connections)) |
There was a problem hiding this comment.
axum docs recommend using ServiceBuilder to apply multiple middleware at once: https://docs.rs/axum/latest/axum/middleware/index.html#applying-multiple-middleware
| let status_type = match status_code { | ||
| 200..=299 => "success", | ||
| 300..=399 => "redirect", | ||
| 400..=499 => "client_error", | ||
| 500..=599 => "server_error", | ||
| _ => "unknown", | ||
| }; |
There was a problem hiding this comment.
This is lossy, we might as well save the exact status code and grouping could always be done when "encoding" at read time.
|
|
||
| // Total number of connections seen so far | ||
| let total_connections = | ||
| IntCounter::new("Total_connections", "Total number of connections")?; |
There was a problem hiding this comment.
All tracked metrics should be defined in constants instead of hardcoded. Also this one is capitalized, they should be consistently lower snake case. In general we should do our best to follow Prometheus naming best practices
| Ok(body) => { | ||
| let response = Response::builder() | ||
| .status(StatusCode::OK) | ||
| .header("Content-Type", "text/plain; version=0.0.4; charset=utf-8") |
There was a problem hiding this comment.
| match (method, path) { | ||
| // Ohttp endpoints | ||
| ("POST", "/.well-known/ohttp-gateway") => "ohttp_gateway", | ||
| ("GET", "/.well-known/ohttp-gateway") => "ohttp_gateway_get", | ||
| ("POST", "/") => "ohttp_gateway", | ||
| ("GET", "/ohttp-keys") => "ohttp_keys", | ||
|
|
||
| ("GET", "/health") => "health", | ||
| ("GET", "/") => "directory_home", | ||
| ("GET", "/metrics") => "metrics", | ||
|
|
||
| _ => "unknown", | ||
| } | ||
| .to_string() |
There was a problem hiding this comment.
In the same vein as https://github.com/payjoin/rust-payjoin/pull/1296/changes#r2724621450, we could store the whole path and method for losslessness, and group or sort things when reading instead of when writing.
This also makes me wonder if there's a way to correlate different metrics? For example say we are tracking lots of 400s and they are coming from "GET /ohttp-keys" requests, is there a way to identify that this is the problematic path? How might we achieve that?
There was a problem hiding this comment.
since there's only a small number of paths each endpoint could be counted separately
time series databases should generally be able to aggregate such information but for convenience we could also have combined counters/histograms
that said, for the specific example you gave is more of a logging/observability concern than a metrics one, i.e. knowing that there are errors to look into makes sense as a metric but if you actually want to figure out the cause you probably want more than just the endpoint but the complete error
the tracing crate can emit structured logs with spans intact using opentracing, it shouldn't take much work to make that available... then with various log searching tools you can see aggregate error counts and break them down by the other request properties
nothingmuch
left a comment
There was a problem hiding this comment.
i think it would be better to only serve metrics on a separate interface until we've determined whether there's a good reason to serve specific ones publicly
secondly, the manual routing logic and tower service impl could probably be replaced with axum's router and response handling, wouldn't that be simpler? if that's already planned for a followup PR i think i would prefer if it was done here since this introduces some new code to translate from axum's concepts to the lower level code
|
Thanks for the review @spacebear21 , i'll implement the changes
my intended implementation is create a metrics reporter trait in the directory and have it report DB entries to the metrics service . do you have an opinion about this ? |
Hmm, on second thought perhaps this one should be left for a follow-up PR since it involves adding stuff to the directory. I'm wondering if it would make more sense to address this sub-item from #941 first:
|
i'll open a seperate pr for this, and then the db_metrics can be worked on after db-service get merged |
b4df0a5 to
a5fd921
Compare
34ab05c to
70f2f5c
Compare
spacebear21
left a comment
There was a problem hiding this comment.
cACK, I have minor implementation and code change requests but this looks good to merge otherwise.
This pr addresses payjoin#1284 and wil be built on to reflect the discussion on https://github.com/orgs/payjoin/discussions/775 Request passes through the axum middleware and metrics is collected. This currently tracks http request by endpoint type, method and status. and also tracks active and total connections. Remove metrics collection from Payjoin-directory The metrics server is exposed on different interface fix broken test fix broken test fix broken test Fix broken test dude to lock file fix format issue fix format issue Refactor metrics Servie
70f2f5c to
2a0bffe
Compare
implemented |
spacebear21
left a comment
There was a problem hiding this comment.
tACK 2a0bffe tested metrics counters locally with basic GET connections to the directory homepage
This PR addresses #1284.
it introduces Prometheus metric collection as a standalone service for payjoin-service
This Pr will also implement most of the metrics discussed here https://github.com/orgs/payjoin/discussions/775)
Pull Request Checklist
Please confirm the following before requesting review:
AI
in the body of this PR.