Skip to content

Add standalone metrics service to Payjoin-service#1296

Merged
spacebear21 merged 1 commit intopayjoin:masterfrom
zealsham:metrics-service
Feb 5, 2026
Merged

Add standalone metrics service to Payjoin-service#1296
spacebear21 merged 1 commit intopayjoin:masterfrom
zealsham:metrics-service

Conversation

@zealsham
Copy link
Copy Markdown
Collaborator

@zealsham zealsham commented Jan 23, 2026

This PR addresses #1284.

it introduces Prometheus metric collection as a standalone service for payjoin-service
This Pr will also implement most of the metrics discussed here https://github.com/orgs/payjoin/discussions/775)

Pull Request Checklist

Please confirm the following before requesting review:

@zealsham zealsham marked this pull request as draft January 23, 2026 11:22
@coveralls
Copy link
Copy Markdown
Collaborator

coveralls commented Jan 23, 2026

Pull Request Test Coverage Report for Build 21691748627

Details

  • 138 of 157 (87.9%) changed or added relevant lines in 7 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.4%) to 83.649%

Changes Missing Coverage Covered Lines Changed/Added Lines %
payjoin-directory/src/main.rs 0 1 0.0%
payjoin-service/src/metrics.rs 34 35 97.14%
payjoin-service/src/config.rs 0 4 0.0%
payjoin-service/src/lib.rs 69 82 84.15%
Totals Coverage Status
Change from base Build 21675282596: 0.4%
Covered Lines: 10232
Relevant Lines: 12232

💛 - Coveralls

@zealsham zealsham force-pushed the metrics-service branch 2 times, most recently from 52a17f3 to 192c5b5 Compare January 24, 2026 11:38
@zealsham zealsham changed the title WIP:Add standalone metrics service to Payjoin-service Add standalone metrics service to Payjoin-service Jan 24, 2026
@zealsham zealsham marked this pull request as ready for review January 24, 2026 12:52
Copy link
Copy Markdown
Collaborator

@spacebear21 spacebear21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick PR! Besides the review comments below I have two other requests:

  • Let's squash the two commits so that the first one doesn't fail CI
  • IMO there are two additional metrics described in #1284 that would be really valuable to add here. Both of these are give us a much better overview of actual usage:
    • number of entries in DB (v1 vs v2)
    • unique shortid counters for a given time period. Extra points for the ability to select a time frame.

Comment thread payjoin-service/src/lib.rs Outdated
Comment on lines +77 to +78
.layer(axum::middleware::from_fn_with_state(metrics.clone(), track_metrics))
.layer(axum::middleware::from_fn_with_state(metrics.clone(), track_connections))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

axum docs recommend using ServiceBuilder to apply multiple middleware at once: https://docs.rs/axum/latest/axum/middleware/index.html#applying-multiple-middleware

Comment thread payjoin-service/src/middlewares.rs Outdated
Comment thread payjoin-service/src/lib.rs Outdated
Comment thread payjoin-service/src/lib.rs Outdated
Comment thread payjoin-service/src/metrics.rs Outdated
Comment thread payjoin-service/src/metrics.rs Outdated
Comment on lines +53 to +59
let status_type = match status_code {
200..=299 => "success",
300..=399 => "redirect",
400..=499 => "client_error",
500..=599 => "server_error",
_ => "unknown",
};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is lossy, we might as well save the exact status code and grouping could always be done when "encoding" at read time.

Comment thread payjoin-service/src/metrics.rs Outdated

// Total number of connections seen so far
let total_connections =
IntCounter::new("Total_connections", "Total number of connections")?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All tracked metrics should be defined in constants instead of hardcoded. Also this one is capitalized, they should be consistently lower snake case. In general we should do our best to follow Prometheus naming best practices

Comment thread payjoin-service/src/metrics.rs Outdated
Comment thread payjoin-service/src/metrics.rs Outdated
Ok(body) => {
let response = Response::builder()
.status(StatusCode::OK)
.header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread payjoin-service/src/middlewares.rs Outdated
Comment on lines +37 to +50
match (method, path) {
// Ohttp endpoints
("POST", "/.well-known/ohttp-gateway") => "ohttp_gateway",
("GET", "/.well-known/ohttp-gateway") => "ohttp_gateway_get",
("POST", "/") => "ohttp_gateway",
("GET", "/ohttp-keys") => "ohttp_keys",

("GET", "/health") => "health",
("GET", "/") => "directory_home",
("GET", "/metrics") => "metrics",

_ => "unknown",
}
.to_string()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the same vein as https://github.com/payjoin/rust-payjoin/pull/1296/changes#r2724621450, we could store the whole path and method for losslessness, and group or sort things when reading instead of when writing.

This also makes me wonder if there's a way to correlate different metrics? For example say we are tracking lots of 400s and they are coming from "GET /ohttp-keys" requests, is there a way to identify that this is the problematic path? How might we achieve that?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since there's only a small number of paths each endpoint could be counted separately

time series databases should generally be able to aggregate such information but for convenience we could also have combined counters/histograms

that said, for the specific example you gave is more of a logging/observability concern than a metrics one, i.e. knowing that there are errors to look into makes sense as a metric but if you actually want to figure out the cause you probably want more than just the endpoint but the complete error

the tracing crate can emit structured logs with spans intact using opentracing, it shouldn't take much work to make that available... then with various log searching tools you can see aggregate error counts and break them down by the other request properties

Copy link
Copy Markdown
Contributor

@nothingmuch nothingmuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it would be better to only serve metrics on a separate interface until we've determined whether there's a good reason to serve specific ones publicly

secondly, the manual routing logic and tower service impl could probably be replaced with axum's router and response handling, wouldn't that be simpler? if that's already planned for a followup PR i think i would prefer if it was done here since this introduces some new code to translate from axum's concepts to the lower level code

@zealsham
Copy link
Copy Markdown
Collaborator Author

Thanks for the review @spacebear21 , i'll implement the changes

  • number of entries in DB (v1 vs v2)

my intended implementation is create a metrics reporter trait in the directory and have it report DB entries to the metrics service . do you have an opinion about this ?

@spacebear21
Copy link
Copy Markdown
Collaborator

my intended implementation is create a metrics reporter trait in the directory and have it report DB entries to the metrics service . do you have an opinion about this ?

Hmm, on second thought perhaps this one should be left for a follow-up PR since it involves adding stuff to the directory. I'm wondering if it would make more sense to address this sub-item from #941 first:

DB should also be a tower::Service, using tower::{timeout,limit::rate}

@zealsham
Copy link
Copy Markdown
Collaborator Author

my intended implementation is create a metrics reporter trait in the directory and have it report DB entries to the metrics service . do you have an opinion about this ?

Hmm, on second thought perhaps this one should be left for a follow-up PR since it involves adding stuff to the directory. I'm wondering if it would make more sense to address this sub-item from #941 first:

DB should also be a tower::Service, using tower::{timeout,limit::rate}

i'll open a seperate pr for this, and then the db_metrics can be worked on after db-service get merged

@zealsham zealsham force-pushed the metrics-service branch 7 times, most recently from b4df0a5 to a5fd921 Compare February 2, 2026 15:48
Copy link
Copy Markdown
Collaborator

@spacebear21 spacebear21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cACK, I have minor implementation and code change requests but this looks good to merge otherwise.

Comment thread payjoin-service/src/config.rs Outdated
Comment thread payjoin-service/src/config.rs Outdated
Comment thread payjoin-service/src/metrics.rs
Comment thread payjoin-service/src/metrics.rs Outdated
@spacebear21 spacebear21 mentioned this pull request Feb 4, 2026
29 tasks
This pr addresses  payjoin#1284 and wil  be built on to reflect the
discussion on https://github.com/orgs/payjoin/discussions/775
Request passes through the axum middleware and
metrics is collected.  This currently tracks
http request by endpoint type, method and status.
and also tracks active and total connections.
Remove metrics collection from Payjoin-directory

The metrics server is exposed on different interface

fix broken test

fix broken test

fix broken test

Fix broken test dude to lock file

fix format issue

fix format issue

Refactor metrics Servie
@zealsham
Copy link
Copy Markdown
Collaborator Author

zealsham commented Feb 5, 2026

cACK, I have minor implementation and code change requests but this looks good to merge otherwise.

implemented

Copy link
Copy Markdown
Collaborator

@spacebear21 spacebear21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK 2a0bffe tested metrics counters locally with basic GET connections to the directory homepage

@spacebear21 spacebear21 merged commit 9c765dd into payjoin:master Feb 5, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants