Add a progressr-based progress bar for sampling.#1138
Add a progressr-based progress bar for sampling.#1138josswright wants to merge 23 commits intostan-dev:masterfrom
progressr-based progress bar for sampling.#1138Conversation
This uses the [`progressr`](https://progressr.futureverse.org/index.html) framework to enable a progress bar for sampling operations. By default, this replaces standard iteration messages, but not other informative messages produced during sampling. From a user point of view, this adds two arguments to the `$sample()` method: - `show_progress_bar`: Default = FALSE If TRUE, registers a progress bar via `progressr` and signals an update for every output line that matches "Iteration:". The user is responsible for registering those progress updates with an appropriate handler. - `suppress_iteration_messages`: Defaults to the value of show_progress_bar, but can also be set directly. If TRUE, disables display of output lines matching "Iteration:", while still causing the progress bar to update. This keeps all of Stan's other informative output, but just removes the superfluous iteration messages when using a progress bar. I've tried to keep all additions to the code as non-intrusive as possible. One minor decision I made was to pass the value of `refresh` from the `$sample()` method through to the `CmdStanProcs` class so that the progress bar can report and update the number of steps in the bar to the same 'scale' as the number of iterations. (Calling `$sample()` with `iter_sampling=8000` and `refresh=100` will result in 80 'ticks' of the progress bar, each increasing the progress by 100.) By default, `progressr` doesn't register a handler to display the progress bar, as this is the responsibility of the user. Multiple packages can be used to display the progress bar. A default progress bar can be registered using the [`cli`](https://cli.r-lib.org/) library in this way: ```r library(progressr) library(cli) handlers( global=TRUE ) handlers("cli") options( cli.spinner = "moon", cli.progress_show_after = 0, cli.progress_clear = FALSE ) handlers( handler_cli( format = "{cli::pb_spin} Progress: |{cli::pb_bar}| {cli::pb_current}/{cli::pb_total} | {cli::pb_percent} | ETA: {cli::pb_eta}", clear = FALSE )) ``` A variety of alternative progress bar handlers are available, including audible and notification-based handlers, and ones that interact with RStudio's jobs pane: <https://progressr.futureverse.org/articles/progressr-11-handlers.html>
In addition to updating the progress bar on stdout lines that match "Iteration:", the CmdStanProcs object(s) now pass the current stdout line to the progress bar as a message for _every_ line, even if the number of completed iterations hasn't increased.
Added a `register_default_progress_handler()` function to `utils.R` that creates a default progress bar for sampling operations and registers it as global handler for progressr. Requires `cli` and `progressr`.
Was accidentally passing `refresh` to each informational progress bar update, rather than the calculated `progress_amount`. This caused each message line to update the progress bar, even if it didn't match an iteration message.
Moved code to close the progress bar to after the `check_finished()` function.
The nature of CmdStan's output makes pulling appropriate refresh and update values for the progress bar slightly tricky. In the original verison of the code, this led to the bar occasionally updating past the point where it was full, or terminating early due to an incorrectly calculated number of steps. This commit reworks the calculations of updates and refresh values for the progress bar, as well as adding some extra conditions to avoid the bar potentially crashing in odd scenarios.
New arguments for `iter_warmup` and `iter_sampling` were not being assigned a default value if not specified in sampling statement.
|
Looks like some tests are failing because the lower bound for iter_sampling should be 0 instead of 1. |
Great. Thanks for this. (First pull request for me on a project like this, so I was a little unfamiliar with the process and running the unit tests.) I've fixed the initial errors, and a couple of other warnings. Now running the tests and documentation generation commands locally before I push an update. |
- `iter_{warmup,sampling}` arguments to MCMCProcs are allowed to be 0.
- Documentation for `register_default_progress_handler` added to
repository.
- Ensured that calls to `progressr` functions are appropriately
prefixed.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1138 +/- ##
==========================================
- Coverage 87.97% 87.46% -0.52%
==========================================
Files 14 14
Lines 5937 6030 +93
==========================================
+ Hits 5223 5274 +51
- Misses 714 756 +42 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Added documentation for `show_progress_bar` and `suppress_iteration_messages` in `sample()` function. Changed `require()` calls for `progressr` and `cli` to suggested `requireNamespace()` alternatives.
|
Thanks for the help with the unit tests. In the spirit of peer review, some notes and thoughts:
|
|
Awesome, thank you for the PR! I think I have the flu right now, but I will take a look when I'm feeling better and caught up on work. |
No rush on my side! Hope you feel better soon. |
|
I haven't forgotten about this, just dealing with a backlog of work. In the meantime, I posted on the Stan forum to hopefully get some more people to try it out and let us know how it goes: https://discourse.mc-stan.org/t/please-help-test-a-new-progress-bar-for-sampling-in-cmdstanr/40946
Should we maybe add something to the documentation about this? At least while the progress bar is still new we'll be gathering info about whether any users suffer this fate, so we could warn users that it's an experimental feature and to be cautious using it. Then eventually we can remove the experimental caveat from the doc. What do you think @josswright @VisruthSK? |
|
The changes to I haven't yet looked at the procs code. I assume most of my review comments will be related to the procs code in |
I'd be happy with that. I do think I'm probably just being overly cautious, based on having prodded it quite hard to try and break it, but I don't see any harm in marking it as experimental until more people have had a chance to use it in anger. (Apologies for slow replies--hectic part of term right now.) |
There was a problem hiding this comment.
Thanks again for working on this! I just added a first round of review comments. And here are also some responses to some of your comments above:
In the spirit of peer review, some notes and thoughts:
- I'm not entirely happy with
CmdStan's defaultiter_warmupanditer_samplingvalues being hard-coded intoMCMCProcsdefault call rather than derived somehow. I imagine that the default values are fairly fixed, but I would have liked not to rely on that. Every route I could see to getting them was clunky and brought in unnecessary code.
Yeah not ideal, but this is tricky. I'll think about it but we may be stuck with this.
- As mentioned above, I toyed with the idea of setting
refreshto 1 by default when showing the progress bar with iteration messages suppressed. In the end that seemed to introduce yet another case of a parameter depending on another parameter, as well as a potential source of fragility. I'll note, though, thatrefreshcan safely be set to 1 for a nice, smooth progress bar without any noticeable overhead or slowdown that I saw in testing.
When I test refresh = 1 it's nice and smooth but it seems to add a lot of overhead. Here's what I tried along with system.time output:
system.time({
cmdstanr_example(show_progress_bar = TRUE)
})
user system elapsed
3.235 0.982 4.106
system.time({
fit <- cmdstanr_example(show_progress_bar = TRUE, refresh = 1)
})
user system elapsed
182.875 13.757 199.954
Drastic difference! Are you not seeing that on your end? Maybe I'm doing something wrong. This is also all with the default progress bar. Maybe other ones have less overhead?
- The code to derive the number of steps for the progress bar is more complicated than I originally imagined because of how
CmdStanreports iterations.
I put a few questions/comments about this in the review comments.
I've tested as extensively as I can, but I live in fear of this killing someone's multi-day sampling run.
I haven't yet gotten around to thinking about what tests we should add to the unit tests. But we should definitely add some that will run automatically whenever our unit tests are run.
|
|
||
| # Update progress bar if the iteration is a multiple of the | ||
| # refresh rate, or is the final sampling iteration. | ||
| if (((iter_current %% private$refresh_) == 0) || |
There was a problem hiding this comment.
If refresh=0 then I think iter_current %% 0 will return NaN.
More generally, what is the expected behavior of the progress bar if refresh=0? refresh = 0 is allowed and pretty common, but we could consider not allowing it when using the progress bar.
There was a problem hiding this comment.
It feels like refresh=0 should be mutually exclusive with the progress bar. I'd probably want to throw an error for that case.
| private$num_procs_ | ||
| }, | ||
| iter_warmup = function() { | ||
| privatea$iter_warmup_ |
There was a problem hiding this comment.
| privatea$iter_warmup_ | |
| private$iter_warmup_ |
Is this used anywhere? I guess not since that typo should have caused an error?
| iter_current <- as.numeric(gsub(".*Iteration:\\s*([0-9]+) \\/.*", "\\1", line, perl = TRUE)) | ||
| if (iter_current > private$iter_warmup_) { | ||
| iter_current <- iter_current - private$iter_warmup_ | ||
| } |
There was a problem hiding this comment.
Question about the logic here (and below). Is my understanding here correct?
- We parse Iteration: X / ... from CmdStan output.
- If X > iter_warmup we subtract warmup, so iter_current becomes sampling-only (e.g., 1..iter_sampling)
- But then we check the final step (this happens in the code below, not these highlighted lines) with iter_current == iter_warmup + iter_sampling, so then we're now back on the total iterations scale.
So during sampling, can that final step condition ever actually be true? I'm not entirely confident in my analysis here, so correct me if I'm wrong, but here's a simple example (not very realistic, but just to demonstrate). Suppose:
- iter_warmup = 100
- iter_sampling = 70
- Total iterations per chain = 170
- refresh = 50
Then at the final line (Iteration: 170 / 170):
- We would subtract warmup: iter_current = 170 - 100 = 70
- Then check 70 == 170 (false)
- 70 %% 50 != 0, so no progress update happens for the final chunk?
| progress_amount <- private$refresh_ | ||
| } | ||
| } | ||
| private$progress_bar_(amount = progress_amount, message = line) |
There was a problem hiding this comment.
If progress_amount is refresh then updates always add refresh, but the final chunk could be smaller than refresh (a remainder), right? Could this cause any issues?
| } | ||
| # Ensure at this point that any created progress bar is closed. | ||
| if (!is.null(private$progress_bar_)) { | ||
| private$progress_bar_(type = "finish") |
There was a problem hiding this comment.
Should this be procs$progress_bar()(type = "finish")? Does private$progress_bar exists in the CmdStanRun object or just in CmdStanProcs?
| #' display sampling progress via the `progressr` framework. The user is | ||
| #' responsible for registering a handler to display the progress bar. A | ||
| #' default handler, using the `cli` package, can be registered via | ||
| #' [register_default_progress_handler()]. The default is `FALSE`. |
There was a problem hiding this comment.
I think in the doc we should:
- advise that this is an experimental feature (for now)
- mention that the progress bar combines all chains due to limitations of progressr (I'm guessing this will be a common request/question otherwise)
There was a problem hiding this comment.
It's a common request in progressr as well!
One of the main blockers seems to be that RStudio's terminal only supports the necessary ANSI sequences to allow single-line progress bars, making it not worth the time to add multi-line bars as a feature.
There was a problem hiding this comment.
One other thing for the doc: an example of configuring a non-default progress bar
| progressr::handlers(global = TRUE) | ||
| progressr::handlers("cli") | ||
|
|
||
| options( | ||
| cli.spinner = "moon", | ||
| cli.progress_show_after = 0, | ||
| cli.progress_clear = FALSE | ||
| ) | ||
|
|
||
| # Default informative progress output for sampling | ||
| progressr::handlers( | ||
| progressr::handler_cli( | ||
| format = "{cli::pb_spin} Progress: |{cli::pb_bar}| {cli::pb_current}/{cli::pb_total} | {cli::pb_percent} | ETA: {cli::pb_eta}", | ||
| clear = FALSE | ||
| ) | ||
| ) |
There was a problem hiding this comment.
It just occurred to me that these settings will affect things people do outside of CmdStanR too, right? I would think it would be very rare that this would cause a problem for someone, but if we change global settings we should document it.
Typically I would approach something like this by restoring the options to their previous state before exiting the function, but that doesn't make sense here. The user would then need to call this function before every sampling run instead of just once. So documenting this seems sufficient. Maybe something like this (if what I wrote is actually correct):
`register_default_progress_handler()` modifies global progress settings for the current R session. It installs a global progressr handler and sets cli progress options, which may change progress display behavior in other packages.
Another option (in addition to the doc) would be to add a function the user can use to restore the settings.
@VisruthSK any thoughts on this?
There was a problem hiding this comment.
I think a succinct suppressable message + documentation would suffice; I do agree that doing something is necessary. I think a function to reset would be nice to have but not neccessary, unlike the doc/message.
| save_metric = NULL, | ||
| save_cmdstan_config = NULL) { | ||
| save_cmdstan_config = NULL, | ||
| show_progress_bar = FALSE, |
There was a problem hiding this comment.
What do you think about a global option that could be used to turn this on for a whole R session? For example,
show_progress_bar = getOption("cmdstanr_progress_bar", FALSE)
|
|
Forgot to say that this is awesome! |
I think this would be great, but we don't need to wait for that for this PR. It could be done separately and even potentially by someone else (using the same approach), depending on how much time @josswright has available to dedicate to this. |
I'd be happy to have a look, although I wouldn't want to give any guarantees about how much time I'm likely to have in the near future. In that case, though, it would almost certainly be worth abstracting some of the iteration extraction and calculation code, though. |
Ah, this was my fault in how I tested--I was running against a GP model, for which each iteration was relatively chunky in itself. I can confirm that using
Given this, I wonder if it might be worth throwing a warning if |
Since it could make sense to set |
Submission Checklist
Summary
As discussed in #18 , this pull request adds a progress bar to CmdStanR's
sample()function via theprogressrframework.This can be enabled via two new options to
model$sample(...):show_progress_bar- boolean - defaultFALSEEnables the progress bar.
suppress_iteration_messages- boolean - defaultFALSEifshow_progress_barisFALSE, andTRUEifshow_progress_barisTRUE.Causes any output lines reporting an iteration (such as
Chain 1 Iteration 1 / 2000 (Warmup)) to be suppressed.For the progress bar to be displayed, a progress bar handler must be created and registered by the end user. (See: https://progressr.futureverse.org/) For ease of use, a
register_default_progress_handler()function has been added. Callingregister_default_progress_handler()before amodel$sample(...)call will spawn a default progress bar via the cli library.Copyright and Licensing
Please list the copyright holder for the work you are submitting
(this will be you or your assignee, such as a university or company):
Joss Wright, University of Oxford
By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses: