For now, I'd take the current ODE Bayesian parameter estimation benchmarks offline and stop quoting them as up-to-date evidence that Julia is currently 3-5 times faster than Stan for ODE problems. If I knew Julia better I'd fix them myself, but I don't, so here are the issues I currently know about:
- The Lorenz benchmark fails to sample from the correct posterior for either library (the first sketch at the end of this post shows how one might check this)
- The Lorenz benchmark uses 500 warmup + 1000 sampling iterations for Turing, but 1000 + 1000 for Stan (the second sketch below shows how the settings could be matched)
- The Lotka-Volterra benchmark samples from two different posteriors (`sigma1.1, sigma1.2, theta_1, theta_2, theta_3, theta_4` for Stan versus `theta[1], theta[2], theta[3], theta[4], σ[1]` for Turing, i.e. the Stan model has two noise parameters where the Turing model has one)
- Neither benchmark makes it clear what Stan's actual sampling time is, because the time reported through `@btime` includes model compilation (the third sketch below separates the two)
- The DynamicHMC benchmark reports only runtime and nothing else of relevance
- AFAIK, in the benchmarks Stan always uses `adapt_delta=.85` (its default) while Turing uses `adapt_delta=.65`, which potentially lowers runtime but also ESS (the second sketch below also covers this)
- Runtime isn't a good metric anyway; I would probably display per-parameter ESS/s histograms/boxplots over several runs with fewer (post-warmup) samples (the last sketch below computes exactly that)
- The interesting problems with ODE models tend to come up only with non-toy problems, but those also tend to take much longer to run, which (I assume) makes it difficult to include them in automated benchmarks
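
To make the first point concrete, here is roughly how one could check whether a run recovers the posterior it should, assuming the data were simulated with known parameters. `chain` and `theta_true` are placeholders, and the exact accessor names may differ between MCMCChains versions:

```julia
using MCMCChains, Statistics

# `chain` is a multi-chain result from either library, `theta_true` the
# parameter vector used to simulate the data (both placeholders).
# Rhat far from 1, or posterior mass far from theta_true, is the
# symptom described above.
println(ess_rhat(chain))   # per-parameter ESS and rhat; want rhat ≈ 1
println(mean(chain))       # posterior means, to compare against theta_true
```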
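Second, a sketch of how the sampler settings could be matched between the two libraries. `turing_model`, `stan_program` and `stan_data` are placeholders for the benchmark's actual definitions; I'm assuming a recent Turing where `NUTS` takes `(n_adapts, target_accept)` and the CmdStan.jl interface, whose keyword spellings may vary across versions:

```julia
using Turing, CmdStan

# Shared settings so both samplers run the same configuration
n_warmup, n_samples, target_accept = 1000, 1000, 0.8

# Turing: NUTS(n_adapts, target_accept), then draw n_samples
turing_chain = sample(turing_model, NUTS(n_warmup, target_accept), n_samples)

# CmdStan.jl: route the same numbers through the Sample/Adapt structs
stanmodel = Stanmodel(
    Sample(num_warmup = n_warmup,
           num_samples = n_samples,
           adapt = CmdStan.Adapt(delta = target_accept)),
    name = "lotka_volterra",
    model = stan_program)
rc, stan_chain, cnames = stan(stanmodel, stan_data)
```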
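Third, the timing issue: CmdStan reuses the compiled executable between calls, so running the model once before timing should separate compilation from sampling (reusing `stanmodel` and `stan_data` from the previous sketch). For a call this expensive, a plain `@elapsed` after the warm-up run also seems more honest than `@btime`'s minimum over repeated executions:

```julia
# Warm-up call: pays the one-off C++ compilation (and Julia JIT) cost.
stan(stanmodel, stan_data)

# Timed call: the executable is cached now, so this measures sampling only.
sampling_time = @elapsed stan(stanmodel, stan_data)
```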
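Finally, the metric I'd report instead: per-parameter ESS/s over several short runs, which is what the histograms/boxplots would be built from. Again a sketch with a placeholder `turing_model`, and the ChainDataFrame column access may be spelled differently depending on the MCMCChains version:

```julia
using Turing, MCMCChains

nruns, n_warmup, n_samples = 10, 1000, 200   # fewer post-warmup samples
ess_per_sec = map(1:nruns) do _
    t = @elapsed chain = sample(turing_model, NUTS(n_warmup, 0.8), n_samples)
    essdf = ess(chain)          # per-parameter ESS estimates
    essdf[:, :ess] ./ t         # ESS/s for each parameter in this run
end
# `ess_per_sec` is a vector of per-parameter vectors, ready for boxplots
```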