---
title: "Real-World Use & Statistics"
description: "The R community embraces the future framework"
preview: images/site_preview.png
format: html
---
# Real-World Use
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
If we look at our two main R package repositories,
[CRAN](https://cran.r-project.org/) and
[Bioconductor](https://bioconductor.org), we find that the future
framework is used by R packages spanning a wide range of areas,
e.g. statistics, modeling & prediction, time-series analysis &
forecasting, life sciences, drug analysis, clinical trials, disease
modeling, cancer research, computational biology, genomics,
bioinformatics, biomarker discovery, epidemiology, ecology, economics
& finance, spatial, geospatial & satellite analysis, and natural
language processing. That is just a sample based on published R
packages; we can only guess how futures are used at the R prompt, in
users' R scripts, in non-published R packages, in Shiny applications,
and in R pipelines running internally in industry and academia.
There are two major use cases of the future framework: (i) performance
improvement through parallelization, and (ii) a non-blocking,
asynchronous user experience (UX). Below are some prominent examples.
More examples can be found on R-universe, which lists [~1,500 CRAN
packages that need the **future**
package](https://r-universe.dev/search?q=needs%3Afuture).
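For readers new to futures, the two use cases can be sketched with the core **future** API (`plan()`, `future()`, `value()`, and `resolved()`). This is a minimal illustration, not code from any of the packages below; the computations are arbitrary stand-ins, and `multisession` with two workers is just one possible backend.

```r
library(future)

# Evaluate futures in background R sessions on this machine
plan(multisession, workers = 2)

# (i) Parallelization: two independent computations run concurrently
f1 <- future(sum(sqrt(1:1e6)))
f2 <- future(sum(log(1:1e6)))
total <- value(f1) + value(f2)  # value() waits for each result

# (ii) Non-blocking: launch slow work, keep the main session responsive
f3 <- future({
  Sys.sleep(2)  # stand-in for a slow task
  "done"
})
while (!resolved(f3)) {
  message("Main R session is still free ...")
  Sys.sleep(0.5)
}
result <- value(f3)
```

The same code runs sequentially with `plan(sequential)`; only the `plan()` call decides where and how futures are evaluated.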
## EpiNow2: Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters
{
fig-alt="Screenshot of the COVID-19 website dashboard with a world map annotated with colors indicating the trend of COVID infections in different regions"
fig-align="center"
style="margin-left: 2ex; border: solid 1px gray;"
width=80%
}
<!--
<figcaption style="font-size: 50%;">Image credit: EpiNow2 team</figcaption>
-->
[**EpiNow2**](https://epiforecasts.io/EpiNow2/) is an R package to
estimate real-time case counts and time-varying epidemiological
parameters, such as [current trends in COVID-19
incidence](https://epiforecasts.io/covid/) in different regions around
the globe.
**EpiNow2** uses futures to speed up processing. The future framework
is used to estimate incidence rates in different regions concurrently,
as well as to run Markov chain Monte Carlo (MCMC) chains in parallel.
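The region-level concurrency can be sketched with **future.apply**. This is a hypothetical illustration, not EpiNow2's implementation: the toy case counts and the `estimate_region()` function (a bootstrap mean standing in for a real model fit) are made up for this example.

```r
library(future.apply)

plan(multisession, workers = 2)

# Hypothetical daily case counts per region (toy data, not EpiNow2 output)
cases_by_region <- list(
  north = c(12, 15, 20, 18, 25),
  south = c(30, 28, 35, 40, 38),
  east  = c(5, 7, 6, 9, 11)
)

# Hypothetical per-region estimator: a bootstrap mean with a 90% interval,
# standing in for a real, expensive model fit such as an MCMC run
estimate_region <- function(cases, n_boot = 1000) {
  boot_means <- replicate(n_boot, mean(sample(cases, replace = TRUE)))
  c(mean  = mean(boot_means),
    lower = unname(quantile(boot_means, 0.05)),
    upper = unname(quantile(boot_means, 0.95)))
}

# Regions are processed concurrently; future.seed = TRUE gives each
# element its own parallel-safe random-number stream
set.seed(2024)
estimates <- future_lapply(cases_by_region, estimate_region, future.seed = TRUE)
estimates <- do.call(rbind, estimates)  # one row per region
```

Because each region gets its own seeded stream, the estimates do not depend on how many workers are used.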
## Seurat: Large-Scale Single-Cell Genomics
{
fig-alt="A two-dimensional, UMAP-space, scatter plot displaying individual cells grouped into 23 well-separated subclasses that are color and label annotated."
fig-align="center"
style="margin-left: 2ex; border: solid 1px gray;"
width=80%
}
<!--
<figcaption style="font-size: 50%;">Image credit: Seurat team</figcaption>
-->
[**Seurat**](https://satijalab.org/seurat/) is an R package designed
for quality control (QC), analysis, and exploration of single-cell
RNA-seq data. Seurat aims to enable users to identify and interpret
sources of heterogeneity from single-cell transcriptomic measurements,
and to integrate diverse types of single-cell
data. [**Azimuth**](https://satijalab.org/azimuth/) is a Seurat-based
web application, used, for example, by the [HuBMAP - NIH Human
BioMolecular Atlas Program](https://azimuth.hubmapconsortium.org/).
**Seurat** uses futures to speed up processing. The future framework
makes it possible to process large data sets and a large number of
samples in parallel on the local machine, distributed across multiple
machines, or in large-scale high-performance computing (HPC)
environments. **Azimuth** uses futures to provide a non-blocking web
interface.
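This flexibility comes from choosing the backend with `plan()` while the analysis code stays the same. A minimal sketch, assuming toy per-sample data (from the built-in `iris` data set) in place of real single-cell samples; the host names in the commented-out `cluster` line are hypothetical, and the Slurm line assumes **future.batchtools** is installed and configured. Seurat's own functions are not shown.

```r
library(future)

# Pick ONE backend; the analysis code below does not change.
plan(multisession, workers = 2)                  # background R sessions on this machine
# plan(cluster, workers = c("node1", "node2"))   # hypothetical host names (SSH access assumed)
# plan(future.batchtools::batchtools_slurm)      # Slurm HPC jobs; needs future.batchtools setup

# Toy per-sample data standing in for large single-cell samples
samples <- split(iris$Sepal.Length, iris$Species)

# One future per sample; each is resolved by whichever backend is active
fs <- lapply(samples, function(x) future(c(mean = mean(x), sd = sd(x))))
summaries <- lapply(fs, value)
```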
## Shiny: Scalable, Asynchronous UX
{
fig-alt="Thumbnail of the Shiny ICGC Genome Browser webpage. There is a title banner on top above two panels. In the left-hand side panel, there is a circular plot showing the 24 human chromosomes laid out on the circumference. Interacting genes are connected with edges, creating a web of connections across the plane of the circle but also short loops back to the same chromosome. In the right-hand side panel, there is a table that appears to list the genes of interest with some kind of values."
style="margin-left: 2ex; border: solid 1px gray;"
fig-align="center"
width=80%
}
<!--
<figcaption style="font-size: 50%;">Image credit: International Cancer Genome Consortium (ICGC) team</figcaption>
-->
[**Shiny**](https://shiny.rstudio.com/) is an R package that makes it
easy to build interactive web applications and dashboards directly
from R. Shiny apps can run locally, be embedded in an R Markdown
document, and be hosted on a webpage - all with a few clicks or
commands. The combination of being simple and powerful has made Shiny
the most popular solution for web applications in the R community.
See the [Shiny Gallery](https://shiny.rstudio.com/gallery/) for
real-world examples, e.g. the [Genome
Browser](https://shiny.rstudio.com/gallery/genome-browser.html) by the
International Cancer Genome Consortium (ICGC) team.
**Shiny** uses the future framework to provide a non-blocking user
interface and to scale up computationally heavy requests. It combines
**future** with **promises** to turn a blocking, synchronous web
interface into a non-blocking, asynchronous, responsive user
experience.
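A minimal sketch of the **future** + **promises** pattern in a Shiny server, assuming a **promises** release that provides `future_promise()`. The app itself (one button, one text output, and a three-second `Sys.sleep()` standing in for heavy work) is hypothetical and not taken from the Genome Browser.

```r
library(shiny)
library(promises)
library(future)

# Background R sessions that evaluate the slow work
plan(multisession)

ui <- fluidPage(
  actionButton("go", "Run slow computation"),
  textOutput("result")
)

server <- function(input, output, session) {
  output$result <- renderText({
    req(input$go)  # wait until the button has been clicked
    # future_promise() runs the block in a future and returns a promise,
    # so the main R process can serve other sessions meanwhile
    future_promise({
      Sys.sleep(3)  # stand-in for a computationally heavy request
      paste("Finished at", format(Sys.time(), "%H:%M:%S"))
    })
  })
}

# Launch only in an interactive session
if (interactive()) shinyApp(ui, server)
```

Within a single user session, Shiny still waits for the promise before updating that session's outputs; the main gain is that other users' sessions are served while the computation runs.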
## Mlr3: Next-Generation Machine Learning
{
fig-alt="A schematic outline of an ML pipeline. On top, there is a left-to-right pipeline with 'Training Data' as input, with steps 'Scaling', 'Factor Encoding', 'Median Imputation', and a final 'Learner' state. At the bottom, there is a similar pipeline but with 'New Data' as the input. In each of the corresponding steps, there is an arrow coming from the top pipeline indicating pre-learned parameters. After the 'New Data' has flowed through all steps, the output is a 'Prediction'."
style="margin-left: 2ex; margin-right: 2ex; border: solid 1px gray;"
fig-align="center"
width=80%
}
<!--
<figcaption style="font-size: 50%;">Image credit: mlr3 team</figcaption>
-->
The [**mlr3**](https://mlr3.mlr-org.com/) ecosystem provides
efficient, object-oriented building blocks for machine learning (ML),
such as tasks, learners, resamplings, and measures. It supports
large-scale, out-of-memory data processing.
**mlr3** uses futures to speed up processing. The future framework is
used in different ML steps; for example, resampling of learners can be
performed much faster when run in parallel. The framework ensures that
proper parallel random-number generation (RNG) is used and guarantees
reproducible results.
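In **mlr3**, parallel resampling is typically enabled by setting a plan, e.g. `future::plan("multisession")`, before calling `resample()`. The RNG guarantee itself comes from the future framework and can be sketched with **future.apply**: with `future.seed = TRUE`, each element gets its own parallel-safe random-number stream derived from the current seed, so results are the same whether futures run sequentially or in parallel. This illustrates the mechanism; it is not mlr3's internal code, and `draw()` is a hypothetical helper.

```r
library(future.apply)

# Hypothetical helper: one random number per task, from a fixed parent seed
draw <- function() {
  set.seed(42)  # parent seed from which per-element streams are derived
  future_sapply(1:4, function(i) rnorm(1), future.seed = TRUE)
}

plan(sequential)
x_seq <- draw()

plan(multisession, workers = 2)
x_par <- draw()

# Same per-element streams, hence identical results, regardless of backend
stopifnot(identical(x_seq, x_par))
```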
## Targets: Pipeline Toolkit for Reproducible Computation at Scale
{
fig-alt="A drake dependency graph with a file node 'raw_data_x.xlsx' to the left, on which a 'raw_data' node depends, which in turn is a dependency of the two nodes 'fit' and 'hist'. The following 'report' node depends on the latter two nodes, and the last node is the output file 'report.html'. There is a legend to the left explaining how the states of the nodes are represented as colors and shapes."
style="margin-left: 2ex; margin-right: 2ex; border: solid 1px gray;"
fig-align="center"
width=80%
}
<!--
<figcaption style="font-size: 50%;">Image credit: targets/drake team</figcaption>
-->
The [**targets**](https://docs.ropensci.org/targets/) package, like its
predecessor [**drake**](https://docs.ropensci.org/drake/), is a
general-purpose computational engine for statistics and data science
that brings together function-oriented programming in R with make-like
declarative workflows. It has native support for parallel and
distributed computing while preserving reproducibility.
Both **targets** and **drake** identify targets in the declared
dependency graph that can be resolved concurrently, which then can be
processed in parallel on the local computer or distributed in the
cloud via the future framework.
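The scheduling idea can be sketched by hand with plain futures, using the node names from the drake graph shown for this section: `fit` and `hist` depend only on `raw_data`, so they can be resolved concurrently, while `report` must wait for both. This is a hypothetical illustration of the concept, not how **targets** or **drake** schedule work internally; the built-in `mtcars` data set stands in for `raw_data_x.xlsx`.

```r
library(future)
plan(multisession, workers = 2)

# Stand-in for the 'raw_data' target (built-in mtcars data set)
raw_data <- mtcars

# 'fit' and 'hist' both depend only on 'raw_data', so they can run concurrently
fit_f  <- future(lm(mpg ~ wt, data = raw_data))
hist_f <- future(hist(raw_data$mpg, plot = FALSE))

# 'report' depends on both; value() waits until each one is resolved
report <- list(
  coefficients = coef(value(fit_f)),
  counts       = value(hist_f)$counts
)
```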
# CRAN Statistics
Since the first CRAN release of **future** in June 2015, its uptake
among end-users and package developers has grown steadily. During
January 2026, **future** was among the top-0.7% most downloaded packages
on CRAN (Figure 1) and there are 480 packages on CRAN and Bioconductor
that directly depend on it (Figure 2). For map-reduce parallelization
packages **future.apply** (top-0.7% most downloaded) and **furrr**
(top-1.5%), the corresponding numbers of packages are 260 and 180,
respectively. If we consider recursive dependencies too, that is,
packages that use the **future** package either directly or indirectly
via another package, then 85% of all ~23,000 CRAN packages may rely on
the future framework for their processing.
<!--
pkgs <- revdepcheck:::cran_revdeps("future")
pkgs <- tools::package_dependencies("future", which="all", reverse=TRUE, recursive=TRUE)
-->
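The difference between direct and recursive reverse dependencies can be made concrete with base R's `tools::package_dependencies()`. A minimal sketch on a toy database of hypothetical packages (`pkgA`, `pkgB`, `pkgC`), assuming "depend on" means the `Depends`, `Imports`, and `LinkingTo` fields (`which = "strong"`); for real counts one would pass `db = available.packages()` instead, which needs network access and covers CRAN only unless Bioconductor repositories are added.

```r
# Toy package database (hypothetical packages) in the matrix layout
# that available.packages() returns
db <- cbind(
  Package   = c("future", "pkgA", "pkgB", "pkgC"),
  Depends   = NA_character_,
  Imports   = c("digest, globals", "future", "pkgA", "stats"),
  LinkingTo = NA_character_
)

# Direct reverse dependencies: packages that list 'future' themselves
direct <- tools::package_dependencies("future", db = db, which = "strong",
                                      reverse = TRUE)[["future"]]

# Recursive: also packages that reach 'future' via another package
recursive <- tools::package_dependencies("future", db = db, which = "strong",
                                         reverse = TRUE, recursive = TRUE)[["future"]]
```

Here `direct` contains only `pkgA`, while `recursive` also includes `pkgB`, which uses **future** only through `pkgA`; `pkgC` does not use it at all.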
{
fig-alt="A line graph with 'Date' on the horizontal axis and 'Download rates on CRAN (four-week averages)' on the vertical axis. The dates go from mid 2015 to 2025 and the ranks from 0 to 20%. Lines for the packages 'foreach', 'future', 'future.apply', and 'furrr' are displayed in different colors. The foreach curve is the highest but decreases slowly, whereas the other three increase rapidly toward the level of foreach."
fig-align="center"
width=70%
}
_Figure 1: The download percentile ranks for <strong>future</strong>,
<strong>future.apply</strong>, <strong>furrr</strong>, and
<strong>foreach</strong> averaged every four weeks.
<strong>future</strong> is among the top-0.7% most downloaded packages
on CRAN. The data are based on the Posit CRAN mirror logs. There
are approximately 200 million package downloads per month from the
Posit CRAN mirror alone. Since none of the other CRAN mirrors
provide statistics, it is impossible to know the total number of
package installations._
<!-- [https://cranlogs.r-pkg.org/downloads/total/last-month] -->
As a reference[^1], the popular **foreach** package, released in 2009,
was among the top-0.9% most downloaded packages during the same period,
and it has 1,300 reverse package dependencies on CRAN. The download
rank of **future** has risen rapidly, whereas that of **foreach** has
slowly declined (Figure 1). Similarly, the number of reverse package
dependencies on **future** appears to grow faster than that for
**foreach** (Figure 2).
::: {layout-ncol=2}
{
fig-alt="A line chart showing the growth in the number of reverse dependencies on CRAN for three R packages, 'future', 'future.apply', and 'furrr', from 2015 to 2025. The y-axis ranges from 0 to 500 dependencies. The 'future' package (green) rises steeply after 2018, reaching 470 by the end of 2025; 'future.apply' (blue) grows steadily to 250; and 'furrr' (purple) follows a similar but slightly slower trajectory, ending near 180."
}
{
fig-alt="A logarithmic-scale line chart comparing CRAN reverse-dependency growth for 'foreach', 'future', 'future.apply', and 'furrr' from 2015 to 2025. The y-axis spans roughly 10 to 1500 dependencies. 'foreach' (olive) leads throughout, rising smoothly from 150 to over 1000. 'future' (green) increases from a few dependencies in 2015 to 470 by the end of 2025. 'future.apply' (blue) and 'furrr' (purple) start around 2018 and reach 250 and 180, respectively."
}
:::
_Figure 2: Number of CRAN[^2] packages over time that depend on **future**, **future.apply**, **furrr**, and **foreach** since the first release of **future** in June 2015.
Left: The package counts on a linear scale, without **foreach**.
Right: The same data on a logarithmic scale, so that **foreach** also fits._
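As footnote 1 notes, **foreach** with **doFuture**, **future.apply**, and **furrr** differ mainly in coding style. A minimal sketch of the same map-style computation in all three, assuming a **doFuture** version that provides the `%dofuture%` operator (1.0.0 or later); `slow_square()` is a hypothetical stand-in for real work.

```r
library(future)
library(foreach)
library(doFuture)      # provides the %dofuture% operator
library(future.apply)
library(furrr)

plan(multisession, workers = 2)

# Hypothetical stand-in for real work
slow_square <- function(i) {
  Sys.sleep(0.1)
  i^2
}
x <- 1:6

y_foreach <- foreach(i = x) %dofuture% slow_square(i)  # foreach + doFuture
y_apply   <- future_lapply(x, slow_square)             # future.apply
y_furrr   <- future_map(x, slow_square)                # furrr
```

All three return a list of squares and evaluate the calls via the same `plan()`, so switching styles does not change where the work runs.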
[^1]: Importantly, the comparison with **foreach** is only made as a
reference for the current demand for parallelization frameworks in R
and to show the rapid uptake of the future framework since its
release. It is not a competition, because **foreach** can, by design,
be used in combination with the future framework via **doFuture**. The
choice between **foreach** with **doFuture**, **future.apply**, and
**furrr** is a matter of coding-style preference; they all rely on
futures for parallelization.
[^2]: Because historical data for reverse dependencies on Bioconductor are hard to track down, Bioconductor packages are _not_ included in these graphs.