Commit 45ec16f

Merge pull request #184 from pythonhealthdatascience/dev
Dev
2 parents 8035c79 + 11a887f commit 45ec16f

28 files changed

Lines changed: 618 additions & 283 deletions

CITATION.cff

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -34,7 +34,7 @@ authors:
3434
orcid: https://orcid.org/0009-0000-0252-560X
3535
- given-names: Rob
3636
family-names: Challen
37-
affiliation:
37+
affiliation: School of Engineering, Mathematics and Technology, University of Bristol
3838
orcid: https://orcid.org/0000-0002-5504-7768
3939
- given-names: Tom
4040
family-names: Slater

DESCRIPTION

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,7 @@
11
Title: DES Rap Book
22
Imports:
33
cyclocomp,
4+
checkmate,
45
diffobj,
56
dplyr,
67
fitdistrplus,
@@ -12,6 +13,7 @@ Imports:
1213
kableExtra,
1314
lintr,
1415
lubridate,
16+
pak,
1517
patrick,
1618
plotly,
1719
prettycode,

README.md

Lines changed: 3 additions & 3 deletions
@@ -60,13 +60,13 @@ Check it out at: **https://pythonhealthdatascience.github.io/des_rap_book/**.
6060

6161
This resource has been developed as part of the project **STARS: Sharing Tools and Artefacts for Reproducible Simulations in healthcare**.
6262

63-
![](images/stars_banner.png)
63+
[![](images/stars_banner.png)](https://pythonhealthdatascience.github.io/stars/)
6464

6565
The project tackles the challenges of sharing, reusing, and reproducing discrete event simulation (DES) models in healthcare. Our goal is to create open resources using the two most popular open-source languages for DES: Python and R.
6666

6767
We have been developing tutorials, code examples, and tools to help researchers and practitioners develop, validate, and share DES models more effectively.
6868

69-
For more information on this project, check out the [STARS page](https://pythonhealthdatascience.github.io/des_rap_book/pages/project/stars.html) in the DES RAP Book.
69+
For more information on this project, check out the [STARS project website](https://pythonhealthdatascience.github.io/stars/).
7070

7171
<br>
7272

@@ -110,4 +110,4 @@ If you're interested in contributing (or just viewing this website locally), che
110110

111111
## Funding
112112

113-
This project is supported by the Medical Research Council [grant number [MR/Z503915/1](https://gtr.ukri.org/projects?ref=MR%2FZ503915%2F1)].
113+
This project is supported by the Medical Research Council [grant number [MR/Z503915/1](https://gtr.ukri.org/projects?ref=MR%2FZ503915%2F1)] from 1st May 2024 to 31st October 2026.

_quarto.yml

Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ website:
119119
The STARS project is supported by the Medical Research Council [grant number MR/Z503915/1].
120120
center:
121121
- text: |
122-
Part of the STARS research project.<br>
122+
Part of the <a href="https://pythonhealthdatascience.github.io/stars/" target="_blank" rel="noopener">STARS research project</a>.<br>
123123
Code licence: <a href="https://opensource.org/license/mit" target="_blank" rel="noopener">MIT</a>.
124124
Text licence: <a href="https://creativecommons.org/licenses/by-sa/4.0/" target="_blank" rel="noopener">CC-BY-SA 4.0</a>.
125125
right:

index.qmd

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ The book is written by **Amy Heather** [![ORCID](images/orcid.png)](https://orci
9797
* Dr. **Rob Challen** [![ORCID](images/orcid.png)](https://orcid.org/0000-0002-5504-7768)
9898
* **Tom Slater** [![ORCID](images/orcid.png)](https://orcid.org/0009-0007-0838-7499)
9999

100-
The STARS project is supported by the Medical Research Council [grant number MR/Z503915/1]. The listed researchers are associated with the **University of Exeter** Medical and Business Schools.
100+
The STARS project is supported by the Medical Research Council [grant number MR/Z503915/1] from 1st May 2024 to 31st October 2026. The listed researchers are associated with the **University of Exeter** Medical and Business Schools, and the **University of Bristol** School of Engineering, Mathematics and Technology.
101101

102102
You can find out more about our project on the [**STARS project website**](https://pythonhealthdatascience.github.io/stars/){target="_blank"}. If you use this resource, **please cite us:**
103103

pages/guide/further_info/conclusion.qmd

Lines changed: 3 additions & 3 deletions
@@ -44,7 +44,7 @@ Remember, these are **examples**, not prescriptions. They're not perfect, and th
4444

4545
### Make your own model
4646

47-
The best way to solidify what you've learned is to apply it. When planning you model, remember that a good simulation starts with **conceptual modelling**. As defined in Robinson (2007):
47+
The best way to solidify what you've learned is to apply it. When planning your model, remember that a good simulation starts with **conceptual modelling**. As defined in Robinson (2007):
4848

4949
> "The conceptual model is a non-software specific description of the simulation model that is to be developed, describing the objectives, inputs, outputs, content, assumptions and simplifications of the model."
5050
@@ -103,9 +103,9 @@ Suggested citation:
103103
104104
## Find out more about STARS
105105

106-
This book is part of the **STARS (Sharing Tools and Artefacts for Reusable and Reproducible Simulations)** project, supported by the Medical Research Council [grant number MR/Z503915/1].
106+
This book is part of the **STARS (Sharing Tools and Artefacts for Reusable and Reproducible Simulations)** project, supported by the Medical Research Council [grant number MR/Z503915/1] from 1st May 2024 to 31st October 2026.
107107

108-
![](../../images/stars_banner.png)
108+
[![](../../images/stars_banner.png)](https://pythonhealthdatascience.github.io/stars/)
109109

110110
STARS tackles the challenges of sharing, reusing, and reproducing discrete event simulation (DES) models in healthcare. Our goal is to create open resources using the two most popular open-source languages for DES: Python and R. As part of this project, you'll find tutorials, code examples, and tools to help researchers and practitioners develop, validate, and share DES models more effectively.
111111

pages/guide/further_info/feedback.qmd

Lines changed: 1 addition & 1 deletion
@@ -39,6 +39,6 @@ Prefer email instead? Reach out to the STARS team - you can contact the followin
3939
* Alison Harper: [a.l.harper@exeter.ac.uk](mailto:a.l.harper@exeter.ac.uk)
4040
* Nav Mustafee: [n.mustafee@exeter.ac.uk](mailto:n.mustafee@exeter.ac.uk)
4141

42-
This book is part of the **STARS (Sharing Tools and Artefacts for Reusable and Reproducible Simulations)** project, supported by the Medical Research Council.
42+
This book is part of the **STARS (Sharing Tools and Artefacts for Reusable and Reproducible Simulations)** project, supported by the Medical Research Council from 1st May 2024 to 31st October 2026.
4343

4444
<br><br>

pages/guide/inputs/input_data.qmd

Lines changed: 62 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@ date: "2025-10-13T15:40:55+01:00"
1616

1717
* Recognise where a **reproducible analytical pipeline** begins, and what data is included.
1818
* Learn recommended practices for **storing and sharing raw data, input modelling code, and parameters**.
19+
* Understand how to protect sensitive data and avoid committing secrets to version control.
1920
* Understand how **private and public versions** of a model could be maintained when there is sensitive data.
2021

2122
:::
@@ -43,9 +44,69 @@ Keep in mind that, especially in sensitive areas like healthcare, you may not be
4344

4445
> **Why is this important?** By starting at the source, you make your work transparent and easy to repeat. For instance, if new raw data becomes available, it's important you have your input modelling code so that you can check your distributions are still appropriate, re-estimate your model parameters, and re-run your analysis.
4546
47+
## Never commit secrets or sensitive data to Git
48+
49+
Never commit to Git:
50+
51+
* Raw identifiable data (patient records, personally identifiable information).
52+
* Secrets (API keys, passwords, database connection strings, access tokens).
53+
* Real sensitive parameter files that must remain private.
54+
* Configuration files with embedded credentials (e.g. .env, secrets.yml).
55+
56+
**Even if your repository is private, secrets in Git history are vulnerable**: anyone with current or future repository access can see the full history, and if the repository ever becomes public, secrets are exposed to the internet. **Removing secrets from Git history is painful and error-prone, so prevention is far easier than cure**.
57+
58+
### Protect your repository
59+
60+
You need to **store sensitive data outside your public Git repository**. You have two main approaches:
61+
62+
1. **Completely outside any Git repository**. Store raw data, secrets, and real sensitive parameters in a secure location entirely separate from version control, such as in a restricted database or institutional data source. This approach is safest when your data requires strict separation and storage rules. Reference these resources in your RAP via documented access paths (e.g. database connection details, file paths) rather than committing exports.
63+
64+
2. **In a private Git repository**. For less sensitive data, use a separate private repository (never mix public and private content in the same repository). The advantage is that your team maintains version control and change history for sensitive materials. However, a private repository is not foolproof - it is only as secure as your team's access permissions, and if the repository's access ever changes, sensitive data could be exposed.
65+
66+
::: {.callout-note title="Learn more about maintaining a private and public version of your model" collapse="true"}
67+
68+
The way you might set up private and public repositories depends on whether you are allowed to share the real simulation parameter files. This assumes your sensitive data is stored using one of the approaches above (either completely outside Git, or in a private repository if appropriate).
69+
70+
### Scenario 1: Allowed to share real parameters
71+
72+
* **Public repository:** Contains everything needed to reproduce your model except sensitive raw data and input modelling scripts.
73+
74+
* **Private repository (or secure external storage):** Contains the sensitive raw data, input modelling scripts, and anything else that cannot be publicly released.
75+
76+
* **Workflow:**
77+
1. Do all input modelling (parameter estimation) on real data stored securely (either in a private repository or external to Git).
78+
2. Copy the resulting real parameter files to the public repository.
79+
3. Run your model and share code/results publicly - users can fully reproduce your analysis using the real parameters.
80+
81+
### Scenario 2: Only sharing fake/synthetic parameters
82+
83+
* **Public repository:** Contains only synthetic/fake parameter files, synthetic/example data, analysis code, and documentation describing how these synthetic values were generated.
84+
85+
* **Private repository (or secure external storage):** Contains the sensitive raw data and real parameter files, plus scripts for analysis with the real values.
86+
87+
* **Shared simulation package (in it's own repository or part of the public repository):** All analysis code is [developed as a package](../setup/package.qmd) that can be installed and used by both the public and private repositories. This greatly reduces code duplication.
88+
89+
* **Workflow:**
90+
1. Estimate parameters using real data stored securely (either in a private repository if appropriate, or completely outside Git for highly sensitive data).
91+
2. Generate synthetic parameter files for the public repository, documenting the generation process.
92+
3. Use the shared simulation package in both repositories.
93+
4. Run and share the full workflow in public with synthetic parameters; run the actual analysis in private with the real parameters.
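
Step 2 of this workflow (generating synthetic parameter files and documenting how they were made) could look something like the Python sketch below. It is only an illustration under stated assumptions: the function names, the parameter names, the value range, and the JSON layout are all hypothetical and are not taken from the book's example models. The values are drawn independently of any real data, and the note describing how they were generated is stored alongside them in the same file.

```python
import json
import random
from pathlib import Path


def make_synthetic_parameters(names, low, high, seed):
    """Draw one made-up value per parameter name from a uniform range.

    The values do not depend on any real data. A fixed seed makes the
    synthetic file reproducible.
    """
    rng = random.Random(seed)
    return {name: round(rng.uniform(low, high), 2) for name in names}


def write_parameter_file(path, parameters, generation_note):
    """Save the parameters together with a note on how they were generated."""
    payload = {
        "generation_note": generation_note,
        "parameters": parameters,
    }
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


# Hypothetical usage (names and ranges are for illustration only):
# params = make_synthetic_parameters(
#     ["mean_interarrival_time", "mean_consultation_time"],
#     low=1.0, high=10.0, seed=42,
# )
# write_parameter_file(
#     "synthetic_parameters.json", params,
#     "Uniform(1, 10) draws with seed 42; not derived from real data.",
# )
```

Keeping the generation note inside the file means anyone using the public repository can see at a glance that the values are synthetic and how to regenerate them.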
94+
95+
:::
96+
97+
Some additional safeguards to protect your repository include to:
98+
99+
* Add sensitive files to `.gitignore`.
100+
* Store secrets using environment variables or a secrets manager.
101+
* Use pre-commit secret scanners (e.g., `git-secrets`, `detect-secrets`) to block commits with secrets.
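
As a rough illustration of the environment variable safeguard, the Python sketch below reads a secret at run time rather than from a file tracked by Git. The helper name `get_secret` and the variable name `DB_PASSWORD` are hypothetical, chosen only for this example; the only library call used is the standard `os.environ.get`.

```python
import os


def get_secret(name):
    """Return a secret from the environment, failing loudly if it is missing.

    The secret itself is set outside the repository (for example in your
    shell session or by a secrets manager), so it never appears in Git.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Set it outside the repository before running the analysis."
        )
    return value


# Hypothetical usage (DB_PASSWORD is an assumed variable name):
# password = get_secret("DB_PASSWORD")
```

Failing with a clear error when the variable is missing is a deliberate choice here: it avoids silently falling back to a default value that someone might later be tempted to hard-code and commit.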
102+
103+
### If you accidentally commit sensitive data
104+
105+
See [GitHub's guide to removing sensitive data](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository) for advice on what to do!
106+
46107
## Raw data
47108

48-
This is data which reflects system you will be simulating. It is used to estimate parameters and fit distributions for your simulation model. For example:
109+
This is data which reflects the system you will be simulating. It is used to estimate parameters and fit distributions for your simulation model. For example:
49110

50111
::: {.grey-table}
51112

@@ -205,37 +266,6 @@ You must share some parameters with your model so that it is possible for others
205266

206267
:::
207268

208-
## Maintaining a private and public version of your model
209-
210-
It's common to have data and/or code that cannot be shared publicly. **Both your private and public components should be [version controlled](../setup/version.qmd)**, but you cannot split a single GitHub repository into public and private sections. The suggested solution is to use two separate repositories: **one public, one private**.
211-
212-
The way you might set these up depends on whether you are allowed to share the real simulation parameter files.
213-
214-
### Scenario 1: Allowed to share real parameters
215-
216-
* **Public repository:** Contains everything needed to reproduce your model except sensitive raw data and input modelling scripts.
217-
218-
* **Private repository:** Contains the sensitive raw data, input modelling scripts, and anything else that cannot be publicly released.
219-
220-
* **Workflow:**
221-
1. Do all input modelling (parameter estimation) on real data in your private repository.
222-
2. Copy the resulting real parameter files to the public repository.
223-
3. Run your model and share code/results publicly - users can fully reproduce your analysis using the real parameters.
224-
225-
### Scenario 2: Only sharing fake/synthetic parameters
226-
227-
* **Public repository:** Contains only synthetic/fake parameter files, synthetic/example data, analysis code, and documentation describing how these synthetic values were generated.
228-
229-
* **Private repository:** Contains the sensitive raw data and real parameter files, plus scripts for analysis with the real values.
230-
231-
* **Shared simulation package (in it's own repository or part of the public repository):** All analysis code is [developed as a package](../setup/package.qmd) that can be installed and used by both the public and private repositories. This greatly reduces code duplication.
232-
233-
* **Workflow:**
234-
1. Estimate parameters using real data in your private repository - store these securely.
235-
2. Generate synthetic parameter files for the public repository, documenting the generation process.
236-
3. Use the shared simulation package in both repositories.
237-
4. Run and share the full workflow in public with synthetic parameters; run the actual analysis in private with the real parameters.
238-
239269
## Test yourself
240270

241271
```{r}
