pythonhealthdatascience
diff --git a/CITATION.cff b/CITATION.cff
Lines changed: 1 addition & 1 deletion
diff --git a/DESCRIPTION b/DESCRIPTION
Lines changed: 2 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 3 additions & 3 deletions
diff --git a/_quarto.yml b/_quarto.yml
Lines changed: 1 addition & 1 deletion
diff --git a/index.qmd b/index.qmd
Lines changed: 1 addition & 1 deletion
diff --git a/pages/guide/further_info/conclusion.qmd b/pages/guide/further_info/conclusion.qmd
Lines changed: 3 additions & 3 deletions
diff --git a/pages/guide/further_info/feedback.qmd b/pages/guide/further_info/feedback.qmd
Lines changed: 1 addition & 1 deletion
diff --git a/pages/guide/inputs/input_data.qmd b/pages/guide/inputs/input_data.qmd
Lines changed: 62 additions & 32 deletions
@@ -34,7 +34,7 @@ authors:
     orcid: https://orcid.org/0009-0000-0252-560X
   - given-names: Rob
     family-names: Challen
-    affiliation:
+    affiliation: School of Engineering, Mathematics and Technology, University of Bristol
     orcid: https://orcid.org/0000-0002-5504-7768
   - given-names: Tom
     family-names: Slater
 
@@ -1,6 +1,7 @@
 Title: DES Rap Book
 Imports:
     cyclocomp,
+    checkmate,
     diffobj,
     dplyr,
     fitdistrplus,
@@ -12,6 +13,7 @@ Imports:
     kableExtra,
     lintr,
     lubridate,
+    pak,
     patrick,
     plotly,
     prettycode,
 
@@ -60,13 +60,13 @@ Check it out at: **https://pythonhealthdatascience.github.io/des_rap_book/**.
 
 This resource has been developed as part of the project **STARS: Sharing Tools and Artefacts for Reproducible Simulations in healthcare**.
 
-![](images/stars_banner.png)
+[![](images/stars_banner.png)](https://pythonhealthdatascience.github.io/stars/)
 
 The project tackles the challenges of sharing, reusing, and reproducing discrete event simulation (DES) models in healthcare. Our goal is to create open resources using the two most popular open-source languages for DES: Python and R.
 
 We have been developing tutorials, code examples, and tools to help researchers and practitioners develop, validate, and share DES models more effectively.
 
-For more information on this project, check out the [STARS page](https://pythonhealthdatascience.github.io/des_rap_book/pages/project/stars.html) in the DES RAP Book.
+For more information on this project, check out the [STARS project website](https://pythonhealthdatascience.github.io/stars/).
 
 <br>
 
@@ -110,4 +110,4 @@ If you're interested in contributing (or just viewing this website locally), che
 
 ## Funding
 
-This project is supported by the Medical Research Council [grant number [MR/Z503915/1](https://gtr.ukri.org/projects?ref=MR%2FZ503915%2F1)].
+This project is supported by the Medical Research Council [grant number [MR/Z503915/1](https://gtr.ukri.org/projects?ref=MR%2FZ503915%2F1)] from 1st May 2024 to 31st October 2026.
@@ -119,7 +119,7 @@ website:
            The STARS project is supported by the Medical Research Council [grant number MR/Z503915/1].
     center:
       - text: |
-          Part of the STARS research project.<br>
+          Part of the <a href="https://pythonhealthdatascience.github.io/stars/" target="_blank" rel="noopener">STARS research project</a>.<br>
           Code licence: <a href="https://opensource.org/license/mit" target="_blank" rel="noopener">MIT</a>.  
           Text licence: <a href="https://creativecommons.org/licenses/by-sa/4.0/" target="_blank" rel="noopener">CC-BY-SA 4.0</a>.
     right:
 
@@ -97,7 +97,7 @@ The book is written by **Amy Heather** [![ORCID](images/orcid.png)](https://orci
 * Dr. **Rob Challen** [![ORCID](images/orcid.png)](https://orcid.org/0000-0002-5504-7768)
 * **Tom Slater** [![ORCID](images/orcid.png)](https://orcid.org/0009-0007-0838-7499)
 
-The STARS project is supported by the Medical Research Council [grant number MR/Z503915/1]. The listed researchers are associated with the **University of Exeter** Medical and Business Schools.
+The STARS project is supported by the Medical Research Council [grant number MR/Z503915/1] from 1st May 2024 to 31st October 2026. The listed researchers are associated with the **University of Exeter** Medical and Business Schools, and the **University of Bristol** School of Engineering, Mathematics and Technology.
 
 You can find out more about our project on the [**STARS project website**](https://pythonhealthdatascience.github.io/stars/){target="_blank"}. If you use this resource, **please cite us:**
 
 
@@ -44,7 +44,7 @@ Remember, these are **examples**, not prescriptions. They're not perfect, and th
 
 ### Make your own model
 
-The best way to solidify what you've learned is to apply it. When planning you model, remember that a good simulation starts with **conceptual modelling**. As defined in Robinson (2007):
+The best way to solidify what you've learned is to apply it. When planning your model, remember that a good simulation starts with **conceptual modelling**. As defined in Robinson (2007):
 
 > "The conceptual model is a non-software specific description of the simulation model that is to be developed, describing the objectives, inputs, outputs, content, assumptions and simplifications of the model."
 
@@ -103,9 +103,9 @@ Suggested citation:
 
 ## Find out more about STARS
 
-This book is part of the **STARS (Sharing Tools and Artefacts for Reusable and Reproducible Simulations)** project, supported by the Medical Research Council [grant number MR/Z503915/1].
+This book is part of the **STARS (Sharing Tools and Artefacts for Reusable and Reproducible Simulations)** project, supported by the Medical Research Council [grant number MR/Z503915/1] from 1st May 2024 to 31st October 2026.
 
-![](../../images/stars_banner.png)
+[![](../../images/stars_banner.png)](https://pythonhealthdatascience.github.io/stars/)
 
 STARS tackles the challenges of sharing, reusing, and reproducing discrete event simulation (DES) models in healthcare. Our goal is to create open resources using the two most popular open-source languages for DES: Python and R. As part of this project, you'll find tutorials, code examples, and tools to help researchers and practitioners develop, validate, and share DES models more effectively.
 
 
@@ -39,6 +39,6 @@ Prefer email instead? Reach out to the STARS team - you can contact the followin
 * Alison Harper: [a.l.harper@exeter.ac.uk](mailto:a.l.harper@exeter.ac.uk)
 * Nav Mustafee: [n.mustafee@exeter.ac.uk](mailto:n.mustafee@exeter.ac.uk)
 
-This book is part of the **STARS (Sharing Tools and Artefacts for Reusable and Reproducible Simulations)** project, supported by the Medical Research Council.
+This book is part of the **STARS (Sharing Tools and Artefacts for Reusable and Reproducible Simulations)** project, supported by the Medical Research Council from 1st May 2024 to 31st October 2026.
 
 <br><br>
@@ -16,6 +16,7 @@ date: "2025-10-13T15:40:55+01:00"
 
 * Recognise where a **reproducible analytical pipeline** begins, and what data is included.
 * Learn recommended practices for **storing and sharing raw data, input modelling code, and parameters**.
+* Understand how to protect sensitive data and avoid committing secrets to version control.
 * Understand how **private and public versions** of a model could be maintained when there is sensitive data.
 
 :::
@@ -43,9 +44,69 @@ Keep in mind that, especially in sensitive areas like healthcare, you may not be
 
 > **Why is this important?** By starting at the source, you make your work transparent and easy to repeat. For instance, if new raw data becomes available, it's important you have your input modelling code so that you can check your distributions are still appropriate, re-estimate your model parameters, and re-run your analysis.
 
+## Never commit secrets or sensitive data to Git
+
+Never commit to Git:
+
+* Raw identifiable data (patient records, personally identifiable information).
+* Secrets (API keys, passwords, database connection strings, access tokens).
+* Real sensitive parameter files that must remain private.
+* Configuration files with embedded credentials (e.g. `.env`, `secrets.yml`).
+
+**Even if your repository is private, secrets in Git history are vulnerable**: anyone with current or future repository access can see the full history, and if the repository ever becomes public, secrets are exposed to the internet. **Removing secrets from Git history is painful and error-prone, so prevention is far easier than cure**.
+
+### Protect your repository
+
+You need to **store sensitive data outside your public Git repository**. You have two main approaches:
+
+1. **Completely outside any Git repository**. Store raw data, secrets, and real sensitive parameters in a secure location entirely separate from version control, such as in a restricted database or institutional data source. This approach is safest when your data requires strict separation and storage rules. Reference these resources in your RAP via documented access paths (e.g. database connection details, file paths) rather than committing exports.
+
+2. **In a private Git repository**. For less sensitive data, use a separate private repository (never mix public and private content in the same repository). The advantage is that your team maintains version control and change history for sensitive materials. However, a private repository is not foolproof - it is only as secure as your team's access permissions, and if the repository's access ever changes, sensitive data could be exposed.
+
+::: {.callout-note title="Learn more about maintaining a private and public version of your model" collapse="true"}
+
+The way you might set up private and public repositories depends on whether you are allowed to share the real simulation parameter files. This assumes your sensitive data is stored using one of the approaches above (either completely outside Git, or in a private repository if appropriate).
+
+### Scenario 1: Allowed to share real parameters
+
+* **Public repository:** Contains everything needed to reproduce your model except sensitive raw data and input modelling scripts.
+
+* **Private repository (or secure external storage):** Contains the sensitive raw data, input modelling scripts, and anything else that cannot be publicly released.
+
+* **Workflow:**
+  1. Do all input modelling (parameter estimation) on real data stored securely (either in a private repository or external to Git).
+  2. Copy the resulting real parameter files to the public repository.
+  3. Run your model and share code/results publicly - users can fully reproduce your analysis using the real parameters.
+
+### Scenario 2: Only sharing fake/synthetic parameters
+
+* **Public repository:** Contains only synthetic/fake parameter files, synthetic/example data, analysis code, and documentation describing how these synthetic values were generated.
+
+* **Private repository (or secure external storage):** Contains the sensitive raw data and real parameter files, plus scripts for analysis with the real values.
+
+* **Shared simulation package (in its own repository or part of the public repository):** All analysis code is [developed as a package](../setup/package.qmd) that can be installed and used by both the public and private repositories. This greatly reduces code duplication.
+
+* **Workflow:**
+  1. Estimate parameters using real data stored securely (either in a private repository if appropriate, or completely outside Git for highly sensitive data).
+  2. Generate synthetic parameter files for the public repository, documenting the generation process.
+  3. Use the shared simulation package in both repositories.
+  4. Run and share the full workflow in public with synthetic parameters; run the actual analysis in private with the real parameters.
+
+:::
+
+Additional safeguards to protect your repository include:
+
+* Add sensitive files to `.gitignore`.
+* Store secrets using environment variables or a secrets manager.
+* Use pre-commit secret scanners (e.g., `git-secrets`, `detect-secrets`) to block commits with secrets.
+
+### If you accidentally commit sensitive data
+
+See [GitHub's guide to removing sensitive data](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository) for advice on what to do!
+
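One of the safeguards listed above is to store secrets in environment variables rather than in committed files. A minimal Python sketch of that idea follows. It is not part of this change set, and the helper `get_secret`, the exception `MissingSecretError`, and the variable name `DES_DB_PASSWORD` are hypothetical names chosen for illustration.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not set in the environment."""


def get_secret(name, environ=None):
    """Return the secret stored in the environment variable `name`.

    Fails loudly rather than falling back to a hard-coded default, so
    that credentials never need to appear in code committed to Git.
    `environ` defaults to os.environ; passing a dict makes testing easy.
    """
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        raise MissingSecretError(
            f"Environment variable '{name}' is not set. "
            "Set it in your shell or secrets manager, not in the repository."
        )
    return value


# Example use (hypothetical variable name, set outside the repository):
#   db_password = get_secret("DES_DB_PASSWORD")
```

Raising an error when the variable is missing, instead of supplying a default, is a deliberate choice in this sketch: a default value would itself be a credential stored in the code.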
 ## Raw data
 
-This is data which reflects system you will be simulating. It is used to estimate parameters and fit distributions for your simulation model. For example:
+This is data which reflects the system you will be simulating. It is used to estimate parameters and fit distributions for your simulation model. For example:
 
 ::: {.grey-table}
 
@@ -205,37 +266,6 @@ You must share some parameters with your model so that it is possible for others
 
 :::
 
-## Maintaining a private and public version of your model
-
-It's common to have data and/or code that cannot be shared publicly. **Both your private and public components should be [version controlled](../setup/version.qmd)**, but you cannot split a single GitHub repository into public and private sections. The suggested solution is to use two separate repositories: **one public, one private**.
-
-The way you might set these up depends on whether you are allowed to share the real simulation parameter files.
-
-### Scenario 1: Allowed to share real parameters
-
-* **Public repository:** Contains everything needed to reproduce your model except sensitive raw data and input modelling scripts.
-
-* **Private repository:** Contains the sensitive raw data, input modelling scripts, and anything else that cannot be publicly released.
-
-* **Workflow:**
-  1. Do all input modelling (parameter estimation) on real data in your private repository.
-  2. Copy the resulting real parameter files to the public repository.
-  3. Run your model and share code/results publicly - users can fully reproduce your analysis using the real parameters.
-
-### Scenario 2: Only sharing fake/synthetic parameters
-
-* **Public repository:** Contains only synthetic/fake parameter files, synthetic/example data, analysis code, and documentation describing how these synthetic values were generated.
-
-* **Private repository:** Contains the sensitive raw data and real parameter files, plus scripts for analysis with the real values.
-
-* **Shared simulation package (in it's own repository or part of the public repository):** All analysis code is [developed as a package](../setup/package.qmd) that can be installed and used by both the public and private repositories. This greatly reduces code duplication.
-
-* **Workflow:**
-  1. Estimate parameters using real data in your private repository - store these securely.
-  2. Generate synthetic parameter files for the public repository, documenting the generation process.
-  3. Use the shared simulation package in both repositories.
-  4. Run and share the full workflow in public with synthetic parameters; run the actual analysis in private with the real parameters.
-
 ## Test yourself
 
 ```{r}