Big-Life-Lab · DougManuel · May 21, 2026 · May 20, 2026
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: MockData
 Title: Generate Mock Data from Metadata Specifications
-Version: 0.3.0
+Version: 0.4.0
 Authors@R: c(
     person("Juan", "Li", role = "aut", email = "juli@ohri.ca"),
     person("Douglas", "Manuel", role = c("aut", "cre"), email = "dmanuel@ohri.ca"),

diff --git a/README.md b/README.md
@@ -3,19 +3,21 @@
 <!-- badges: start -->
 
 [![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
-[![Version: 0.3.0](https://img.shields.io/badge/version-0.3.0-blue.svg)](https://github.com/Big-Life-Lab/MockData)
+[![Version: 0.4.0](https://img.shields.io/badge/version-0.4.0-blue.svg)](https://github.com/Big-Life-Lab/MockData)
 [![pkgdown](https://github.com/Big-Life-Lab/MockData/actions/workflows/pkgdown.yaml/badge.svg)](https://github.com/Big-Life-Lab/MockData/actions/workflows/pkgdown.yaml)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
 <!-- badges: end -->
 
-**Status: Experimental, pre-release software**
+**Status: Experimental v0.4.0 release candidate**
 
 MockData is a work-in-progress R package for generating mock testing data from
-small metadata specifications. It is useful
-today for development and documentation workflows, especially when paired
-with recodeflow-style metadata (see below), but it should be treated as experimental
-infrastructure rather than a stable released package.
+small metadata specifications. The `dev` branch now contains the v0.4
+`mock_spec` architecture: direct specification helpers, a recodeflow metadata
+adapter, native generation, optional `simstudy` generation, and post-processing
+diagnostics. It is useful today for development and documentation workflows,
+especially when paired with recodeflow-style metadata (see below), but it should
+be treated as experimental infrastructure rather than a stable released package.
 
 People are using MockData and reporting that it is helpful. We take that as an
 encouraging signal, not as evidence that the package is mature. Please review
@@ -33,10 +35,45 @@ the generated data before using it in any workflow that matters.
 **Current development limitations:**
 
 - APIs may change before a formal release
-- Error handling is too permissive and can fail with warnings instead of stopping
+- Some legacy v0.3-compatible paths still fall back with warnings; the v0.4
+  `mock_spec` path is stricter and records diagnostics
 - The test suite does not yet cover every important edge case
 - Generated data should be manually checked against your intended metadata rules
 
+**v0.4 direct API example**
+
+The v0.4 API separates specification, baseline generation, and post-processing.
+That makes the generated values easier to inspect and audit.
+
+```r
+library(MockData)
+
+spec <- mock_spec(
+  mock_spec_continuous(
+    "age",
+    range = c(18, 85),
+    distribution = "normal",
+    mean = 50,
+    sd = 12,
+    rtype = "integer"
+  ),
+  mock_spec_categorical(
+    "smoking",
+    levels = c("never", "former", "current"),
+    proportions = c(0.5, 0.3, 0.2),
+    rtype = "character",
+    missing_codes = "unknown",
+    missing_proportions = 0.05
+  )
+)
+
+baseline <- generate_mock_data_native(spec, n = 100, seed = 1)
+mock_data <- postprocess_mock_data(baseline, spec, seed = 2)
+
+head(mock_data)
+attr(mock_data, "mockdata_diagnostics")$variables$smoking
+```
+
 **30-second standalone example**
 
 For a quick numeric variable, `create_con_var()` can use two small
@@ -221,6 +258,7 @@ devtools::install_local("~/github/mock-data")
 
 **Tutorials:**
 
+- [v0.4 getting started](vignettes/getting-started-v04.qmd) - Direct `mock_spec`, recodeflow adapter, and diagnostics workflow
 - [Getting started](vignettes/getting-started.qmd) - Complete tutorial from single variables to full datasets
 - [For recodeflow users](vignettes/for-recodeflow-users.qmd) - Using MockData with existing metadata
 - [Survival data](vignettes/tutorial-survival-data.qmd) - Time-to-event data and temporal patterns

diff --git a/_pkgdown.yml b/_pkgdown.yml
@@ -100,6 +100,7 @@ articles:
   desc: Learning-oriented step-by-step guides
   navbar: Tutorials
   contents:
+  - getting-started-v04
   - getting-started
   - tutorial-categorical-continuous
   - tutorial-dates
@@ -111,12 +112,17 @@ articles:
   desc: Task-oriented practical examples
   navbar: How-to guides
   contents:
+  - recodeflow-metadata-v04
+  - diagnostics-and-garbage-v04
+  - migrating-from-v03-v04
+  - choosing-a-backend-v04
   - for-recodeflow-users
 
 - title: Explanation
   desc: Understanding concepts and design decisions
   navbar: Explanation
   contents:
+  - design-philosophy-v04
   - advanced-topics
 
 - title: Reference

diff --git a/development/adr/v04-hybrid-backend.md b/development/adr/v04-hybrid-backend.md
@@ -1,7 +1,7 @@
 # ADR: v0.4 Hybrid Backend Architecture
 
-**Status**: draft  
-**Date**: 2026-05-18  
+**Status**: accepted and implemented in PR #28  
+**Date**: 2026-05-18  
 **Decision owner**: MockData maintainers
 
 ## Context
@@ -23,6 +23,9 @@ MockData-specific semantics as post-processing. Three review rounds converged on
 the same conclusion: the hybrid architecture is ready for production refactor
 planning.
 
+The production refactor was implemented in PR #28 and merged to `dev` for
+sibling-package testing before a v0.4.0 tag.
+
 ## Decision
 
 MockData v0.4 will move toward a hybrid backend architecture:
@@ -92,18 +95,22 @@ Tradeoffs:
   formula syntax, custom distribution registry, and correlation merging.
 - Maintaining wrappers will add short-term complexity.
 
-## Implementation Direction
+## Implementation Status
 
-Production refactor should proceed in layers:
+The production refactor proceeded in layers:
 
 1. `mock_spec` constructors and validators.
 2. Direct and recodeflow input adapters.
-3. Formula/dependency evaluator.
-4. Native backend.
-5. Post-processing layer.
-6. Promotion of spike assertions to `testthat`.
-7. Optional `simstudy` backend.
-8. Current API wrappers.
+3. Native backend.
+4. Post-processing layer and diagnostics.
+5. Promotion of spike assertions to `testthat`.
+6. Optional `simstudy` backend.
+7. Current API wrappers.
+8. Divio documentation sprint and Phase C maintainer communication.
+
+Formula/dependency evaluation, multi-group correlations, Table 1 adapters, and
+schema-first integration remain deferred roadmap items rather than v0.4.0
+commitments.
 
 ## Open Follow-Up Decisions
 

diff --git a/development/simstudy-v04.md b/development/simstudy-v04.md
@@ -1,9 +1,15 @@
 # MockData v0.4 Production Refactor Plan
 
+**Status**: implemented in PR #28 and superseded by the v0.4 documentation
+sprint. This document is retained as the production-refactor plan and should be
+read as historical implementation context rather than an active task list.
+
 ## 1. Write The ADR First
 
 Write a short architecture decision record before production code changes.
 
+**Status**: complete. See `development/adr/v04-hybrid-backend.md`.
+
 The ADR should lock these decisions:
 
 - **Decision**: MockData adopts a hybrid backend architecture.
@@ -31,6 +37,9 @@ The ADR should lock these decisions:
 
 Each layer should have focused tests before the next layer starts.
 
+**Status**: complete for the v0.4.0 scope. Formula/dependency evaluation,
+multi-group correlation, and Table 1 input remain deferred roadmap items.
+
 1. **`mock_spec` core**
    - Constructors and validators.
    - Stable fields for names, types, ranges, levels, proportions, missing codes,
@@ -81,6 +90,10 @@ Each layer should have focused tests before the next layer starts.
 
 ## 3. Keep The Current API Alive
 
+**Status**: complete. The v0.3 public functions remain available, and
+`create_mock_data()` now routes supported metadata through the v0.4 pipeline
+while preserving legacy fallback paths.
+
 Existing public functions should remain available in v0.4.0:
 
 - `create_mock_data()`
@@ -95,6 +108,11 @@ synchronized release.
 
 ## 4. Carry-Forward Design Issues
 
+**Status**: partly resolved. The diagnostics shape, seed discipline, native vs
+`simstudy` parity tests, and optional `simstudy` posture were settled for v0.4.0.
+The remaining items below should be treated as v0.5+ roadmap candidates or issue
+backlog material.
+
 Settle in the ADR or the first design note:
 
 - Multi-group correlation merge strategy.
@@ -116,6 +134,9 @@ Track as implementation issues:
 
 ## 5. Communication
 
+**Status**: complete as a draft communication artifact. See
+`development/v04-phase-c-comms-note.md`.
+
 Before v0.4.0 lands, write a short communication note for cchsflow, chmsflow,
 and recodeflow maintainers:
 
@@ -125,4 +146,3 @@ and recodeflow maintainers:
 - What migration is optional in v0.4.0.
 - When deprecation warnings may begin.
 - How the mock-data framing remains distinct from synthetic-data release.
-
diff --git a/development/v04-documentation-sprint.md b/development/v04-documentation-sprint.md
@@ -0,0 +1,76 @@
+# MockData v0.4 Documentation Sprint
+
+**Status**: complete for the v0.4.0 documentation sprint. Remaining work before
+tagging consists of package checks, maintainer smoke testing, and any follow-up
+edits from review.
+
+This sprint treats documentation as implementation validation. The goal is not
+only to explain the v0.4 API but also to run realistic user workflows during
+vignette and pkgdown builds.
+
+## Principles
+
+- Use Divio's four documentation needs: tutorials, how-to guides, reference, and
+  explanation.
+- Keep vignette code executable unless the code genuinely depends on an
+  external package or private data.
+- Prefer small, focused vignettes over one large tour.
+- Use seeds in every stochastic example so rendered output is stable.
+- Include at least one diagnostics example because the v0.4 pipeline's
+  auditability contract is a central design change.
+
+## First pass
+
+- `getting-started-v04.qmd`: tutorial for the v0.4 `mock_spec` workflow.
+- `recodeflow-metadata-v04.qmd`: how-to for generating mock data from
+  recodeflow-style CSV metadata.
+- `diagnostics-and-garbage-v04.qmd`: how-to for reading diagnostics and
+  auditing garbage/missing-code post-processing.
+- `migrating-from-v03-v04.qmd`: how-to for compatibility behavior, fallback
+  routing, diagnostics, and seed differences.
+- `choosing-a-backend-v04.qmd`: how-to for native versus optional `simstudy`
+  backend selection.
+- `design-philosophy-v04.qmd`: explanation of v0.4 design choices and scope
+  boundaries.
+- README: update the top-level status and quick example so users see v0.4
+  immediately.
+- `_pkgdown.yml`: expose the new v0.4 vignettes in site navigation.
+- `v04-phase-c-comms-note.md`: maintainer-facing note for cchsflow,
+  chmsflow, and recodeflow testing while v0.4 sits on `dev`.
+
+## Follow-up vignettes
+
+Tutorial:
+
+- `getting-started-v04.qmd`: linear first-use path.
+
+How-to:
+
+- `recodeflow-metadata-v04.qmd`: use existing `variables.csv` and
+  `variable_details.csv`.
+- `diagnostics-and-garbage-v04.qmd`: inspect missing-code and garbage
+  diagnostics.
+- `choosing-a-backend-v04.qmd`: native vs optional `simstudy`.
+- `migrating-from-v03-v04.qmd`: seed behavior, diagnostics attribute,
+  fallback conditions, and compatibility wrappers.
+
+Explanation:
+
+- `design-philosophy-v04.qmd`: distill the architecture review, hybrid backend
+  decision, and mock-data versus synthetic-data boundary.
+- `development/v04-phase-c-comms-note.md`: Phase C maintainer communication
+  source material; fold relevant parts into migration and recodeflow how-to
+  docs after maintainer feedback.
+
+Reference:
+
+- Keep roxygen pages and `_pkgdown.yml` synchronized with exported functions.
+- Keep `NEWS.md` as the release-note source of truth.
+
+## Review checklist
+
+- Does every vignette render locally?
+- Does every code chunk either run or clearly justify `eval: false`?
+- Does each vignette commit to one Divio purpose?
+- Do examples use the public API exactly as users should use it?
+- Are error messages and diagnostics understandable in rendered output?