Merged
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: MockData
Title: Generate Mock Data from Metadata Specifications
Version: 0.3.0
Version: 0.4.0
Authors@R: c(
    person("Juan", "Li", role = "aut", email = "juli@ohri.ca"),
    person("Douglas", "Manuel", role = c("aut", "cre"), email = "dmanuel@ohri.ca"),
52 changes: 45 additions & 7 deletions README.md
@@ -3,19 +3,21 @@
<!-- badges: start -->

[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![Version: 0.3.0](https://img.shields.io/badge/version-0.3.0-blue.svg)](https://github.com/Big-Life-Lab/MockData)
[![Version: 0.4.0](https://img.shields.io/badge/version-0.4.0-blue.svg)](https://github.com/Big-Life-Lab/MockData)
[![pkgdown](https://github.com/Big-Life-Lab/MockData/actions/workflows/pkgdown.yaml/badge.svg)](https://github.com/Big-Life-Lab/MockData/actions/workflows/pkgdown.yaml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

<!-- badges: end -->

**Status: Experimental, pre-release software**
**Status: Experimental v0.4.0 release candidate**

MockData is a work-in-progress R package for generating mock testing data from
small metadata specifications. It is useful
today for development and documentation workflows, especially when paired
with recodeflow-style metadata (see below), but it should be treated as experimental
infrastructure rather than a stable released package.
small metadata specifications. The `dev` branch now contains the v0.4
`mock_spec` architecture: direct specification helpers, a recodeflow metadata
adapter, native generation, optional `simstudy` generation, and post-processing
diagnostics. It is useful today for development and documentation workflows,
especially when paired with recodeflow-style metadata (see below), but it should
be treated as experimental infrastructure rather than a stable released package.

People are using MockData and reporting that it is helpful. We take that as an
encouraging signal, not as evidence that the package is mature. Please review
@@ -33,10 +35,45 @@ the generated data before using it in any workflow that matters.
**Current development limitations:**

- APIs may change before a formal release
- Error handling is too permissive and can fail with warnings instead of stopping
- Some legacy v0.3-compatible paths still fall back with warnings; the v0.4
`mock_spec` path is stricter and records diagnostics
- The test suite does not yet cover every important edge case
- Generated data should be manually checked against your intended metadata rules

**v0.4 direct API example**

The v0.4 API separates specification, baseline generation, and post-processing.
That makes the generated values easier to inspect and audit.

```r
library(MockData)

spec <- mock_spec(
  mock_spec_continuous(
    "age",
    range = c(18, 85),
    distribution = "normal",
    mean = 50,
    sd = 12,
    rtype = "integer"
  ),
  mock_spec_categorical(
    "smoking",
    levels = c("never", "former", "current"),
    proportions = c(0.5, 0.3, 0.2),
    rtype = "character",
    missing_codes = "unknown",
    missing_proportions = 0.05
  )
)

baseline <- generate_mock_data_native(spec, n = 100, seed = 1)
mock_data <- postprocess_mock_data(baseline, spec, seed = 2)

head(mock_data)
attr(mock_data, "mockdata_diagnostics")$variables$smoking
```
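
One way to act on the advice to check generated data manually is to compare it against the rules in the specification. The sketch below uses base R only. It builds a stand-in data frame instead of calling MockData, so the construction of `mock_data`, the sample size, and the 0.05 tolerance are assumptions for illustration rather than package behavior.

```r
# Stand-in for generated output, built with base R so the checks run on their own.
# (Assumption: this only mimics the age/smoking columns from the example above.)
set.seed(1)
n <- 5000
mock_data <- data.frame(
  age = as.integer(round(pmin(pmax(rnorm(n, mean = 50, sd = 12), 18), 85))),
  smoking = sample(
    c("never", "former", "current"),
    size = n, replace = TRUE, prob = c(0.5, 0.3, 0.2)
  ),
  stringsAsFactors = FALSE
)

# Rule 1: every age falls inside the intended range.
age_in_range <- all(mock_data$age >= 18 & mock_data$age <= 85)

# Rule 2: only the intended levels appear.
allowed_levels <- c("never", "former", "current")
levels_ok <- all(mock_data$smoking %in% allowed_levels)

# Rule 3: observed proportions sit near the targets (the tolerance is an arbitrary choice).
target <- c(never = 0.5, former = 0.3, current = 0.2)
observed <- prop.table(table(factor(mock_data$smoking, levels = names(target))))
proportions_ok <- all(abs(as.numeric(observed) - target) < 0.05)

# Stop loudly if any rule is violated.
stopifnot(age_in_range, levels_ok, proportions_ok)
```

The same three checks could be pointed at real generated output; a variable with missing codes would need those codes added to `allowed_levels` first.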

**30-second standalone example**

For a quick numeric variable, `create_con_var()` can use two small
@@ -221,6 +258,7 @@ devtools::install_local("~/github/mock-data")

**Tutorials:**

- [v0.4 getting started](vignettes/getting-started-v04.qmd) - Direct `mock_spec`, recodeflow adapter, and diagnostics workflow
- [Getting started](vignettes/getting-started.qmd) - Complete tutorial from single variables to full datasets
- [For recodeflow users](vignettes/for-recodeflow-users.qmd) - Using MockData with existing metadata
- [Survival data](vignettes/tutorial-survival-data.qmd) - Time-to-event data and temporal patterns
6 changes: 6 additions & 0 deletions _pkgdown.yml
@@ -100,6 +100,7 @@ articles:
    desc: Learning-oriented step-by-step guides
    navbar: Tutorials
    contents:
      - getting-started-v04
      - getting-started
      - tutorial-categorical-continuous
      - tutorial-dates
@@ -111,12 +112,17 @@ articles:
    desc: Task-oriented practical examples
    navbar: How-to guides
    contents:
      - recodeflow-metadata-v04
      - diagnostics-and-garbage-v04
      - migrating-from-v03-v04
      - choosing-a-backend-v04
      - for-recodeflow-users

  - title: Explanation
    desc: Understanding concepts and design decisions
    navbar: Explanation
    contents:
      - design-philosophy-v04
      - advanced-topics

  - title: Reference
27 changes: 17 additions & 10 deletions development/adr/v04-hybrid-backend.md
@@ -1,7 +1,7 @@
# ADR: v0.4 Hybrid Backend Architecture

**Status**: draft
**Date**: 2026-05-18
**Status**: accepted and implemented in PR #28
**Date**: 2026-05-18
**Decision owner**: MockData maintainers

## Context
@@ -23,6 +23,9 @@ MockData-specific semantics as post-processing. Three review rounds converged on
the same conclusion: the hybrid architecture is ready for production refactor
planning.

The production refactor was implemented in PR #28 and merged to `dev` for
sibling-package testing before a v0.4.0 tag.

## Decision

MockData v0.4 will move toward a hybrid backend architecture:
@@ -92,18 +95,22 @@ Tradeoffs:
formula syntax, custom distribution registry, and correlation merging.
- Maintaining wrappers will add short-term complexity.

## Implementation Direction
## Implementation Status

Production refactor should proceed in layers:
The production refactor proceeded in layers:

1. `mock_spec` constructors and validators.
2. Direct and recodeflow input adapters.
3. Formula/dependency evaluator.
4. Native backend.
5. Post-processing layer.
6. Promotion of spike assertions to `testthat`.
7. Optional `simstudy` backend.
8. Current API wrappers.
3. Native backend.
4. Post-processing layer and diagnostics.
5. Promotion of spike assertions to `testthat`.
6. Optional `simstudy` backend.
7. Current API wrappers.
8. Divio documentation sprint and Phase C maintainer communication.

Formula/dependency evaluation, multi-group correlations, Table 1 adapters, and
schema-first integration remain deferred roadmap items rather than v0.4.0
commitments.
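
The layering can be pictured with a toy pipeline in base R. This is a sketch only: `toy_spec`, `toy_baseline()`, and `toy_postprocess()` are hypothetical names invented here, and the layout of the diagnostics list is an assumption. Only the `mockdata_diagnostics` attribute name is borrowed from the README example.

```r
# Layer 1: a plain-list specification (hypothetical shape, not MockData's mock_spec).
toy_spec <- list(
  name = "smoking",
  levels = c("never", "former", "current"),
  proportions = c(0.5, 0.3, 0.2),
  missing_code = "unknown",
  missing_proportion = 0.05
)

# Layer 2: baseline generation draws valid values only.
toy_baseline <- function(spec, n, seed) {
  set.seed(seed)
  values <- sample(spec$levels, size = n, replace = TRUE, prob = spec$proportions)
  out <- data.frame(values, stringsAsFactors = FALSE)
  names(out) <- spec$name
  out
}

# Layer 3: post-processing overwrites some rows with the missing code
# and records what it did in an attribute on the returned data frame.
toy_postprocess <- function(data, spec, seed) {
  set.seed(seed)
  n <- nrow(data)
  n_missing <- round(n * spec$missing_proportion)
  rows <- sample.int(n, size = n_missing)
  data[[spec$name]][rows] <- spec$missing_code
  attr(data, "mockdata_diagnostics") <- list(
    variable = spec$name,
    n_missing_requested = n_missing,
    n_missing_applied = sum(data[[spec$name]] == spec$missing_code)
  )
  data
}

baseline <- toy_baseline(toy_spec, n = 200, seed = 1)
result <- toy_postprocess(baseline, toy_spec, seed = 2)
str(attr(result, "mockdata_diagnostics"))
```

Because the baseline is left untouched, it can be compared with the post-processed result to see exactly which rows the missing-code step changed.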

## Open Follow-Up Decisions

22 changes: 21 additions & 1 deletion development/simstudy-v04.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,15 @@
# MockData v0.4 Production Refactor Plan

**Status**: implemented in PR #28 and superseded by the v0.4 documentation
sprint. This document is retained as the production-refactor plan and should be
read as historical implementation context rather than an active task list.

## 1. Write The ADR First

Write a short architecture decision record before production code changes.

**Status**: complete. See `development/adr/v04-hybrid-backend.md`.

The ADR should lock these decisions:

- **Decision**: MockData adopts a hybrid backend architecture.
@@ -31,6 +37,9 @@ The ADR should lock these decisions:

Each layer should have focused tests before the next layer starts.

**Status**: complete for the v0.4.0 scope. Formula/dependency evaluation,
multi-group correlation, and Table 1 input remain deferred roadmap items.

1. **`mock_spec` core**
   - Constructors and validators.
   - Stable fields for names, types, ranges, levels, proportions, missing codes,
@@ -81,6 +90,10 @@ Each layer should have focused tests before the next layer starts.

## 3. Keep The Current API Alive

**Status**: complete. The v0.3 public functions remain available, and
`create_mock_data()` now routes supported metadata through the v0.4 pipeline
while preserving legacy fallback paths.

Existing public functions should remain available in v0.4.0:

- `create_mock_data()`
@@ -95,6 +108,11 @@ synchronized release.

## 4. Carry-Forward Design Issues

**Status**: partly resolved. The diagnostics shape, seed discipline, native vs
`simstudy` parity tests, and optional `simstudy` posture were settled for v0.4.0.
The remaining items below should be treated as v0.5+ roadmap candidates or issue
backlog material.

Settle in the ADR or the first design note:

- Multi-group correlation merge strategy.
@@ -116,6 +134,9 @@ Track as implementation issues:

## 5. Communication

**Status**: complete as a draft communication artifact. See
`development/v04-phase-c-comms-note.md`.

Before v0.4.0 lands, write a short communication note for cchsflow, chmsflow,
and recodeflow maintainers:

@@ -125,4 +146,3 @@ and recodeflow maintainers:
- What migration is optional in v0.4.0.
- When deprecation warnings may begin.
- How the mock-data framing remains distinct from synthetic-data release.

76 changes: 76 additions & 0 deletions development/v04-documentation-sprint.md
@@ -0,0 +1,76 @@
# MockData v0.4 Documentation Sprint

**Status**: complete for the v0.4.0 documentation sprint. Remaining work before
tagging is package checks, maintainer smoke testing, and any follow-up edits from
review.

This sprint treats documentation as implementation validation. The goal is not
only to explain the v0.4 API, but to run realistic user workflows during
vignette and pkgdown builds.

## Principles

- Use Divio's four documentation needs: tutorials, how-to guides, reference, and
explanation.
- Keep vignette code executable unless the code genuinely depends on an
external package or private data.
- Prefer small, focused vignettes over one large tour.
- Use seeds in every stochastic example so rendered output is stable.
- Include at least one diagnostics example because the v0.4 pipeline's
auditability contract is a central design change.
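
The seed principle can be illustrated with base R alone. In this sketch `draw_ages()` is a hypothetical helper, not a MockData function. It shows that a fixed seed gives identical draws on every render, while a different seed gives a different but equally reproducible sample.

```r
# Hypothetical helper: seed first, then draw, so the result depends only on the arguments.
draw_ages <- function(n, seed) {
  set.seed(seed)
  as.integer(round(rnorm(n, mean = 50, sd = 12)))
}

first_render <- draw_ages(10, seed = 1)
second_render <- draw_ages(10, seed = 1)
other_seed <- draw_ages(10, seed = 2)

# Same seed: the two renders match exactly.
same_seed_matches <- identical(first_render, second_render)

# Different seed: the draws differ from the first render.
different_seed_differs <- !identical(first_render, other_seed)

stopifnot(same_seed_matches, different_seed_differs)
```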

## First pass

- `getting-started-v04.qmd`: tutorial for the v0.4 `mock_spec` workflow.
- `recodeflow-metadata-v04.qmd`: how-to for generating mock data from
recodeflow-style CSV metadata.
- `diagnostics-and-garbage-v04.qmd`: how-to for reading diagnostics and
auditing garbage/missing-code post-processing.
- `migrating-from-v03-v04.qmd`: how-to for compatibility behavior, fallback
routing, diagnostics, and seed differences.
- `choosing-a-backend-v04.qmd`: how-to for native versus optional `simstudy`
backend selection.
- `design-philosophy-v04.qmd`: explanation of v0.4 design choices and scope
boundaries.
- README: update the top-level status and quick example so users see v0.4
immediately.
- `_pkgdown.yml`: expose the new v0.4 tutorial in site navigation.
- `v04-phase-c-comms-note.md`: maintainer-facing note for cchsflow,
chmsflow, and recodeflow testing while v0.4 sits on `dev`.

## Follow-up vignettes

Tutorial:

- `getting-started-v04.qmd`: linear first-use path.

How-to:

- `recodeflow-metadata-v04.qmd`: use existing `variables.csv` and
`variable_details.csv`.
- `diagnostics-and-garbage-v04.qmd`: inspect missing-code and garbage
diagnostics.
- `choosing-a-backend-v04.qmd`: native vs optional `simstudy`.
- `migrating-from-v03-v04.qmd`: seed behavior, diagnostics attribute,
fallback conditions, and compatibility wrappers.

Explanation:

- `design-philosophy-v04.qmd`: distill the architecture review, hybrid backend
decision, and mock-data versus synthetic-data boundary.
- `development/v04-phase-c-comms-note.md`: Phase C maintainer communication
source material; fold relevant parts into migration and recodeflow how-to
docs after maintainer feedback.

Reference:

- Keep roxygen pages and `_pkgdown.yml` synchronized with exported functions.
- Keep `NEWS.md` as the release-note source of truth.

## Review checklist

- Does every vignette render locally?
- Does every code chunk either run or clearly justify `eval: false`?
- Does each vignette commit to one Divio purpose?
- Do examples use the public API exactly as users should use it?
- Are error messages and diagnostics understandable in rendered output?
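
The second checklist item could be partly automated. The sketch below, in base R, scans Quarto sources for chunks that set `#| eval: false` so a reviewer can confirm each one is justified. The `find_unevaluated_chunks()` helper and the demo file are hypothetical, and the pattern only covers the `#|` chunk-option form, not document-level `eval` settings.

```r
# Hypothetical helper: report lines that switch off evaluation with `#| eval: false`.
find_unevaluated_chunks <- function(paths) {
  pattern <- "^[[:space:]]*#\\|[[:space:]]*eval:[[:space:]]*false[[:space:]]*$"
  hits <- lapply(paths, function(path) {
    lines <- readLines(path, warn = FALSE)
    idx <- grep(pattern, lines)
    data.frame(
      file = rep(basename(path), length(idx)),
      line = idx,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, hits)
}

# Demo on a throwaway file so the sketch runs without a package checkout.
fence <- strrep("`", 3)
demo <- tempfile(fileext = ".qmd")
writeLines(c(
  paste0(fence, "{r}"),
  "#| eval: false",
  "library(simstudy)",
  fence,
  "",
  paste0(fence, "{r}"),
  "x <- 1",
  fence
), demo)

flagged <- find_unevaluated_chunks(demo)
print(flagged)
```

In a package checkout the helper could be pointed at `list.files("vignettes", pattern = "\\.qmd$", full.names = TRUE)`; whether a flagged chunk is justified still needs a human read.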