refactor: ♻️ join SEFNC weeks resources#90
Conversation
Normalize the SEFNC visit-specific REDCap forms into one resource schema with a week field so repeated measurements are represented by rows instead of suffixed columns.
Update the generated data package metadata so SEFNC has one resource with week values 0, 12, and 52 and no visit-suffixed duplicate fields.
| "sefnc_baseline_v4": 0, | ||
| "sefnc_week12_v6": 12, | ||
| "selfefficacy_for_nutrition_change_sefnc_week_52": 52, | ||
| } |
There was a problem hiding this comment.
I went with the week values 0 (baseline), 12, and 52 and removed the suffixes _v6 and _v10. Initially I thought that the suffixes would be the weeks, but the don't line up with the description, so maybe they stand for something like "visit" instead (but maybe it should have been a "b" in Danish...)? Maybe @K-Beicher knows?
There was a problem hiding this comment.
I also assumed "v" means "visit". Especially because in the data the event for _v6 fields is besg_6_arm_1, for _v10 besg_10_arm_1, etc. (Does this also mean that the week info is already implicit in the event field?)
There was a problem hiding this comment.
Yes, I think you're right and it seems that week is technically redundant with the visit number already being encoded in the event value. But maybe still a friendlier way of showing it than extracting it from the event and mapping the visit to the corresponding week?
| def _normalise_sefnc_field_name(field_name: str) -> str: | ||
| return SEFNC_WEEK_FIELD_PATTERN.sub("", field_name).replace( | ||
| "sefnc_ubusy_schedule", "sefnc_busy_schedule" | ||
| ) |
There was a problem hiding this comment.
Some of the fields used the name sefnc_ubusy_schedule and some left out the u and called it sefnc_busy_schedule. I don't know if the u is a valuable part to keep or just a type that is fine to remove as I did here.
| "name": "sefnc_when_busy", | ||
| "title": "sefnc_when_busy", | ||
| "type": "string", | ||
| "description": "Participant's confidence in following their diet when they have a heavy workload. Self-reported by participant. Repeated at baseline (V4), week 12 (V6) and week 52 (V10).", |
There was a problem hiding this comment.
This line mentions that the v suffix does not represent the week.
| "sefnc_baseline_v4": 0, | ||
| "sefnc_week12_v6": 12, | ||
| "selfefficacy_for_nutrition_change_sefnc_week_52": 52, | ||
| } |
There was a problem hiding this comment.
I also assumed "v" means "visit". Especially because in the data the event for _v6 fields is besg_6_arm_1, for _v10 besg_10_arm_1, etc. (Does this also mean that the week info is already implicit in the event field?)
| ) -> list[sp.ResourceProperties]: | ||
| """Converts REDCap data dictionary to Data Package resources.""" | ||
| redcap_fields = _join_vas_time_resources(redcap_fields) | ||
| redcap_fields = _join_sefnc_week_resources(_join_vas_time_resources(redcap_fields)) |
There was a problem hiding this comment.
Maybe you're already thinking about this in #91, but maybe it would be nicer to have these steps sequential instead of nested. That would allow us to add or remove these transformations in a more modular way.
There was a problem hiding this comment.
Exactly. These are the times I wish Python had a pipe 😢
| redcap_fields = _join_sefnc_week_resources(_join_vas_time_resources(redcap_fields)) | |
| redcap_fields = _join_sefnc_week_resources(redcap_fields) | |
| redcap_fields = _join_vas_time_resources(redcap_fields) |
There was a problem hiding this comment.
Yup, I agree it's nicer not to nest.
A pipe would be nice too. If we really want something like that, we could consider taking on one of the functional python libraries as a dependency maybe https://toolz.readthedocs.io/en/latest/api.html#toolz.functoolz.pipe (would maybe give more the other functional capabilities as well, but also seems a bit unnecessary if we are likely planning to switch to Rust eventually)?
| } | ||
|
|
||
|
|
||
| def _normalise_sefnc_field_name(field_name: str) -> str: |
There was a problem hiding this comment.
Should we also drop the sefnc_ prefix like with vas_?
lwjohnst86
left a comment
There was a problem hiding this comment.
Nice, just some comments/changes :)
| ) -> list[sp.ResourceProperties]: | ||
| """Converts REDCap data dictionary to Data Package resources.""" | ||
| redcap_fields = _join_vas_time_resources(redcap_fields) | ||
| redcap_fields = _join_sefnc_week_resources(_join_vas_time_resources(redcap_fields)) |
There was a problem hiding this comment.
Exactly. These are the times I wish Python had a pipe 😢
| redcap_fields = _join_sefnc_week_resources(_join_vas_time_resources(redcap_fields)) | |
| redcap_fields = _join_sefnc_week_resources(redcap_fields) | |
| redcap_fields = _join_vas_time_resources(redcap_fields) |
| } | ||
|
|
||
|
|
||
| def _normalise_sefnc_field_name(field_name: str) -> str: |
| def _normalise_sefnc_field_name(field_name: str) -> str: | ||
| return SEFNC_WEEK_FIELD_PATTERN.sub("", field_name).replace( | ||
| "sefnc_ubusy_schedule", "sefnc_busy_schedule" | ||
| ) |
| ).strip() | ||
|
|
||
|
|
||
| def _join_sefnc_week_resources( |
There was a problem hiding this comment.
We may have to consider a better way of organizing the scripts, this will start getting big...
| return deduplicated_fields | ||
|
|
||
|
|
||
| def _append_if_new_sefnc_field( |
There was a problem hiding this comment.
This could probably be generalised in some way no?
There was a problem hiding this comment.
I opened #91 for discussing generalization since I thought it was bigger than just this PR and I wanted to keep the PR small.
| if form_name == "sefnc": | ||
| week_field = sp.FieldProperties( | ||
| name="week", | ||
| title="Week", |
There was a problem hiding this comment.
Would this be week from the first visit?
There was a problem hiding this comment.
Hmm I don't think so, it seems like the v4 suffix in the column name was just to represent the "baseline", which I interpreted as week 0 and visit 4. See https://github.com/onlimit-study/feasibility-data/pull/90/changes#r3310224431 for what I based this on.
But I'm not certain, maybe @K-Beicher can clarify further?
| "title": "sefnc_when_tired", | ||
| "type": "string", | ||
| "description": "Participant's confidence in following their diet when feeling tired. Self-reported by participant. Baseline (V4).", | ||
| "description": "Participant's confidence in following their diet when feeling tired. Self-reported by participant.", |
There was a problem hiding this comment.
"Self-reported by participant" Could probably be removed, as that information is better placed in the resource description.
Merge after #89
Description
This joins the SEFNC visit-specific resources into a single
sefncresource instead of having multiple resources with_vNsuffixes.I opted to repeat some of the
vas_functions here, because it didn't seem like much was gained in generalizing already. See #91 for details.Closes #46
This PR needs an in-depth review.
Checklist
just run-all