Skip to content

refactor: ♻️ join SEFNC weeks resources#90

Open
joelostblom wants to merge 5 commits into
feat/time-suffixesfrom
refactor/sefnc-week-suffix
Open

refactor: ♻️ join SEFNC weeks resources#90
joelostblom wants to merge 5 commits into
feat/time-suffixesfrom
refactor/sefnc-week-suffix

Conversation

@joelostblom
Copy link
Copy Markdown

@joelostblom joelostblom commented May 27, 2026

Merge after #89

Description

This joins the SEFNC visit-specific resources into a single sefnc resource instead of having multiple resources with _vN suffixes.

I opted to repeat some of the vas_ functions here, because it didn't seem like much was gained in generalizing already. See #91 for details.

Closes #46

This PR needs an in-depth review.

Checklist

  • Ran just run-all

Normalize the SEFNC visit-specific REDCap forms into one resource schema with a week field so repeated measurements are represented by rows instead of suffixed columns.
Update the generated data package metadata so SEFNC has one resource with week values 0, 12, and 52 and no visit-suffixed duplicate fields.
@joelostblom joelostblom requested a review from a team as a code owner May 27, 2026 10:23
@joelostblom joelostblom changed the base branch from main to feat/time-suffixes May 27, 2026 10:24
"sefnc_baseline_v4": 0,
"sefnc_week12_v6": 12,
"selfefficacy_for_nutrition_change_sefnc_week_52": 52,
}
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with the week values 0 (baseline), 12, and 52 and removed the suffixes _v6 and _v10. Initially I thought that the suffixes would be the weeks, but the don't line up with the description, so maybe they stand for something like "visit" instead (but maybe it should have been a "b" in Danish...)? Maybe @K-Beicher knows?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also assumed "v" means "visit". Especially because in the data the event for _v6 fields is besg_6_arm_1, for _v10 besg_10_arm_1, etc. (Does this also mean that the week info is already implicit in the event field?)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you're right and it seems that week is technically redundant with the visit number already being encoded in the event value. But maybe still a friendlier way of showing it than extracting it from the event and mapping the visit to the corresponding week?

def _normalise_sefnc_field_name(field_name: str) -> str:
return SEFNC_WEEK_FIELD_PATTERN.sub("", field_name).replace(
"sefnc_ubusy_schedule", "sefnc_busy_schedule"
)
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the fields used the name sefnc_ubusy_schedule and some left out the u and called it sefnc_busy_schedule. I don't know if the u is a valuable part to keep or just a type that is fine to remove as I did here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread datapackage.json
"name": "sefnc_when_busy",
"title": "sefnc_when_busy",
"type": "string",
"description": "Participant's confidence in following their diet when they have a heavy workload. Self-reported by participant. Repeated at baseline (V4), week 12 (V6) and week 52 (V10).",
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line mentions that the v suffix does not represent the week.

"sefnc_baseline_v4": 0,
"sefnc_week12_v6": 12,
"selfefficacy_for_nutrition_change_sefnc_week_52": 52,
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also assumed "v" means "visit". Especially because in the data the event for _v6 fields is besg_6_arm_1, for _v10 besg_10_arm_1, etc. (Does this also mean that the week info is already implicit in the event field?)

Comment thread scripts/redcap_dict_to_properties.py Outdated
) -> list[sp.ResourceProperties]:
"""Converts REDCap data dictionary to Data Package resources."""
redcap_fields = _join_vas_time_resources(redcap_fields)
redcap_fields = _join_sefnc_week_resources(_join_vas_time_resources(redcap_fields))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you're already thinking about this in #91, but maybe it would be nicer to have these steps sequential instead of nested. That would allow us to add or remove these transformations in a more modular way.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. These are the times I wish Python had a pipe 😢

Suggested change
redcap_fields = _join_sefnc_week_resources(_join_vas_time_resources(redcap_fields))
redcap_fields = _join_sefnc_week_resources(redcap_fields)
redcap_fields = _join_vas_time_resources(redcap_fields)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I agree it's nicer not to nest.

A pipe would be nice too. If we really want something like that, we could consider taking on one of the functional python libraries as a dependency maybe https://toolz.readthedocs.io/en/latest/api.html#toolz.functoolz.pipe (would maybe give more the other functional capabilities as well, but also seems a bit unnecessary if we are likely planning to switch to Rust eventually)?

}


def _normalise_sefnc_field_name(field_name: str) -> str:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also drop the sefnc_ prefix like with vas_?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! ☺️

Copy link
Copy Markdown
Member

@lwjohnst86 lwjohnst86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, just some comments/changes :)

Comment thread scripts/redcap_dict_to_properties.py Outdated
) -> list[sp.ResourceProperties]:
"""Converts REDCap data dictionary to Data Package resources."""
redcap_fields = _join_vas_time_resources(redcap_fields)
redcap_fields = _join_sefnc_week_resources(_join_vas_time_resources(redcap_fields))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. These are the times I wish Python had a pipe 😢

Suggested change
redcap_fields = _join_sefnc_week_resources(_join_vas_time_resources(redcap_fields))
redcap_fields = _join_sefnc_week_resources(redcap_fields)
redcap_fields = _join_vas_time_resources(redcap_fields)

}


def _normalise_sefnc_field_name(field_name: str) -> str:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! ☺️

def _normalise_sefnc_field_name(field_name: str) -> str:
return SEFNC_WEEK_FIELD_PATTERN.sub("", field_name).replace(
"sefnc_ubusy_schedule", "sefnc_busy_schedule"
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

).strip()


def _join_sefnc_week_resources(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may have to consider a better way of organizing the scripts, this will start getting big...

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I opened #91 for discussing that.

return deduplicated_fields


def _append_if_new_sefnc_field(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could probably be generalised in some way no?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #91 for discussing generalization since I thought it was bigger than just this PR and I wanted to keep the PR small.

if form_name == "sefnc":
week_field = sp.FieldProperties(
name="week",
title="Week",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be week from the first visit?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I don't think so, it seems like the v4 suffix in the column name was just to represent the "baseline", which I interpreted as week 0 and visit 4. See https://github.com/onlimit-study/feasibility-data/pull/90/changes#r3310224431 for what I based this on.

But I'm not certain, maybe @K-Beicher can clarify further?

Comment thread datapackage.json Outdated
"title": "sefnc_when_tired",
"type": "string",
"description": "Participant's confidence in following their diet when feeling tired. Self-reported by participant. Baseline (V4).",
"description": "Participant's confidence in following their diet when feeling tired. Self-reported by participant.",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Self-reported by participant" Could probably be removed, as that information is better placed in the resource description.

@github-project-automation github-project-automation Bot moved this from In review to In progress in Data development May 28, 2026
@joelostblom joelostblom requested a review from martonvago May 29, 2026 12:46
@joelostblom joelostblom moved this from In progress to In review in Data development May 29, 2026
@joelostblom joelostblom requested a review from lwjohnst86 May 29, 2026 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

sefnc has many resources, but it should only have one

3 participants