feat: ✨ convert redcap data dict to resource properties#35

Open
martonvago wants to merge 12 commits into main from
feat/convert-to-resources

Conversation

Collaborator

@martonvago martonvago commented Mar 20, 2026

Description

This PR adds the ability to convert a saved REDCap data dict to resource properties.
Followed this plan: #23 (comment)
Does not include a resource for events/visits. Not sure if we want that?

Closes #25, closes #26

This PR needs an in-depth review.

Checklist

  • Formatted Markdown
  • Ran just run-all

Comment on lines +14 to +24
def _map(x: Iterable[In], fn: Callable[[In], Out]) -> list[Out]:
    return list(map(fn, x))


def _filter(x: Iterable[In], fn: Callable[[In], bool]) -> list[In]:
    return list(filter(fn, x))


def _flat_map(items: Iterable[In], fn: Callable[[In], Iterable[Out]]) -> list[Out]:
    """Maps and flattens the items by one level."""
    return list(chain.from_iterable(map(fn, items)))
Collaborator Author

These could come from soil.
Or we could decide #34 and put them in internals.py

Member

Yea, agreed, we use them in almost all our packages/code, so makes sense to move them into their own package 👍
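
For reference, a minimal sketch of how the three helpers behave once they are self-contained. The `In`/`Out` type variables, the imports, and the sample rows are assumptions added for illustration; they are not shown in the diff hunk above.

```python
from collections.abc import Callable, Iterable
from itertools import chain
from typing import TypeVar

# Assumed type variables; the diff hunk uses them but does not show their definition.
In = TypeVar("In")
Out = TypeVar("Out")


def _map(x: Iterable[In], fn: Callable[[In], Out]) -> list[Out]:
    return list(map(fn, x))


def _filter(x: Iterable[In], fn: Callable[[In], bool]) -> list[In]:
    return list(filter(fn, x))


def _flat_map(items: Iterable[In], fn: Callable[[In], Iterable[Out]]) -> list[Out]:
    """Maps and flattens the items by one level."""
    return list(chain.from_iterable(map(fn, items)))


# Hypothetical sample rows, not taken from the real data dict.
fields = [
    {"field_name": "age", "form_name": "baseline"},
    {"field_name": "weight", "form_name": "baseline"},
    {"field_name": "glucose", "form_name": "lab"},
]

names = _map(fields, lambda f: f["field_name"])
lab_fields = _filter(fields, lambda f: f["form_name"] == "lab")
pairs = _flat_map(fields, lambda f: [f["form_name"], f["field_name"]])
```

Each helper returns a concrete list rather than a lazy iterator, which is probably why they are handy across packages.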

    form_name: str, fields: list[dict[str, str]]
) -> sp.ResourceProperties:
    visit_field = sp.FieldProperties(
        name="visit",
Collaborator Author

@martonvago martonvago Mar 20, 2026

Or event?

Could contain the unique_event_name or event_id of a REDCap event

Member

Hmm, I don't have a strong opinion on this. I think visit is fine, @K-Beicher thoughts?

Contributor

Not all events will be visits, and event is a well-established REDCap concept, so I would lean towards using event as the word (if I understand your discussion correctly).

Comment on lines +85 to +86
title=form_name,
description=form_name,
Collaborator Author

The title could be the instrument_label that we can get by making another API call (it's not in the data dict).

I wasn't able to find an equivalent for description.
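
A rough sketch of what that extra API call could look like, using only the standard library. It assumes the REDCap API's instrument export (`content=instrument`) returns records with `instrument_name` and `instrument_label` keys; that should be checked against the API documentation of the REDCap instance. The function names and the example URL are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_instrument_request(api_url: str, token: str) -> Request:
    """Builds the POST request for the (assumed) instrument export."""
    payload = urlencode({"token": token, "content": "instrument", "format": "json"})
    return Request(api_url, data=payload.encode("utf-8"), method="POST")


def labels_by_form(instruments: list[dict[str, str]]) -> dict[str, str]:
    """Maps each form name to its label (assumed response keys)."""
    return {i["instrument_name"]: i["instrument_label"] for i in instruments}


def fetch_instrument_labels(api_url: str, token: str) -> dict[str, str]:
    """Calls the API and returns the form name to label lookup."""
    with urlopen(build_instrument_request(api_url, token)) as response:
        return labels_by_form(json.load(response))
```

The resource title could then be looked up with something like `labels.get(form_name, form_name)`, falling back to the form name when no label is available.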

title=field["field_name"],
type=_get_type(field),
description=_get_description(field),
categories=_get_categories(field),
Collaborator Author

@martonvago martonvago Mar 20, 2026

This is a list of strings.
We can also have a list of objects like {"value": 1, "label": "apple"} to keep track of the REDCap number of the choices as well.

Member

Yea, we might have to organise the scripts for both the properties and for tidying up the data.
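
To make the two options concrete, here is a small sketch of parsing a REDCap choices string into value/label objects. It assumes the choices arrive in the usual `1, apple | 2, banana` form; `parse_choices` is a hypothetical name and not the PR's `_get_categories`.

```python
def parse_choices(raw: str) -> list[dict[str, object]]:
    """Parses 'code, label | code, label' into value/label objects."""
    choices: list[dict[str, object]] = []
    for part in raw.split("|"):
        part = part.strip()
        if not part:
            continue
        # Split on the first comma only, so labels may contain commas.
        value, _, label = part.partition(",")
        value = value.strip()
        choices.append(
            {
                # Assumption: plain numeric codes become ints, others stay strings.
                "value": int(value) if value.isdecimal() else value,
                "label": label.strip(),
            }
        )
    return choices


parsed = parse_choices("1, apple | 2, banana")
# The current list-of-strings form can still be derived from the objects.
labels = [choice["label"] for choice in parsed]
```

Keeping the objects around would let the data tidying step map raw codes to labels with the same lookup.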

@martonvago martonvago moved this from Todo to In Review in Data development Mar 20, 2026
@martonvago martonvago marked this pull request as ready for review March 20, 2026 11:56
Member

@lwjohnst86 lwjohnst86 left a comment

Nice start! It would be better to actually see what the properties output is though.

Member

I think it would help to see the output of this with the actual metadata converted to properties. Can you add that here too?

Member

Also, the file name could be shortened to something like redcap_dict_to_properties.py. As we get more files, we'll have to think what the best naming scheme is.

Member

A lot of this code could potentially be moved into a "shears" package or other toolkit type name, since the REDCap dictionary to properties conversion could potentially be a very common thing to do.

Collaborator Author

Yeah for sure!

    return list(chain.from_iterable(map(fn, items)))


def load_data_dict_from_file() -> list[dict[str, str]]:
Member

Could be renamed to read_dictionary()
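
A possible shape for the renamed function, as a sketch only. The `path` parameter and the temporary file are assumptions for illustration; the function in the PR takes no arguments and its file location is not shown here.

```python
import json
import tempfile
from pathlib import Path


def read_dictionary(path: Path) -> list[dict[str, str]]:
    """Reads a saved REDCap data dictionary from a JSON file."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


# Round trip with a hypothetical one-field dictionary.
sample = [{"field_name": "age", "form_name": "baseline"}]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data_dict.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    loaded = read_dictionary(path)
```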

return json.load(f)


def redcap_data_dict_to_resource_properties(
Member

Suggested change
- def redcap_data_dict_to_resource_properties(
+ def dictionary_to_properties(

Just to simplify the naming. It's in the redcap script, so don't really need to use redcap in the name.
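
For context on what this function has to do first, a sketch of splitting the dictionary into one group of fields per form. It assumes each data dict row carries a `form_name` key (as the `form_name` parameter in the diff suggests); `group_fields_by_form` is a hypothetical helper, and plain dicts stand in for the `sp` property classes.

```python
from collections import defaultdict


def group_fields_by_form(
    data_dict: list[dict[str, str]],
) -> dict[str, list[dict[str, str]]]:
    """Groups data dictionary rows by form, keeping their original order."""
    forms: dict[str, list[dict[str, str]]] = defaultdict(list)
    for field in data_dict:
        forms[field["form_name"]].append(field)
    return dict(forms)


# Hypothetical rows for illustration.
rows = [
    {"field_name": "age", "form_name": "baseline"},
    {"field_name": "glucose", "form_name": "lab"},
    {"field_name": "weight", "form_name": "baseline"},
]
grouped = group_fields_by_form(rows)
```

Because dicts keep insertion order, the same input file always yields the same form order, which keeps the split reproducible.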

)


def _redcap_form_to_resource(
Member

Same comment as above about redcap

@github-project-automation github-project-automation bot moved this from In Review to In Progress in Data development Mar 25, 2026
        constraints=sp.ConstraintsProperties(required=True),
    )
    center_field = sp.FieldProperties(
        name="center",
Collaborator Author

Or centre 🇬🇧 :P

Comment on lines +87 to +88
min_length=_get_text_length_bound(field, "text_validation_min"),
max_length=_get_text_length_bound(field, "text_validation_max"),
Collaborator Author

I haven't actually seen these in the wild
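
Since these bounds seem to be rare, a defensive helper might be enough. The sketch below is a guess at the behaviour, not the PR's `_get_text_length_bound`: it assumes the bound is stored as a string in the data dict and treats anything that is not a whole number as missing.

```python
from typing import Optional


def get_text_length_bound(field: dict[str, str], key: str) -> Optional[int]:
    """Returns the bound as an int, or None if absent or not a whole number."""
    raw = (field.get(key) or "").strip()
    try:
        return int(raw)
    except ValueError:
        # Empty strings, dates, and decimals are treated as "no bound".
        return None


bounded = get_text_length_bound({"text_validation_min": "3"}, "text_validation_min")
unbounded = get_text_length_bound({"text_validation_min": ""}, "text_validation_min")
```

Returning `None` for unparseable values keeps the constraint out of the properties instead of failing the whole conversion.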

@martonvago
Collaborator Author

I added the generated resource properties to the package properties and updated main.py. Then I ran main.py on GenomeDK and committed the generated datapackage.json.

@martonvago martonvago moved this from In Progress to In Review in Data development Mar 25, 2026
@martonvago martonvago requested a review from lwjohnst86 March 25, 2026 14:04
Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

Convert data dictionary items within each resource into properties
Reproducibly split data dictionary into resources

3 participants