Description
Users often have difficulty working with large state objects returned by their jobs. It's fairly common to write large objects to state which are used throughout the workflow, but aren't really relevant as outputs from any given step.
For example: a workflow downloads 10 KB of mapping data from collections in step 1, and re-uses those mappings in steps 3, 5 and 9. State is the only way to share that information. But when viewing the output dataclips from different steps, the user doesn't want to see the mappings on state.
One solution to this is to have private state keys. A private state key is preserved by the runtime but not returned as output from steps - so it's essentially invisible to Lightning, the worker and the CLI.
In the runtime, we need to work out a way to strip the private keys when we emit and return step state - but send the unmodified state object into the next step. We need to differentiate internal state from external state.
We could try something fancy like proxy properties - or just iterate over top-level keys while cloning and remove them. I'd prefer something simple.
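To make the "simple" option concrete, here is a minimal sketch of cloning top-level keys while dropping private ones, and of keeping internal and external state separate between steps. The names `toExternalState`, `runSteps`, `isPrivate` and `emit` are hypothetical and not part of the runtime; steps are shown as synchronous functions purely for brevity.

```javascript
// Hypothetical helper: shallow-clone state, skipping any top-level key
// that the isPrivate predicate flags. Nested keys are not inspected.
function toExternalState(state, isPrivate) {
  const external = {};
  for (const key of Object.keys(state)) {
    if (!isPrivate(key)) {
      external[key] = state[key];
    }
  }
  return external;
}

// Hypothetical step loop: each step receives the unmodified internal state,
// while emit() and the final return value only ever see the stripped copy.
function runSteps(steps, initialState, isPrivate, emit) {
  let internal = initialState;
  for (const step of steps) {
    internal = step(internal);
    emit(toExternalState(internal, isPrivate));
  }
  return toExternalState(internal, isPrivate);
}
```

Because the predicate is passed in, this sketch does not commit to any particular way of marking a key as private.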
There are a few ways we could handle private state keys:
- Any key starting with an underscore would be treated as private. So if you do `state._mappings = {}`, that will be redacted from the dataclip but sent downstream internally.
- Have a special state key called `private` or something. So `state.private.mappings = {}`. Tidy but a little lumpy at the same time.
- Use special compiler syntax, like `state.#mappings = {}`, which sort of reflects private class properties. Would rather keep the compiler out of it tbh.
- Use an adaptor function like `markPrivate(key)` to flag a key as private. This might do something complicated with proxies.
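The first two options in the list above reduce to a one-line check on the top-level key name. A rough sketch, assuming hypothetical names (`byUnderscorePrefix`, `byPrivateContainer`, `withoutPrivateKeys`) that do not exist in the runtime today:

```javascript
// Option 1 (assumed convention): any top-level key starting with "_" is private.
const byUnderscorePrefix = (key) => key.startsWith('_');

// Option 2 (assumed convention): everything under a single "private" key is private.
const byPrivateContainer = (key) => key === 'private';

// Hypothetical helper: return a shallow copy of state without private top-level keys.
function withoutPrivateKeys(state, isPrivate) {
  return Object.fromEntries(
    Object.entries(state).filter(([key]) => !isPrivate(key))
  );
}
```

For example, `withoutPrivateKeys({ _mappings: {}, data: 1 }, byUnderscorePrefix)` and `withoutPrivateKeys({ private: { mappings: {} }, data: 1 }, byPrivateContainer)` both leave only `data`. The compiler-syntax and `markPrivate(key)` options would need more machinery and are not sketched here.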
We should inform the user somehow that private keys are being hidden. We could debug log at the end of each step, or maybe just return `{ _mappings: "[private]" }`. Not sure yet - but we should make it clear to users that a key is on state but being hidden from them.
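Both ideas could be combined. The sketch below assumes the underscore-prefix convention and a hypothetical `redactPrivateKeys` helper; the log message wording and the `log` parameter are illustrative only.

```javascript
// Hypothetical helper: replace private values with a "[private]" placeholder
// so the key stays visible in the output, and log which keys were hidden.
function redactPrivateKeys(state, log = console.debug) {
  const hidden = [];
  const external = {};
  for (const [key, value] of Object.entries(state)) {
    if (key.startsWith('_')) {
      // Assumption: underscore prefix marks a private key
      external[key] = '[private]';
      hidden.push(key);
    } else {
      external[key] = value;
    }
  }
  if (hidden.length > 0) {
    log(`Hiding private state keys: ${hidden.join(', ')}`);
  }
  return external;
}
```

The placeholder keeps the key name in the dataclip, which trades a slightly noisier output for making it obvious that something is being withheld.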
You could also use these internal/private state keys for sensitive data. And adaptors could use private keys to track state but hide it from the user (we've largely killed off that pattern but this would give us the opportunity to restore it, should the need arise).
This probably isn't suitable for configuration, as that should never be sent to the next step. References would be a good candidate though.