Support direct asset materialization through Python function#19988
Support direct asset materialization through Python function#19988Jay-Lokhande wants to merge 19 commits intoPrefectHQ:mainfrom
Conversation
…ws and tasks, in addition to decorator usage.
CodSpeed Performance ReportMerging this PR will not alter performanceComparing Summary
|
…management. Update tests to ensure direct materialization raises appropriate errors outside of execution context.
…e error messaging. Update tests for direct materialization to ensure proper error raising outside of execution contexts.
…arity and maintainability.
…assets across task runs.
|
@Jay-Lokhande are you still working on this PR? We have had some interest in this feature so wanted to follow up to see if this is ready for review or if there is any way we could help! |
|
Hi @robfreedy, thanks for checking in I am still working on this PR the core implementation for direct asset materialization is complete the The failing tests are: all of these tests call materialize() directly within a flow and then check for materialization events but would you be able to help debug this? |
|
hi @Jay-Lokhande! I took a look at the failing tests and think i found the issue which is that i've opened a PR with the fix into your branch: Jay-Lokhande#1 the fix simplifies things a bit:
let me know if that makes sense. feel free to merge it to update your PR here! |
|
@zzstoatzz hi, thank you so much for identifying and fixing the issue |
|
@zzstoatzz would you please give your review on the suggested changes, thanks! |
|
@zzstoatzz Hey! |
|
sorry for the delay @Jay-Lokhande - we'll get this reviewed early next week |
|
@zzstoatzz sure, No worries, and thank you for the reply! |
zzstoatzz
left a comment
There was a problem hiding this comment.
thanks for bearing with us @Jay-Lokhande, looking mostly good, just a couple things
| # Return the callable for decorator use (fallback) | ||
| return materialize_obj |
There was a problem hiding this comment.
this seems like an unreachable case
| api_call_time = (response_time - request_time).total_seconds() | ||
| tolerance = ( | ||
| 2.0 + test_elapsed + api_call_time | ||
| ) # 2s for SQLite precision + test overhead + API call time |
There was a problem hiding this comment.
i think this is a known flake, that unless was causing all of your CI to fail (and not just one flake), i think we should revert for now
There was a problem hiding this comment.
@zzstoatzz Thanks for the feedback! i have addressed both comments.
zzstoatzz
left a comment
There was a problem hiding this comment.
this lgtm, cc @desertaxle for another set of eyes
Review focuses on interface consistency with the existing @materialize decorator, identifying semantic gaps in failure tracking, return type overloading, and test coverage. https://claude.ai/code/session_019zxgVShJHtkZ6wAr5J23fX
desertaxle
left a comment
There was a problem hiding this comment.
I have some concerns with this implementation, one of which I called out on the updated typing of materialize.
Fundamentally, I don't think a function makes sense for direct asset materialization because the operation is so contextual. I think it'd make more sense to have a context manager that lets the materialization logic apply to a code block when an author doesn't want to write an entire function for materialization.
@Jay-Lokhande how does the idea of introducing a context manager instead of a function for direct materialization sit with you?
| ) -> Union[ | ||
| _MaterializeCallable, | ||
| Callable[[Callable[P, R]], MaterializingTask[P, R]], | ||
| None, | ||
| ]: |
There was a problem hiding this comment.
I'm not a fan of how this return type is overloaded as a result of this work, as it'll degrade the typing experience of the @materialize decorator.
I think there should be a separate utility for direct materialization to keep the API clear and explicit here.
There was a problem hiding this comment.
Thanks for the feedback! @desertaxle, you are right on both counts - the context manager makes more sense for this, and keeping materialize() as a pure decorator will preserve the typing experience.
I amm happy to refactor this a few questions to make sure I get it right:
-
Should the context manager materialize on entry or exit? I am guessing exit makes more sense (after the block completes), but wanted to confirm.
-
What should we call the new utility?
materialize_assets()or something else? And shouldmaterialize()stay as just the decorator? -
Should parameters like
bywork the same way?with materialize_assets("s3://bucket/data.csv", by="dbt"): ...
Let me know what you prefer and I will update the PR!
|
@Jay-Lokhande, we've discussed the approach in this PR and the intended Prefect asset materialization design, and we've decided not to accept it. Right now, asset materialization is intentionally tied to the task lifecycle. Certain aspects of asset materialization (such as dependencies) are difficult to reason about outside a task lifecycle, and we need further discussion to identify an approach that aligns well with the intent of asset materialization. I'm going to close this PR, but let's continue discussing approaches on #19634 to see if we can find an approach that aligns with the requester's desires and the feature's intent. |
|
@desertaxle Sorry just to chime in here as it relates to my original issue. is there an avenue you are taking in order to materialize arbitrary assets? For example we are seeking a way to generate downstream assets for a materialized assets where the downstream asset doesn't actually have its own task. Example: We can currently materialize a snowflake table when we update it. How do we materialize 3 Tableau dashboards that utilize that table. We'd like to have a task that takes inputs from a parent @materialize decorated task and then queries our list of related assets and materializes that 3 dashboards within that child task, ideally inheriting the upstream asset as intended by the current asset implementation. If we lock to one task, one asset it makes it impossible to have patterns like this. Also its beneficial in retrofitting some legacy flows so you don't have to go back and change the decorator. |
|
@jbnitorum great question! I'm going to copy your question to the issue and respond there so this discussion doesn't get lost. |
Checklist
This pull request references any related issue by including "closes
#19634"If this pull request adds new functionality, it includes unit tests that cover the changes
If this pull request removes docs files, it includes redirect settings in
mint.json.If this pull request adds functions or classes, it includes helpful docstrings.
Modified
materialize()function to support both decorator and direct call usageWhen called directly in execution context (flow or task), assets are materialized immediately
Task context is automatically inherited when materializing from within a task
Assets materialized directly become proper upstream dependencies for downstream tasks/assets
closes #19634