Understanding Map/Parallel with Invoke #278
Replies: 2 comments
The more I think about this, it seems like there's just a different pattern at work with async/await contexts, and it might simply be how it's meant to work. I had mostly been reading and assuming it to be more akin to a Step Functions unit of work. What it boils down to, I think, is the configurable ability for a child context to fully complete before it is released from the concurrency count, as opposed to: "this step is waiting on an external API call (with context.wait and status-polling logic) or a function invoke, so move on to the next item since I'm freely waiting."
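The context.wait-plus-status-polling pattern mentioned above can be sketched in plain Python roughly like this. This is a minimal sketch: `get_status`, the status strings, and the spot where a durable context would yield are all assumptions, not the library's documented API.

```python
import time

def wait_for_external_work(get_status, poll_seconds=5, timeout_seconds=300):
    """Poll an external job's status until it reaches a terminal state.

    `get_status` is a hypothetical callable returning e.g. "RUNNING",
    "SUCCEEDED", or "FAILED". Inside a durable function, the sleep below
    is where the step would yield (context.wait) instead of blocking.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("SUCCEEDED", "FAILED"):
            return status
        time.sleep(poll_seconds)  # in a durable context: yield via context.wait
    raise TimeoutError("external invocation did not finish in time")
```

The point of the sketch is that while the loop is "running", the step itself is idle, waiting on someone else's work, which is exactly the case where counting it against a concurrency limit is debatable.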
Thanks for the great feedback @renre-zreynolds! I think what you're suggesting is more intuitive for the meaning of max_concurrency. I've created a task to track changing the implementation to work this way: #279
Hi there -- I'm really enjoying working with durable functions. It has been great for replacing the very small, reusable patterns that we previously had to use Step Functions for.
I had a general question about the documentation surrounding Map/Parallel and invoking an external Lambda function. As I understand it, Step Functions Map/Parallel gates Lambda invocations against MaxParallel by the completion of each Lambda execution: as invocations complete, new ones get started, up to the current MaxParallel. When I first started migrating some Step Functions logic to durable functions, I (perhaps incorrectly) assumed a similar pattern applies within context.map and context.parallel when the child context calls context.invoke. The child isn't technically running and using CPU cycles, but it did send the work somewhere that is.
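The completion-gated behavior described here behaves like a bounded worker pool: at most MaxParallel items are in flight, and finishing one admits the next. A rough local analogy in plain Python (illustrative only; this is not the durable-functions API):

```python
from concurrent.futures import ThreadPoolExecutor

def run_with_max_parallel(items, worker, max_parallel):
    """Run `worker` over `items`, keeping at most `max_parallel` in flight.

    As each call completes, the pool starts the next item -- the same
    completion-gated semantics Step Functions Map provides.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(worker, items))
```

The question in this thread is essentially whether a child context that has handed its work to an external Lambda still occupies one of those `max_parallel` slots until the external work finishes.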
Conceptually, I see that this is typically about threading, fanning out more functions as needed when the map/parallel uses functions local to the Lambda; however, when context.invoke calls an external Lambda, this seems to become a gray area.
I've noticed that max_concurrency (and, to an extent, ItemBatcher) does not seem to apply in this situation. Given that Lambdas are invoked asynchronously and properly awaited internally before the response is returned, I assumed a similar "wait" pattern existed within a child map iteration that uses context.invoke. Technically nothing is happening locally, so my assumption is that it happily moves on to the next item and kicks off the next invoke, essentially running every item in the collection as fast as they can be invoked.
The current solution I'm using to mimic this behavior (albeit not as elegantly) is to calculate and pre-batch my own array of arrays, with the outer array sized by my "max_concurrency" so that at any time only x items can run; internally I just loop over each mapped item and run context.invoke with a locally increasing name like "batch_{index}". This works reasonably well as long as my items are evenly distributed with similar workloads.
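The pre-batching workaround described above can be sketched as follows. The chunking helper is plain Python; the commented `context.map`/`context.invoke` usage mirrors the names used in this thread and is an assumption, not documented API.

```python
def pre_batch(items, max_concurrency):
    """Split `items` into consecutive chunks of at most `max_concurrency`,
    so each chunk can be mapped while the remaining chunks wait their turn."""
    return [items[i:i + max_concurrency]
            for i in range(0, len(items), max_concurrency)]

# Hypothetical usage inside the durable handler:
# for index, batch in enumerate(pre_batch(all_items, 5)):
#     # a locally increasing name keeps each batch's steps distinct
#     context.map(f"batch_{index}", batch, run_one_item)
```

As noted, this only approximates max_concurrency: a slow item in one chunk stalls the whole chunk, so it works best when workloads are evenly distributed.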
This is manageable from the outside, but it seems like something that could either be enhanced or documented so readers know there are some differences to think about with concurrency.
In some cases I need to run different Lambdas (language differences, IAM-specific permissions like VPC access, etc.), so I prefer to isolate my "run" Lambda with specialized permissions rather than have the durable function take on all of those permissions.
Thanks again!