Replies: 1 comment
einops.einsum is merely a facade to torch.einsum. AFAIR, torch.einsum currently includes opt_einsum and, I assume, optimizes the order of execution by default. If there are any problems with memory allocation in your code, they almost certainly happen in the last einsum. I'd recommend
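To illustrate the point about contraction order, here is a minimal NumPy sketch (shapes, names, and the subscript string are illustrative assumptions, not the poster's actual code). `np.einsum_path` reports the order the optimizer would use and the size of the largest intermediate it would allocate, which is exactly the quantity that blows up when the order is bad:

```python
import numpy as np

# Hypothetical shapes standing in for a mixture-of-experts contraction:
# x: (batch, d_in), w1: (experts, d_in, d_hidden), w2: (experts, d_hidden, d_out)
x = np.random.rand(32, 16)
w1 = np.random.rand(4, 16, 8)
w2 = np.random.rand(4, 8, 16)

# einsum_path reports the contraction order and the largest intermediate
# the optimizer will allocate; a bad order can inflate memory by orders
# of magnitude even though the final output is small.
path, info = np.einsum_path("bi,eih,eho->beo", x, w1, w2, optimize="optimal")
print(info)  # includes a "Largest intermediate" line

out = np.einsum("bi,eih,eho->beo", x, w1, w2, optimize="optimal")
print(out.shape)  # (32, 4, 16)
```

torch.einsum accepts the same subscript notation, so the same diagnosis applies there when opt_einsum is in play.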
I am working on implementing SMoE for Mixtral and have found the following bug. When performing `einops.einsum` with multiple tensors at once, the code fails at large batch sizes (`w = 9*K` and above) with an error that it attempts to allocate 1008 GiB of memory. This scales linearly, so `w = 18*K` gives 2016 GiB. However, values of `w = 8*K` and below execute properly, even though they "should" be trying to allocate equally unreasonable amounts of memory. When I implement the matrix operations separately, the code executes without memory errors, even with very large values like `w = 96*K`. Could the multiple-tensor memory/arrangement algorithm be improved to solve this error?
Cheers.
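The workaround described above, splitting one multi-tensor einsum into explicit pairwise contractions, can be sketched as follows (a minimal NumPy sketch; the shapes, names, and subscript strings are assumptions for illustration, not the poster's actual Mixtral code):

```python
import numpy as np

# Hypothetical stand-ins for the tensors in the question, kept small
# so the sketch runs quickly.
x  = np.random.rand(64, 16)      # (batch, d_in)
w1 = np.random.rand(4, 16, 8)    # (experts, d_in, d_hidden)
w2 = np.random.rand(4, 8, 16)    # (experts, d_hidden, d_out)

# One-shot multi-tensor einsum: the library picks the contraction order
# and hence the size of the intermediate it materializes.
fused = np.einsum("bi,eih,eho->beo", x, w1, w2)

# Manual pairwise contractions: the intermediate is chosen explicitly,
# which is what the poster reports avoids the huge allocation.
hidden = np.einsum("bi,eih->beh", x, w1)
split  = np.einsum("beh,eho->beo", hidden, w2)

assert np.allclose(fused, split)  # both orderings give the same result
```

The two forms are mathematically identical; only the size of the materialized intermediate differs, which is why the manual version can succeed where the fused call runs out of memory.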