Skip to content

Conversation

@JayceFayne
Copy link

This PR adds a recursion depth limit to prevent unbounded recursion when parsing deeply nested messages. I chose the limit of 255 rather arbitrarely. Let me know what you think.

I also wasn't sure how best to expose this to the public api. I was thinking maybe a new function parse_mail_limit?

@staktrace
Copy link
Owner

Have you run into this problem in practice? The recursion would only be unbounded if the input message were infinite bytes long, so I'm not sure this is much more than a theoretical concern

@JayceFayne
Copy link
Author

JayceFayne commented Jan 29, 2026

Yes, actually mb. Should have mention this in the PR description 😄
In practice you can craft a message that overflows the stack in debug mode with ~0.5mb size (~8000 recursive calls). In release it's more like ~3mb (~30000 recursive calls).
I noticed this problem when comparing this library to mail-parser.
They implement a recrusion limit of 3 (which is hard coded)

@staktrace
Copy link
Owner

Can you provide a test case? I'm not opposed to adding an absurdly high limit like 1000 and just hard-coding it in.

@JayceFayne
Copy link
Author

JayceFayne commented Jan 29, 2026

Sure, but the point of stack overflow depends on the machine stack size. I tested this on my machine which has a stack size of 8mb. On windows it's typically much lower (1mb). So running the test on different machines will produce different results.

@JayceFayne JayceFayne force-pushed the recursion branch 3 times, most recently from 324172f to 1efb9f3 Compare January 29, 2026 16:32
@JayceFayne
Copy link
Author

played around a bit, 1000 is too much for github actions. 300 seems to work

@staktrace
Copy link
Owner

Sorry for the delay here. While looking at this PR I discovered that the library is not handling some kinds of multipart boundaries correctly. So now I gotta go fix that first.

@staktrace
Copy link
Owner

Actually the boundary matching is to spec. It's your test file that's wrong. From RFC 2046:

The boundary delimiter MUST NOT appear inside any of the encapsulated parts, on a line by itself or as the prefix of any line

When you have a boundary like boundary="B1" and then later a line like --B10, it has the boundary delimiter --B1 as a prefix, so that violates the spec. mailparse treats --B10 as a boundary delimiter for B1 and the recursion doesn't work the way you think it does.

@staktrace
Copy link
Owner

Please make the following changes:

  • use a recursion limit of 100 unconditionally
  • fix the test so that it doesn't violate the boundary prefixing rule
  • reduce the size of the test so that it triggers around 105 recursion depth, which should be sufficient to test the code.

@JayceFayne
Copy link
Author

Actually the boundary matching is to spec. It's your test file that's wrong. From RFC 2046:

The boundary delimiter MUST NOT appear inside any of the encapsulated parts, on a line by itself or as the prefix of any line

When you have a boundary like boundary="B1" and then later a line like --B10, it has the boundary delimiter --B1 as a prefix, so that violates the spec. mailparse treats --B10 as a boundary delimiter for B1

Ic, well that should be easy enough to fix.

and the recursion doesn't work the way you think it does.

Wdym? 😄 Could you elaborate how you think it works and how it doesnt?

Please make the following changes:

  • use a recursion limit of 100 unconditionally
  • fix the test so that it doesn't violate the boundary prefixing rule
  • reduce the size of the test so that it triggers around 105 recursion depth, which should be sufficient to test the code.

Why the limit of 100?

@staktrace
Copy link
Owner

Could you elaborate how you think it works and how it doesnt?

I just mean that with the test file you attached it wouldn't actually recurse the way you were expecting, because of the boundary prefix matching. If you print out the depth at the start of the recursive function it doesn't decrease linearly all the way. The B10 delimiter is treated as a part separator for the B1 boundary so it only recurses 10 times before moving to another "part". Sorry if that's confusing but you can see what I mean if you reduce the limit to 100, change the file to have 105 nested, and print out the depth at the start of the recursive function.

@JayceFayne
Copy link
Author

JayceFayne commented Jan 30, 2026

Yeah no, ur explanation makes perfect sense. I'll apply the changes tomorrow. Btw. after #136, I might have an idea for rewriting the parser to be iterative rather than recursive. Would you be open to that kind of change, or would you prefer to stick with the current approach and enforce a maximum recursion limit?

@staktrace
Copy link
Owner

Why the limit of 100?

Sorry, forgot to reply to this. I prefer to have a single number for both debug and production code, and honestly any number bigger than 5 or so is probably big enough (as you noted the other mail parser lib sets it to 3). 100 seems like a nice round number.

Would you be open to that kind of change

Depends how readable it ends up being. If you can provide a high-level description of how you'd do it that might help.

@JayceFayne
Copy link
Author

JayceFayne commented Feb 1, 2026

Sorry, forgot to reply to this. I prefer to have a single number for both debug and production code, and honestly any number bigger than 5 or so is probably big enough (as you noted the other mail parser lib sets it to 3). 100 seems like a nice round number.

Okey. I updated the nested.eml if I understood corretly it should be RFC compliant now. Also changed the RECURSION_LIMIT to 100.

Depends how readable it ends up being. If you can provide a high-level description of how you'd do it that might help.

Probably not very readable 😄 At a high level, I’d do another Vec containing partially read message, meaning messages that we know exist but haven’t parsed yet (so basically Vec<(&[u8], bool, parent?)>). You can then use that Vec as a stack, push when you encounter a multipart, and pop messages one by one to parse them.

This essentially turns the parse function into a single while loop that keeps popping from the stack, while also pushing new items onto it inside the loop. The tricky part is pushing the parsed messages back into the correct subpart Vec. We’d need some way to keep track of the “parent” message.

@staktrace
Copy link
Owner

This looks good, thanks!

Probably not very readable 😄 At a high level, I’d do another Vec containing partially read message, meaning messages that we know exist but haven’t parsed yet (so basically Vec<(&[u8], bool, parent?)>). You can then use that Vec as a stack, push when you encounter a multipart, and pop messages one by one to parse them.

Ah so fundamentally instead of using the system stack you're making your own stack using a Vec. I'm not really sure it's worth doing, honestly. This recursion depth is unlikely to be hit in practice except in the case of malicious mails and in those cases bailing out with the recursion depth seems fine.

@staktrace staktrace merged commit e0649b0 into staktrace:master Feb 2, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants