add recursion limit #134

JayceFayne · 2026-01-28T18:38:34Z

This PR adds a recursion depth limit to prevent unbounded recursion when parsing deeply nested messages. I chose the limit of 255 rather arbitrarely. Let me know what you think.

I also wasn't sure how best to expose this to the public api. I was thinking maybe a new function parse_mail_limit?

staktrace · 2026-01-29T00:44:44Z

Have you run into this problem in practice? The recursion would only be unbounded if the input message were infinite bytes long, so I'm not sure this is much more than a theoretical concern

JayceFayne · 2026-01-29T10:09:42Z

Yes, actually mb. Should have mention this in the PR description 😄
In practice you can craft a message that overflows the stack in debug mode with ~0.5mb size (~8000 recursive calls). In release it's more like ~3mb (~30000 recursive calls).
I noticed this problem when comparing this library to mail-parser.
They implement a recrusion limit of 3 (which is hard coded)

staktrace · 2026-01-29T11:11:31Z

Can you provide a test case? I'm not opposed to adding an absurdly high limit like 1000 and just hard-coding it in.

JayceFayne · 2026-01-29T11:18:29Z

Sure, but the point of stack overflow depends on the machine stack size. I tested this on my machine which has a stack size of 8mb. On windows it's typically much lower (1mb). So running the test on different machines will produce different results.

JayceFayne · 2026-01-29T16:34:54Z

played around a bit, 1000 is too much for github actions. 300 seems to work

staktrace · 2026-01-30T01:52:11Z

Sorry for the delay here. While looking at this PR I discovered that the library is not handling some kinds of multipart boundaries correctly. So now I gotta go fix that first.

staktrace · 2026-01-30T02:08:25Z

Actually the boundary matching is to spec. It's your test file that's wrong. From RFC 2046:

The boundary delimiter MUST NOT appear inside any of the encapsulated parts, on a line by itself or as the prefix of any line

When you have a boundary like boundary="B1" and then later a line like --B10, it has the boundary delimiter --B1 as a prefix, so that violates the spec. mailparse treats --B10 as a boundary delimiter for B1 and the recursion doesn't work the way you think it does.

staktrace · 2026-01-30T02:58:45Z

Please make the following changes:

use a recursion limit of 100 unconditionally
fix the test so that it doesn't violate the boundary prefixing rule
reduce the size of the test so that it triggers around 105 recursion depth, which should be sufficient to test the code.

JayceFayne · 2026-01-30T09:19:25Z

Actually the boundary matching is to spec. It's your test file that's wrong. From RFC 2046:
The boundary delimiter MUST NOT appear inside any of the encapsulated parts, on a line by itself or as the prefix of any line
When you have a boundary like boundary="B1" and then later a line like --B10, it has the boundary delimiter --B1 as a prefix, so that violates the spec. mailparse treats --B10 as a boundary delimiter for B1

Ic, well that should be easy enough to fix.

and the recursion doesn't work the way you think it does.

Wdym? 😄 Could you elaborate how you think it works and how it doesnt?

Please make the following changes:

use a recursion limit of 100 unconditionally

fix the test so that it doesn't violate the boundary prefixing rule

reduce the size of the test so that it triggers around 105 recursion depth, which should be sufficient to test the code.

Why the limit of 100?

staktrace · 2026-01-30T12:01:51Z

Could you elaborate how you think it works and how it doesnt?

I just mean that with the test file you attached it wouldn't actually recurse the way you were expecting, because of the boundary prefix matching. If you print out the depth at the start of the recursive function it doesn't decrease linearly all the way. The B10 delimiter is treated as a part separator for the B1 boundary so it only recurses 10 times before moving to another "part". Sorry if that's confusing but you can see what I mean if you reduce the limit to 100, change the file to have 105 nested, and print out the depth at the start of the recursive function.

JayceFayne · 2026-01-30T18:09:33Z

Yeah no, ur explanation makes perfect sense. I'll apply the changes tomorrow. Btw. after #136, I might have an idea for rewriting the parser to be iterative rather than recursive. Would you be open to that kind of change, or would you prefer to stick with the current approach and enforce a maximum recursion limit?

staktrace · 2026-01-31T01:50:57Z

Why the limit of 100?

Sorry, forgot to reply to this. I prefer to have a single number for both debug and production code, and honestly any number bigger than 5 or so is probably big enough (as you noted the other mail parser lib sets it to 3). 100 seems like a nice round number.

Would you be open to that kind of change

Depends how readable it ends up being. If you can provide a high-level description of how you'd do it that might help.

JayceFayne · 2026-02-01T11:10:21Z

Sorry, forgot to reply to this. I prefer to have a single number for both debug and production code, and honestly any number bigger than 5 or so is probably big enough (as you noted the other mail parser lib sets it to 3). 100 seems like a nice round number.

Okey. I updated the nested.eml if I understood corretly it should be RFC compliant now. Also changed the RECURSION_LIMIT to 100.

Depends how readable it ends up being. If you can provide a high-level description of how you'd do it that might help.

Probably not very readable 😄 At a high level, I’d do another Vec containing partially read message, meaning messages that we know exist but haven’t parsed yet (so basically Vec<(&[u8], bool, parent?)>). You can then use that Vec as a stack, push when you encounter a multipart, and pop messages one by one to parse them.

This essentially turns the parse function into a single while loop that keeps popping from the stack, while also pushing new items onto it inside the loop. The tricky part is pushing the parsed messages back into the correct subpart Vec. We’d need some way to keep track of the “parent” message.

staktrace · 2026-02-02T01:59:11Z

This looks good, thanks!

Probably not very readable 😄 At a high level, I’d do another Vec containing partially read message, meaning messages that we know exist but haven’t parsed yet (so basically Vec<(&[u8], bool, parent?)>). You can then use that Vec as a stack, push when you encounter a multipart, and pop messages one by one to parse them.

Ah so fundamentally instead of using the system stack you're making your own stack using a Vec. I'm not really sure it's worth doing, honestly. This recursion depth is unlikely to be hit in practice except in the case of malicious mails and in those cases bailing out with the recursion depth seems fine.

JayceFayne force-pushed the recursion branch 3 times, most recently from 324172f to 1efb9f3 Compare January 29, 2026 16:32

JayceFayne force-pushed the recursion branch from 0d5452b to 0de4c39 Compare January 29, 2026 16:35

Jayce Fayne added 2 commits February 1, 2026 11:42

add recursion limit

07468b0

add test

f9e52cb

JayceFayne force-pushed the recursion branch from 0de4c39 to f9e52cb Compare February 1, 2026 11:00

staktrace merged commit e0649b0 into staktrace:master Feb 2, 2026
5 checks passed

add recursion limit #134

add recursion limit #134

Uh oh!

Conversation

JayceFayne commented Jan 28, 2026

Uh oh!

staktrace commented Jan 29, 2026

Uh oh!

JayceFayne commented Jan 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

staktrace commented Jan 29, 2026

Uh oh!

JayceFayne commented Jan 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JayceFayne commented Jan 29, 2026

Uh oh!

staktrace commented Jan 30, 2026

Uh oh!

staktrace commented Jan 30, 2026

Uh oh!

staktrace commented Jan 30, 2026

Uh oh!

JayceFayne commented Jan 30, 2026

Uh oh!

staktrace commented Jan 30, 2026

Uh oh!

JayceFayne commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

staktrace commented Jan 31, 2026

Uh oh!

JayceFayne commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

staktrace commented Feb 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JayceFayne commented Jan 29, 2026 •

edited

Loading

JayceFayne commented Jan 29, 2026 •

edited

Loading

JayceFayne commented Jan 30, 2026 •

edited

Loading

JayceFayne commented Feb 1, 2026 •

edited

Loading