Skip to content

node: bounds-check and OOM-guard in cmark_set_cstr#609

Open
Alearner12 wants to merge 1 commit into
commonmark:masterfrom
Alearner12:fix/cmark-set-cstr-bounds-oom
Open

node: bounds-check and OOM-guard in cmark_set_cstr#609
Alearner12 wants to merge 1 commit into
commonmark:masterfrom
Alearner12:fix/cmark-set-cstr-bounds-oom

Conversation

@Alearner12
Copy link
Copy Markdown

@Alearner12 Alearner12 commented May 17, 2026

node: bounds-check in cmark_set_cstr and propagate error status

cmark_set_cstr in src/node.c backs every cmark string setter — cmark_node_set_url, _set_title, _set_literal, _set_fence_info, _set_on_enter, _set_on_exit. It has two paths to undefined behaviour that I think are worth closing.

1. The realloc result is not checked.

*dst = (unsigned char *)mem->realloc(NULL, len + 1);
memcpy(*dst, src, len + 1);

mem->realloc can return NULL — either stdlib OOM or, more importantly, a caller-supplied allocator (cmark exposes the allocator through cmark_node_new_with_mem). The next line dereferences it, so failure is UB inside memcpy rather than a defined abort. Other unreasonable-state paths in cmark — for example the target_size > INT32_MAX/2 branch in cmark_strbuf_grow (src/buffer.c:43-48) — already abort() with a diagnostic. This change keeps that pattern.

2. The bufsize_t cast of strlen silently truncates.

bufsize_t is int32_t (src/buffer.h:16). For strlen(src) > INT32_MAX the cast wraps; subsequent len + 1 and memcpy(*dst, src, len + 1) operate on a truncated (or, for some wraparounds, negative) count. The function then returns this incorrect length to the caller, which uses it as a size elsewhere. Same INT32_MAX bound and abort treatment as Defect 1.

Verification

Built against current master (b320f40) with -fsanitize=address,undefined. A small reproducer using cmark_node_new_with_mem to inject a failing realloc:

static void *p_calloc(size_t n, size_t sz) { return calloc(n, sz); }
static void *f_realloc(void *p, size_t sz) { (void)p; (void)sz; return NULL; }
static void  p_free(void *p)               { free(p); }

cmark_mem mem = { p_calloc, f_realloc, p_free };
cmark_node *link = cmark_node_new_with_mem(CMARK_NODE_LINK, &mem);
cmark_node_set_url(link, "https://example.com/");

Before:

node.c:275:7: runtime error: null pointer passed as argument 1, which is declared to never be null
==97168==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000
    #0 memcpy (libc.so.6)
    #1 cmark_set_cstr  src/node.c:275
    #2 cmark_node_set_url  src/node.c:534

After this patch:

[cmark] cmark_set_cstr: allocator returned NULL, aborting
Aborted (core dumped)

api_test on the patched build: 556 tests passed, 0 failed, 0 skipped.

@jgm
Copy link
Copy Markdown
Member

jgm commented May 17, 2026

Looks good. @nwellnhof any comments?

@nwellnhof
Copy link
Copy Markdown
Contributor

The realloc result is not checked.

This is by design. The allocation functions are assumed to never return NULL.

The bufsize_t cast of strlen silently truncates.

This should be fixed. It would be best to add a new macro BUFSIZE_MAX which can also be used in src/buffer.c.

@jgm
Copy link
Copy Markdown
Member

jgm commented May 18, 2026

This is by design. The allocation functions are assumed to never return NULL.

Shouldn't this assumption be documented in the cmark(3) man page (which is derived from header comments in cmark.h)?

@nwellnhof
Copy link
Copy Markdown
Contributor

Shouldn't this assumption be documented in the cmark(3) man page (which is derived from header comments in cmark.h)?

Yes, this should be documented.

@Alearner12
Copy link
Copy Markdown
Author

@nwellnhof @jgm Thanks for the review. Pushed an updated commit addressing all feedback:

  • Dropped the realloc NULL check understood, allocators must not return NULL.
  • Added BUFSIZE_MAX macro in buffer.h and switched both node.c and buffer.c to use it instead of hardcoded INT32_MAX.
  • Documented the allocator contract in cmark.h: allocation functions must not return NULL; if unable to satisfy a request, they must abort.

@nwellnhof
Copy link
Copy Markdown
Contributor

The node setters can report errors, so there's no need to abort if the input string is too large. So you could

  • Add an output argument for the string length to cmark_set_cstr instead of returning the length.
  • Make cmark_set_cstr return 1 on success and 0 on failure.
  • Make sure that all output arguments remain unchanged in case of error.
  • Let callers propagate the result.

@nwellnhof
Copy link
Copy Markdown
Contributor

Also squash the commits and remove mentions of "OOM guard".

@Alearner12 Alearner12 force-pushed the fix/cmark-set-cstr-bounds-oom branch from 40de14c to d36a2d9 Compare May 21, 2026 13:07
@Alearner12
Copy link
Copy Markdown
Author

@nwellnhof I have pushed an updated commit addressing all feedback:

  • Dropped the realloc NULL check per project allocator design
  • Refactored cmark_set_cstr signature to accept an output length argument len and return success/failure status (1/0) instead of aborting.
  • Ensured output arguments are unchanged on error by performing the bounds check before any mutations.
  • Propagated error status through all calling string setter functions in node.c.
  • Squashed commits into a single clean commit.

Comment thread src/node.c Outdated
@@ -1,4 +1,5 @@
#include <stdbool.h>
#include <stdio.h>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this #include should be unnecessary now.

Comment thread src/node.c Outdated
Comment on lines +279 to +281
new_len = (bufsize_t)srclen;
new_str = (unsigned char *)mem->realloc(NULL, (size_t)new_len + 1);
memcpy(new_str, src, (size_t)new_len + 1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casting to bufsize_t and back to size_t is confusing. Replace (size_t)new_len with srclen.

@nwellnhof
Copy link
Copy Markdown
Contributor

Also the commit message still references the old, squashed commit which is misleading.

Refactor cmark_set_cstr to return success/failure status instead of aborting when input is too large, and let callers propagate the status.

@Alearner12 Alearner12 force-pushed the fix/cmark-set-cstr-bounds-oom branch 2 times, most recently from 5cdef44 to fcfae7e Compare May 21, 2026 14:01
@Alearner12
Copy link
Copy Markdown
Author

@nwellnhof

  • Simplified casting in cmark_set_cstr by using srclen (size_t) directly for allocation and copying operations instead of casting back and forth.
  • **Removed the unused header.
  • Cleaned up the commit message to remove the misleading reference to "aborting" (since the code on master never had an abort check).

@nwellnhof
Copy link
Copy Markdown
Contributor

The commit message could be improved, but the code change looks good to me.

@Alearner12 Alearner12 force-pushed the fix/cmark-set-cstr-bounds-oom branch from fcfae7e to c3e858e Compare May 21, 2026 16:47
@Alearner12
Copy link
Copy Markdown
Author

@nwellnhof I have updated the commit message to follow the cmark repo style (a single-line message).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants