doctemplate: CRLF templates produce extra blank lines around multiline $if$ / $for$ #157

On Windows with `core.autocrlf=true`, doctemplate templates with CRLF line endings render with extra blank lines around every multiline `$if(...)$` and `$for(...)$` directive. The symptom currently shows up only as 8 failing tests in `quarto-doctemplate`, but the underlying engine bug means a Windows author of any Quarto template gets visibly wrong output in every format the doctemplate engine drives. We don't have Windows CI yet, so this hasn't surfaced for the rest of the team in `cargo nextest run`.

I reproduced this on Windows at `ebd04493`.

## Reproducer

```rust
let crlf_source = "before\r\n$if(show)$\r\ncontent\r\n$endif$\r\nafter\r\n";
let template = Template::compile(crlf_source).unwrap();
let mut ctx = TemplateContext::new();
ctx.insert("show", TemplateValue::Bool(true));
let result = template.render(&ctx).unwrap();
// expected: "before\ncontent\nafter\n"
// actual:   "before\r\n\r\ncontent\r\n\r\nafter\r\n"
```

Same shape on `$for$` / `$endfor$`, nested `$if$` / `$for$`, and `$else$`.

## Root cause

`normalize_multiline_directives` (run after tree-sitter parsing) detects "directive on its own line" by checking the first character of the body Literal. The detection helpers hardcode `'\n'`:

https://github.com/quarto-dev/q2/blob/ebd044938a18e2984a9e5439f88ef5a3aa162326/crates/quarto-doctemplate/src/parser.rs#L1067-L1095

For CRLF input the body Literal starts with `\r\n`, so `starts_with('\n')` returns false, `is_multiline` stays false, and the branch that consumes the leading and trailing newlines around the directive never runs:

https://github.com/quarto-dev/q2/blob/ebd044938a18e2984a9e5439f88ef5a3aa162326/crates/quarto-doctemplate/src/parser.rs#L973-L1056
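
To make the failed detection concrete, here is a minimal stand-alone sketch. `body_starts_with_lf` is a hypothetical stand-in for the real helpers and works on a plain `&str` instead of a `TemplateNode`:

```rust
/// Hypothetical stand-in for the LF-only detection: true only when the
/// body text begins with a bare '\n'.
fn body_starts_with_lf(body: &str) -> bool {
    body.starts_with('\n')
}
```

With the LF body `"\ncontent\n"` this returns `true`. With the CRLF body `"\r\ncontent\r\n"` the first character is `'\r'`, so it returns `false` and the directive is treated as inline.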

Pandoc's `doctemplates`, by contrast, handles line endings in the parser itself. Its endline parser accepts all three conventions (`\n`, `\r\n`, `\r`) and returns whatever was matched, so multiline directive consumption is line-ending-agnostic and the output preserves the input convention:

```haskell
pLineEnding = P.string "\n" <|> P.try (P.string "\r\n") <|> P.string "\r"
isSpacy '\r' = True
pLit = P.many1 (P.satisfy (\c -> c /= '$' && c /= '\n' && c /= '\r'))
```

https://github.com/jgm/doctemplates/blob/master/src/Text/DocTemplates/Parser.hs#L262-L263

Pandoc's `--eol=crlf|lf|native` is a separate writer option layered on top.

## Constraints I see for q2 on Windows

CRLF input must render correctly. Output should preserve the input line-ending convention: silently rewriting bytes mid-pipeline diverges from Pandoc and surprises Windows users whose rest-of-file convention is CRLF. Source spans (`node.start_byte()`, `start_position`) need to keep mapping back to the on-disk file, or diagnostics will point at the wrong location.

A one-line ingress normalize (CRLF→LF before parsing) is out: it loses the input convention and shifts every byte offset by one per preceding CRLF.
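
The offset shift is easy to check in isolation. The sketch below is illustrative only; `offset_drift` is a hypothetical helper, not something in the codebase. It reports how far the first occurrence of a substring moves when the source is normalized:

```rust
/// How many bytes the first occurrence of `needle` moves towards the start
/// of the text when every "\r\n" in `original` is rewritten to "\n".
/// Returns None if `needle` is missing from either form of the text.
fn offset_drift(original: &str, needle: &str) -> Option<usize> {
    let normalized = original.replace("\r\n", "\n");
    let on_disk = original.find(needle)?;
    let after_normalize = normalized.find(needle)?;
    // Normalization only removes bytes, so the on-disk offset is never smaller.
    Some(on_disk - after_normalize)
}
```

On the reproducer source, `offset_drift(src, "content")` is `Some(2)` and `offset_drift(src, "after")` is `Some(4)`: one byte per preceding CRLF.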

## Approaches

We could teach the Rust normalization helpers to recognize `\r\n` and `\r` in addition to `\n`, and audit `tree-sitter-doctemplate` to see whether the grammar needs the same alternation or whether the Rust pass alone is enough. Bytes are preserved end-to-end. This is the same shape of work that #139 did for `tree-sitter-qmd` pipe tables, at a smaller scope.
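
A rough sketch of what line-ending-aware helpers could look like, written against plain strings so it stands alone. The function names are hypothetical, and the real helpers would keep operating on `TemplateNode::Literal`:

```rust
/// Byte length of the line ending at the start of `text`, or 0 if there is none.
/// "\r\n" is checked first so a CRLF is consumed as a single unit.
fn leading_line_ending_len(text: &str) -> usize {
    if text.starts_with("\r\n") {
        2
    } else if text.starts_with('\n') || text.starts_with('\r') {
        1
    } else {
        0
    }
}

/// Line-ending-agnostic replacement for the `starts_with('\n')` check.
fn starts_with_line_ending(text: &str) -> bool {
    leading_line_ending_len(text) > 0
}

/// Remove one leading line ending ("\r\n", "\n", or "\r") in place.
fn strip_leading_line_ending(text: &mut String) {
    let len = leading_line_ending_len(text);
    // '\r' and '\n' are ASCII, so `len` always falls on a char boundary.
    text.replace_range(..len, "");
}
```

Only the consumed line ending is removed; every other byte in the literal, including later CRLFs, is left untouched, which is what keeps the output in the input convention.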

We could also normalize CRLF→LF for the parser internally with a side-table mapping normalized→original byte positions for diagnostics, then re-emit the input convention on render. This is more machinery, and it is easy to forget the side-table when adding new diagnostics.

## Open question

Is "preserve input line-ending convention end-to-end" the policy we want for q2 on Windows? Or would we rather always normalize to LF on output, or expose a writer-side option like Pandoc's `--eol`?

This is broader than `quarto-doctemplate` — pampa output, the JSON / native writers, and any future tree-sitter grammar will face the same question. Picking a policy here sets the precedent.

If the answer is "preserve input convention", I'll scope the `tree-sitter-doctemplate` audit, add a CRLF regression test that builds the input in-process so Linux CI catches future regressions (same pattern as `pipe_table_crlf_matches_lf` from #139), and update the 8 affected `quarto-doctemplate` tests. Internal tracker is bd-1d3e.
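
As a rough idea of the regression test's shape, assuming the "preserve input convention" policy: the helpers below are hypothetical, and `render` stands in for compiling and rendering a template with a fixed context. The fixture is assumed to be LF-only.

```rust
/// Build the CRLF variant of an LF-only fixture in-process, so the test does
/// not depend on how git checked the file out.
fn to_crlf(lf: &str) -> String {
    lf.replace('\n', "\r\n")
}

/// Hypothetical parity check: render the LF and CRLF variants of the same
/// template and compare them after mapping the CRLF output back to LF.
fn crlf_matches_lf(lf_source: &str, render: impl Fn(&str) -> String) -> bool {
    let lf_out = render(lf_source);
    let crlf_out = render(&to_crlf(lf_source));
    crlf_out.replace("\r\n", "\n") == lf_out
}
```

Because the CRLF input is constructed from the LF fixture at test time, the check runs identically on Linux CI and on a Windows checkout.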


Embedded snippet, `crates/quarto-doctemplate/src/parser.rs` lines 1067-1095 (detection helpers):

```rust
/// Check if the first node in a list is a Literal starting with '\n'.
fn first_node_is_newline_literal(nodes: &[TemplateNode]) -> bool {
    if let Some(TemplateNode::Literal(lit)) = nodes.first() {
        lit.text.starts_with('\n')
    } else {
        false
    }
}

/// Strip a leading '\n' from the first Literal node if present.
fn strip_leading_newline_from_nodes(nodes: &mut Vec<TemplateNode>) {
    if let Some(first) = nodes.first_mut() {
        strip_leading_newline_from_node(first);
        // If the node became empty, remove it
        if let TemplateNode::Literal(lit) = first
            && lit.text.is_empty()
        {
            nodes.remove(0);
        }
    }
}

/// Strip a leading '\n' from a node if it's a Literal starting with '\n'.
fn strip_leading_newline_from_node(node: &mut TemplateNode) {
    if let TemplateNode::Literal(lit) = node
        && lit.text.starts_with('\n')
    {
        lit.text = lit.text[1..].to_string();
    }
}
```

Embedded snippet, `crates/quarto-doctemplate/src/parser.rs` lines 973-1056 (multiline normalization pass):

```rust
fn normalize_multiline_directives(nodes: &mut Vec<TemplateNode>) {
    // Process each node, with access to the next sibling for lookahead
    let mut i = 0;
    while i < nodes.len() {
        match &mut nodes[i] {
            TemplateNode::Conditional(cond) => {
                // Check if this is a multiline conditional
                let is_multiline = is_first_child_newline_literal(&cond.branches);

                if is_multiline {
                    // Strip leading newline from body of each branch
                    for (_condition, body) in &mut cond.branches {
                        strip_leading_newline_from_nodes(body);
                        // Recursively normalize nested directives
                        normalize_multiline_directives(body);
                    }

                    // Strip leading newline from else branch if present
                    if let Some(else_body) = &mut cond.else_branch {
                        strip_leading_newline_from_nodes(else_body);
                        normalize_multiline_directives(else_body);
                    }

                    // Strip leading newline from next sibling if it's a Literal
                    if i + 1 < nodes.len() {
                        strip_leading_newline_from_node(&mut nodes[i + 1]);
                    }
                } else {
                    // Still need to recursively normalize nested directives
                    for (_condition, body) in &mut cond.branches {
                        normalize_multiline_directives(body);
                    }
                    if let Some(else_body) = &mut cond.else_branch {
                        normalize_multiline_directives(else_body);
                    }
                }
            }

            TemplateNode::ForLoop(for_loop) => {
                // Check if this is a multiline for loop
                let is_multiline = first_node_is_newline_literal(&for_loop.body);

                if is_multiline {
                    // Strip leading newline from body
                    strip_leading_newline_from_nodes(&mut for_loop.body);
                    normalize_multiline_directives(&mut for_loop.body);

                    // Strip leading newline from separator if present
                    if let Some(sep) = &mut for_loop.separator {
                        strip_leading_newline_from_nodes(sep);
                        normalize_multiline_directives(sep);
                    }

                    // Strip leading newline from next sibling if it's a Literal
                    if i + 1 < nodes.len() {
                        strip_leading_newline_from_node(&mut nodes[i + 1]);
                    }
                } else {
                    // Still need to recursively normalize nested directives
                    normalize_multiline_directives(&mut for_loop.body);
                    if let Some(sep) = &mut for_loop.separator {
                        normalize_multiline_directives(sep);
                    }
                }
            }

            TemplateNode::Nesting(nesting) => {
                normalize_multiline_directives(&mut nesting.children);
            }

            TemplateNode::BreakableSpace(bs) => {
                normalize_multiline_directives(&mut bs.children);
            }

            // Other node types don't need processing
            TemplateNode::Literal(_)
            | TemplateNode::Variable(_)
            | TemplateNode::Partial(_)
            | TemplateNode::Comment(_) => {}
        }

        i += 1;
    }
}
```
