On Windows with `core.autocrlf=true`, doctemplate templates with CRLF line endings render with extra blank lines around every multiline `$if(...)$` and `$for(...)$` directive. Right now the only visible symptom is 8 failing tests in `quarto-doctemplate`. The underlying engine bug, though, means that any Windows author of a Quarto template gets visibly wrong output in every format the doctemplate engine drives. We don't have Windows CI yet, so the rest of the team hasn't seen this in `cargo nextest run`.
I reproduced this on Windows at ebd04493.
Reproducer
```rust
let crlf_source = "before\r\n$if(show)$\r\ncontent\r\n$endif$\r\nafter\r\n";
let template = Template::compile(crlf_source).unwrap();
let mut ctx = TemplateContext::new();
ctx.insert("show", TemplateValue::Bool(true));
let result = template.render(&ctx).unwrap();
// expected: "before\ncontent\nafter\n"
//   (or "before\r\ncontent\r\nafter\r\n" if the input convention is preserved; see the open question below)
// actual:   "before\r\n\r\ncontent\r\n\r\nafter\r\n"
```
The same symptom appears with `$for$` / `$endfor$`, nested `$if$` / `$for$`, and `$else$`.
Root cause
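`normalize_multiline_directives` runs after tree-sitter parsing. It decides whether a directive sits on its own line by checking the first character of the body `Literal`. The detection helpers in `q2/crates/quarto-doctemplate/src/parser.rs` (lines 1067 to 1095 at `ebd0449`) hardcode `'\n'`: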
```rust
/// Check if the first node in a list is a Literal starting with '\n'.
fn first_node_is_newline_literal(nodes: &[TemplateNode]) -> bool {
    if let Some(TemplateNode::Literal(lit)) = nodes.first() {
        lit.text.starts_with('\n')
    } else {
        false
    }
}

/// Strip a leading '\n' from the first Literal node if present.
fn strip_leading_newline_from_nodes(nodes: &mut Vec<TemplateNode>) {
    if let Some(first) = nodes.first_mut() {
        strip_leading_newline_from_node(first);
        // If the node became empty, remove it
        if let TemplateNode::Literal(lit) = first
            && lit.text.is_empty()
        {
            nodes.remove(0);
        }
    }
}

/// Strip a leading '\n' from a node if it's a Literal starting with '\n'.
fn strip_leading_newline_from_node(node: &mut TemplateNode) {
    if let TemplateNode::Literal(lit) = node
        && lit.text.starts_with('\n')
    {
        lit.text = lit.text[1..].to_string();
    }
}
```
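A line-ending-aware version of these checks could look like the sketch below. It works on plain `&str` / `String` rather than `TemplateNode` so it stands alone, and the names `leading_line_ending_len`, `starts_with_line_ending`, and `strip_leading_line_ending` are hypothetical (none of them exist in `parser.rs`). The order of the checks matters: `\r\n` has to be tested before a lone `\r`.

```rust
/// Hypothetical helper: byte length of the line ending at the start of `s`.
/// 2 for "\r\n", 1 for "\n" or a lone "\r", 0 if `s` doesn't start with one.
fn leading_line_ending_len(s: &str) -> usize {
    if s.starts_with("\r\n") {
        2 // must be checked before the lone '\r' case
    } else if s.starts_with('\n') || s.starts_with('\r') {
        1
    } else {
        0
    }
}

/// Hypothetical replacement for `lit.text.starts_with('\n')` in the detection helpers.
fn starts_with_line_ending(s: &str) -> bool {
    leading_line_ending_len(s) > 0
}

/// Hypothetical replacement for `lit.text = lit.text[1..].to_string()`:
/// removes one leading line ending of any convention, in place.
/// Returns true if something was removed.
fn strip_leading_line_ending(text: &mut String) -> bool {
    let n = leading_line_ending_len(text.as_str());
    if n > 0 {
        text.replace_range(..n, "");
        true
    } else {
        false
    }
}
```

Changing only the detection check would not be enough. On `"\r\ncontent"`, `lit.text[1..]` drops just the `\r` and leaves a stray `\n`, so the strip also needs to know how long the matched ending is.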
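For CRLF input the body `Literal` starts with `\r\n`. As a result, `starts_with('\n')` returns false and `is_multiline` stays false. The branch that strips the newlines around the directive (the leading newline of each body and of the next sibling) therefore never runs. From `q2/crates/quarto-doctemplate/src/parser.rs`, lines 973 to 1056 at `ebd0449`: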
```rust
fn normalize_multiline_directives(nodes: &mut Vec<TemplateNode>) {
    // Process each node, with access to the next sibling for lookahead
    let mut i = 0;
    while i < nodes.len() {
        match &mut nodes[i] {
            TemplateNode::Conditional(cond) => {
                // Check if this is a multiline conditional
                let is_multiline = is_first_child_newline_literal(&cond.branches);

                if is_multiline {
                    // Strip leading newline from body of each branch
                    for (_condition, body) in &mut cond.branches {
                        strip_leading_newline_from_nodes(body);
                        // Recursively normalize nested directives
                        normalize_multiline_directives(body);
                    }

                    // Strip leading newline from else branch if present
                    if let Some(else_body) = &mut cond.else_branch {
                        strip_leading_newline_from_nodes(else_body);
                        normalize_multiline_directives(else_body);
                    }

                    // Strip leading newline from next sibling if it's a Literal
                    if i + 1 < nodes.len() {
                        strip_leading_newline_from_node(&mut nodes[i + 1]);
                    }
                } else {
                    // Still need to recursively normalize nested directives
                    for (_condition, body) in &mut cond.branches {
                        normalize_multiline_directives(body);
                    }
                    if let Some(else_body) = &mut cond.else_branch {
                        normalize_multiline_directives(else_body);
                    }
                }
            }

            TemplateNode::ForLoop(for_loop) => {
                // Check if this is a multiline for loop
                let is_multiline = first_node_is_newline_literal(&for_loop.body);

                if is_multiline {
                    // Strip leading newline from body
                    strip_leading_newline_from_nodes(&mut for_loop.body);
                    normalize_multiline_directives(&mut for_loop.body);

                    // Strip leading newline from separator if present
                    if let Some(sep) = &mut for_loop.separator {
                        strip_leading_newline_from_nodes(sep);
                        normalize_multiline_directives(sep);
                    }

                    // Strip leading newline from next sibling if it's a Literal
                    if i + 1 < nodes.len() {
                        strip_leading_newline_from_node(&mut nodes[i + 1]);
                    }
                } else {
                    // Still need to recursively normalize nested directives
                    normalize_multiline_directives(&mut for_loop.body);
                    if let Some(sep) = &mut for_loop.separator {
                        normalize_multiline_directives(sep);
                    }
                }
            }

            TemplateNode::Nesting(nesting) => {
                normalize_multiline_directives(&mut nesting.children);
            }

            TemplateNode::BreakableSpace(bs) => {
                normalize_multiline_directives(&mut bs.children);
            }

            // Other node types don't need processing
            TemplateNode::Literal(_)
            | TemplateNode::Variable(_)
            | TemplateNode::Partial(_)
            | TemplateNode::Comment(_) => {}
        }

        i += 1;
    }
}
```
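The toy trace below shows where the reproducer's `actual` string comes from. It models only a true `$if(show)$` and its three surrounding literal pieces. It assumes the parser splits the reproducer into `before\r\n`, a body of `\r\ncontent\r\n`, and a following sibling of `\r\nafter\r\n`. That split is consistent with the rendered output but has not been checked against the tree-sitter tree. `strip_lf` and `render_if_lf_only` are hypothetical stand-ins for the current pass.

```rust
/// Stand-in for `strip_leading_newline_from_node`: removes a single
/// leading '\n' byte only.
fn strip_lf(s: &str) -> &str {
    s.strip_prefix('\n').unwrap_or(s)
}

/// Hypothetical model of the current pass for a true `$if$`: the directive
/// counts as multiline only if its body starts with '\n'.
fn render_if_lf_only(before: &str, body: &str, next: &str) -> String {
    if body.starts_with('\n') {
        // Multiline: drop the newline that starts the body and the next sibling.
        format!("{before}{}{}", strip_lf(body), strip_lf(next))
    } else {
        // CRLF bodies land here, so nothing is stripped.
        format!("{before}{body}{next}")
    }
}
```

With LF pieces this yields `"before\ncontent\nafter\n"`. With the CRLF pieces it yields the `actual` string from the reproducer: one extra blank line after the opening directive and one after `$endif$`.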
Pandoc's `doctemplates`, by contrast, handles this in the parser. Its end-of-line parser accepts all three conventions and returns whatever it matched. Consuming the newline around a multiline directive is therefore line-ending-agnostic, and the output preserves the input convention:
```haskell
pLineEnding = P.string "\n" <|> P.try (P.string "\r\n") <|> P.string "\r"
isSpacy '\r' = True
pLit = P.many1 (P.satisfy (\c -> c /= '$' && c /= '\n' && c /= '\r'))
```
https://github.com/jgm/doctemplates/blob/master/src/Text/DocTemplates/Parser.hs#L262-L263
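For comparison, here is a Rust sketch of the same idea. Literal runs stop at `$`, `\n`, or `\r`, and each line-ending token records exactly which convention it matched, so writing the tokens back reproduces the input bytes. `Tok`, `tokenize_literal`, and `untokenize` are hypothetical. Unlike `pLit`, this sketch does not parse directives: it just emits a `Dollar` token for every `$`.

```rust
/// Hypothetical token type for a literal region of a template.
#[derive(Debug, PartialEq)]
enum Tok<'a> {
    /// A run of text with no '$', '\n', or '\r' (like `pLit`).
    Lit(&'a str),
    /// The exact line ending matched: "\n", "\r\n", or "\r" (like `pLineEnding`).
    LineEnding(&'a str),
    /// A bare '$'; directive parsing is out of scope for this sketch.
    Dollar,
}

fn tokenize_literal(s: &str) -> Vec<Tok<'_>> {
    let mut toks = Vec::new();
    let mut rest = s;
    while !rest.is_empty() {
        // Same alternation order as pLineEnding: "\n", then "\r\n", then "\r".
        let eol = if rest.starts_with('\n') {
            1
        } else if rest.starts_with("\r\n") {
            2
        } else if rest.starts_with('\r') {
            1
        } else {
            0
        };
        if eol > 0 {
            toks.push(Tok::LineEnding(&rest[..eol]));
            rest = &rest[eol..];
        } else if let Some(tail) = rest.strip_prefix('$') {
            toks.push(Tok::Dollar);
            rest = tail;
        } else {
            // Take everything up to the next '$' or line-ending byte.
            let end = rest
                .find(|c: char| c == '$' || c == '\n' || c == '\r')
                .unwrap_or(rest.len());
            toks.push(Tok::Lit(&rest[..end]));
            rest = &rest[end..];
        }
    }
    toks
}

/// Write tokens back out; line endings are emitted exactly as matched.
fn untokenize(toks: &[Tok<'_>]) -> String {
    toks.iter()
        .map(|t| match t {
            Tok::Lit(s) | Tok::LineEnding(s) => *s,
            Tok::Dollar => "$",
        })
        .collect()
}
```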
Pandoc's `--eol=crlf|lf|native` is a separate writer option layered on top.
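An option of that kind could be a final pass over rendered output, independent of how the template was parsed. This is a hypothetical sketch (`Eol` and `apply_eol` are not an existing q2 API). It first folds `\r\n` and lone `\r` to `\n`, then expands to the requested convention. `Native` is assumed to mean CRLF when built for Windows and LF elsewhere.

```rust
/// Hypothetical writer-side line-ending option.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Eol {
    Lf,
    Crlf,
    /// CRLF when compiled for Windows, LF otherwise.
    Native,
}

/// Rewrite every line ending in `rendered` to the requested convention.
fn apply_eol(rendered: &str, eol: Eol) -> String {
    // Fold all three conventions to LF first so mixed input is handled uniformly.
    let lf = rendered.replace("\r\n", "\n").replace('\r', "\n");
    let want_crlf = match eol {
        Eol::Lf => false,
        Eol::Crlf => true,
        Eol::Native => cfg!(windows),
    };
    if want_crlf {
        lf.replace('\n', "\r\n")
    } else {
        lf
    }
}
```

Because this pass runs after rendering, it does not fix the parser bug. It only decides what the output looks like once rendering is already correct.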
Constraints I see for q2 on Windows
CRLF input must render correctly. Output should preserve the input line-ending convention. Silently rewriting bytes mid-pipeline diverges from Pandoc and surprises Windows users whose files otherwise use CRLF. Source spans (`node.start_byte()`, `start_position`) need to keep mapping back to the on-disk file, or diagnostics drift.
A one-line ingress normalization (CRLF→LF before parsing) is ruled out. It loses the input convention and shifts every byte offset back by one for each preceding CRLF.
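To make the drift concrete: after an ingress CRLF→LF rewrite, an on-disk byte offset moves left by the number of `\r\n` pairs that end before it. The helper below is hypothetical. It assumes the offset is a char boundary that does not fall between a `\r` and its `\n`.

```rust
/// Hypothetical helper: map an on-disk byte offset to the offset the same
/// byte would have after replacing every "\r\n" with "\n".
fn offset_after_crlf_normalize(original: &str, orig_offset: usize) -> usize {
    // Each complete "\r\n" before the offset loses its '\r'.
    let removed = original[..orig_offset].matches("\r\n").count();
    orig_offset - removed
}
```

Applied to the reproducer input, `after` starts at byte 38 on disk but at byte 34 after normalization. A diagnostic computed against the normalized text would therefore point four bytes early.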
Approaches
We could teach the Rust normalization helpers to recognize `\r\n` and `\r` in addition to `\n`. That would come with an audit of `tree-sitter-doctemplate` to see whether the grammar needs the same alternation or whether the Rust pass alone is enough. Bytes are preserved end-to-end. This is the same shape of work that #139 did for `tree-sitter-qmd` pipe tables, with a smaller scope.
We could also normalize CRLF→LF internally for the parser, keep a side-table that maps normalized byte positions back to original ones for diagnostics, and re-emit the input convention on render. This takes more machinery, and it is easy to forget the side-table when adding new diagnostics.
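The side-table could be as simple as recording, for each byte of the normalized text, the on-disk offset it came from. This is a hypothetical sketch: `NormalizedSource` is not an existing type, and it handles CRLF only. Lone `\r` would follow the same pattern. The `had_crlf` flag is one possible way for render to know which convention to re-emit.

```rust
/// Hypothetical: parser-facing LF text plus a map back to on-disk offsets.
struct NormalizedSource {
    text: String,
    /// `orig[i]` is the on-disk byte offset of normalized byte `i`;
    /// the extra final entry maps the end-of-text position.
    orig: Vec<usize>,
    /// Whether any "\r\n" was seen, so render can re-emit CRLF.
    had_crlf: bool,
}

impl NormalizedSource {
    fn new(src: &str) -> Self {
        let bytes = src.as_bytes();
        let mut text = Vec::with_capacity(bytes.len());
        let mut orig = Vec::with_capacity(bytes.len() + 1);
        let mut had_crlf = false;
        let mut i = 0;
        while i < bytes.len() {
            if bytes[i] == b'\r' && bytes.get(i + 1) == Some(&b'\n') {
                // Skip the '\r'; the kept '\n' maps to its own on-disk offset.
                had_crlf = true;
                i += 1;
            }
            text.push(bytes[i]);
            orig.push(i);
            i += 1;
        }
        orig.push(bytes.len());
        // Only ASCII '\r' bytes were removed, so the result is still valid UTF-8.
        let text = String::from_utf8(text).expect("removing '\\r' keeps UTF-8 valid");
        NormalizedSource { text, orig, had_crlf }
    }

    /// Translate an offset reported against `text` into an on-disk offset.
    fn original_offset(&self, normalized: usize) -> usize {
        self.orig[normalized]
    }
}
```

The failure mode named above is visible in this sketch. Every diagnostic has to go through `original_offset`, and nothing in the types forces a new diagnostic to do so.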
Open question
Is "preserve input line-ending convention end-to-end" the policy we want for q2 on Windows? Or would we rather always normalize to LF on output, or expose a writer-side option like Pandoc's `--eol`?
This is broader than `quarto-doctemplate`. Pampa output, the JSON / native writers, and any future tree-sitter grammar will face the same question, so the policy we pick here sets the precedent.
If the answer is "preserve input convention", I'll do three things. I'll scope the `tree-sitter-doctemplate` audit. I'll add a CRLF regression test that builds the input in-process so Linux CI catches future regressions (same pattern as `pipe_table_crlf_matches_lf` from #139). And I'll update the 8 affected `quarto-doctemplate` tests. The internal tracker is bd-1d3e.
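The regression property could look like the sketch below. It derives the CRLF input from LF pieces with `replace('\n', "\r\n")`, so Linux CI exercises CRLF even though checked-out fixtures there are LF. It then requires the CRLF render to equal the LF render with the same substitution. `render_if` is a toy stand-in for `Template::render` that takes pre-split literal pieces around a true `$if$` and uses a line-ending-aware strip. It is not the real engine, and `crlf_matches_lf` is a hypothetical name.

```rust
/// Byte length of a leading "\r\n", "\n", or lone "\r" (0 if none).
fn eol_len(s: &str) -> usize {
    if s.starts_with("\r\n") {
        2
    } else if s.starts_with('\n') || s.starts_with('\r') {
        1
    } else {
        0
    }
}

/// Drop one leading line ending of any convention.
fn strip_eol(s: &str) -> &str {
    &s[eol_len(s)..]
}

/// Toy renderer for `before $if(show)$ body $endif$ next` with show = true,
/// applying line-ending-aware multiline normalization.
fn render_if(before: &str, body: &str, next: &str) -> String {
    if eol_len(body) > 0 {
        format!("{before}{}{}", strip_eol(body), strip_eol(next))
    } else {
        format!("{before}{body}{next}")
    }
}

/// The regression property: CRLF input renders like LF input, with CRLF preserved.
fn crlf_matches_lf(before: &str, body: &str, next: &str) -> bool {
    let crlf = |s: &str| s.replace('\n', "\r\n");
    render_if(&crlf(before), &crlf(body), &crlf(next))
        == render_if(before, body, next).replace('\n', "\r\n")
}
```

In the real test, the toy renderer would be replaced by `Template::compile` plus `render` on a whole template string, built with the same `replace`.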