text.linesLimited lets through arbitrarily long lines when the line and its terminator land in the same chunk #3725

@haskiindahouse

Version

Scala 3.8.3, fs2-core 3.13.0 (also reproduces on 3.12.0; verified against current main).

Minimized code

//> using scala 3.8.3
//> using dep co.fs2::fs2-core:3.13.0

import fs2._

@main def repro091(): Unit =
  // 1000 chars, limit is 10
  val longLine = "x" * 1000 + "\n"

  val singleChunk = Stream.emit(longLine).covary[Fallible]
    .through(text.linesLimited(10)).toList
  println(s"Single chunk: $singleChunk")

  val multiChunk = Stream.emits(longLine.toList.map(_.toString)).covary[Fallible]
    .through(text.linesLimited(10)).toList
  println(s"Multi  chunk: $multiChunk")

Console output

Single chunk: Right(List(xxxxxx...1000 chars))
Multi  chunk: Left(LineTooLongException: ... limit 10)

Expected result

text.linesLimited(maxLineLength) raises LineTooLongException whenever any line exceeds maxLineLength, regardless of how the input is chunked.

Actual result

When a complete line and its terminator arrive in the same chunk, fillBuffers handles the whole line and appends it directly to linesBuffer (the completed-lines accumulator). The maxLineLength check at line 556 of text.scala inspects only stringBuilder.length, the length of the pending, unterminated line, which is 0 once the line has been flushed. Lines already in linesBuffer are never checked against the limit, so any line completed within a single chunk passes through regardless of its length.
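The behaviour described above can be reproduced without fs2 in a small stand-alone model. This is only a sketch: `LinesModel`, `buggyLines`, and `TooLong` are hypothetical names, only `'\n'` terminators are handled, and each list element stands in for one chunk. It is not the library code, it just mirrors the shape of the check.

```scala
import scala.collection.mutable.ArrayBuffer

object LinesModel {
  // Hypothetical stand-in for fs2's LineTooLongException; not the library class.
  final case class TooLong(length: Int, max: Int)
      extends Exception(s"line of length $length exceeds limit $max")

  // Simplified model of the reported behaviour: only the pending
  // (unterminated) line is compared against the limit.
  def buggyLines(chunks: List[String], max: Int): Either[TooLong, List[String]] = {
    val pending = new StringBuilder           // plays the role of stringBuilder
    val out = ArrayBuffer.empty[String]
    var error: Option[TooLong] = None
    val it = chunks.iterator
    while (error.isEmpty && it.hasNext) {
      val completed = ArrayBuffer.empty[String] // plays the role of linesBuffer
      var rest = it.next()
      var nl = rest.indexOf('\n')
      while (nl >= 0) {
        pending.append(rest.substring(0, nl))
        completed += pending.result()
        pending.clear()
        rest = rest.substring(nl + 1)
        nl = rest.indexOf('\n')
      }
      pending.append(rest)
      // The gap: `completed` is emitted without any length check.
      if (pending.length > max) error = Some(TooLong(pending.length, max))
      else out ++= completed
    }
    // Emit the trailing pending line once the input ends.
    if (error.isEmpty && chunks.nonEmpty) out += pending.result()
    error.toLeft(out.toList)
  }
}
```

With `val longLine = "x" * 1000 + "\n"`, calling `LinesModel.buggyLines(List(longLine), 10)` yields a `Right`, while `LinesModel.buggyLines(longLine.toList.map(_.toString), 10)` yields a `Left`, matching the two cases in the reproduction.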

Source

def fillBuffers(
    stringBuilder: StringBuilder,
    linesBuffer: ArrayBuffer[String],
    string: String,
    ignoreFirstCharNewLine: BoolWrapper
): Unit = {
  var i = if (ignoreFirstCharNewLine.value) {
    ignoreFirstCharNewLine.value = false
    if (string.nonEmpty && string(0) == '\n') {
      1
    } else {
      0
    }
  } else {
    0
  }
  val stringSize = string.size
  while (i < stringSize) {
    val idx = indexForNl(string, stringSize, i)
    if (idx < 0) {
      stringBuilder.appendAll(string.slice(i, stringSize))
      i = stringSize
    } else {
      if (stringBuilder.isEmpty) {
        linesBuffer += string.slice(i, idx)
      } else {
        stringBuilder.appendAll(string.slice(i, idx))
        linesBuffer += stringBuilder.result()
        stringBuilder.clear()
      }
      i = idx + 1
      if (string(i - 1) == '\r') {
        if (i < stringSize) {
          if (string(i) == '\n') {
            i += 1
          }
        } else {
          ignoreFirstCharNewLine.value = true
        }
      }
    }
  }
}

def go(
    stream: Stream[F, String],
    stringBuilder: StringBuilder,
    ignoreFirstCharNewLine: BoolWrapper,
    first: Boolean
): Pull[F, String, Unit] =
  stream.pull.uncons.flatMap {
    case None =>
      if (first) Pull.done
      else {
        val result = stringBuilder.result()
        if (result.nonEmpty && result.last == '\r')
          Pull.output(
            Chunk(
              result.dropRight(1),
              ""
            )
          )
        else Pull.output1(result)
      }
    case Some((chunk, stream)) =>
      val linesBuffer = ArrayBuffer.empty[String]
      chunk.foreach { string =>
        fillBuffers(stringBuilder, linesBuffer, string, ignoreFirstCharNewLine)
      }
      maxLineLength match {
        case Some((max, raiseThrowable)) if stringBuilder.length > max =>
          Pull.raiseError[F](
            new LineTooLongException(stringBuilder.length, max)
          )(using raiseThrowable)
        case _ =>
          Pull.output(Chunk.from(linesBuffer)) >> go(
            stream,
            stringBuilder,
            ignoreFirstCharNewLine,
            first = false
          )
      }

fillBuffers (lines 484-527) scans each input string for line terminators. When it finds one, the completed line is pushed to linesBuffer (lines 509 / 512) and, if stringBuilder held a partial line, stringBuilder is cleared (line 513). After fillBuffers returns, the check at lines 555-559 looks only at stringBuilder.length:

maxLineLength match {
  case Some((max, raiseThrowable)) if stringBuilder.length > max =>
    Pull.raiseError[F](new LineTooLongException(stringBuilder.length, max))(using raiseThrowable)
  case _ =>
    Pull.output(Chunk.from(linesBuffer)) >> go(...)
}

Two ways to fix:

  • Check the length of each line as it's added to linesBuffer inside fillBuffers and raise immediately, or
  • Check all lines in linesBuffer before outputting them.
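As a rough illustration of the second option, the sketch below checks every completed line before it is emitted, in addition to the pending line. It is a simplified stand-alone model, not a patch against text.scala: `LinesFixSketch`, `limitedLines`, and `TooLong` are hypothetical names, only `'\n'` terminators are handled, and each list element stands in for one chunk.

```scala
import scala.collection.mutable.ArrayBuffer

object LinesFixSketch {
  // Hypothetical stand-in for fs2's LineTooLongException; not the library class.
  final case class TooLong(length: Int, max: Int)
      extends Exception(s"line of length $length exceeds limit $max")

  def limitedLines(chunks: List[String], max: Int): Either[TooLong, List[String]] = {
    val pending = new StringBuilder           // plays the role of stringBuilder
    val out = ArrayBuffer.empty[String]
    var error: Option[TooLong] = None
    val it = chunks.iterator
    while (error.isEmpty && it.hasNext) {
      val completed = ArrayBuffer.empty[String] // plays the role of linesBuffer
      var rest = it.next()
      var nl = rest.indexOf('\n')
      while (nl >= 0) {
        pending.append(rest.substring(0, nl))
        completed += pending.result()
        pending.clear()
        rest = rest.substring(nl + 1)
        nl = rest.indexOf('\n')
      }
      pending.append(rest)
      // Check the completed lines first, then the pending line.
      val tooLongCompleted = completed.find(_.length > max).map(l => TooLong(l.length, max))
      val tooLongPending =
        if (pending.length > max) Some(TooLong(pending.length, max)) else None
      error = tooLongCompleted.orElse(tooLongPending)
      if (error.isEmpty) out ++= completed
    }
    // Emit the trailing pending line once the input ends.
    if (error.isEmpty && chunks.nonEmpty) out += pending.result()
    error.toLeft(out.toList)
  }
}
```

In this model a 1000-character line fails both as one chunk and as one chunk per character. The length carried by the error still depends on chunking (the full line in the first case, the first length past the limit in the second). A real fix would also have to decide whether valid lines completed earlier in the same chunk are emitted before the error is raised. The first option would differ mainly in raising as soon as the offending line is completed, before the rest of the chunk is scanned.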

The only existing test, at TextSuite.scala line 311, covers a single-line input with no newline. Such input always sits in stringBuilder and is checked correctly, so the gap went unnoticed.

Happy to follow up with a PR adding both a fix and a regression test that exercises both chunkings.
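One possible shape for such a regression test is a chunking-invariance check: split the same input into chunks of every fixed size and require that the pipe fails for all of them. The helper below is a sketch in plain Scala. `ChunkingCheck`, `chunkings`, `failsForEveryChunking`, and `reference` are hypothetical names, and `reference` is only a trivial stand-in used to exercise the helper, not the fs2 pipe.

```scala
object ChunkingCheck {
  // Every fixed-size chunking of `s`, from one character per chunk
  // up to the whole string as a single chunk.
  def chunkings(s: String): List[List[String]] =
    (1 to s.length).toList.map(n => s.grouped(n).toList)

  // True when `run` reports an error for every chunking of `input`.
  def failsForEveryChunking[E](input: String)(
      run: List[String] => Either[E, List[String]]
  ): Boolean =
    chunkings(input).forall(chunks => run(chunks).isLeft)

  // Hypothetical reference: join the chunks, split on '\n', then check
  // every line. By construction it cannot depend on chunking.
  def reference(max: Int)(chunks: List[String]): Either[String, List[String]] = {
    val lines = chunks.mkString.split("\n", -1).toList
    lines.find(_.length > max) match {
      case Some(l) => Left(s"line of length ${l.length} exceeds limit $max")
      case None    => Right(lines)
    }
  }
}
```

Chunk size 1 corresponds to the multi-chunk case in the reproduction and the largest chunk size to the single-chunk case. In an actual test, `run` could presumably be `chunks => Stream.emits(chunks).covary[Fallible].through(text.linesLimited(10)).toList`, mirroring the calls used in the minimized code.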
