feat: Streaming JSON Parsing API #8
Open
vnixx wants to merge 4 commits into
- JSONStreamParser: push-based streaming parser for JSON Lines and JSON Array modes
- JSON Lines: extracts multiple JSON documents from a byte stream using STOP_WHEN_DONE
- JSON Array: state machine to parse elements from a large JSON array one by one
- Internal buffer with lazy compaction for efficient memory management
- JSONIncrementalReader: accumulates chunks for large single-document parsing
- StreamingJSONLinesDecoder / StreamingJSONArrayDecoder: Codable-layer streaming decoders
- JSONValueStream / DecodingStream: AsyncSequence adapters for async byte streams
- AsyncSequence extensions: .jsonValues() and .decode() convenience methods
- Document.streamParse: internal API using yyjson_doc_get_read_size for accurate byte counting
- 33 new tests covering JSON Lines, JSON Array, incremental, edge cases, and Codable layer
- All 755 existing tests pass with zero regressions
Reject malformed array streams consistently and avoid copying the unread buffer for each parsed value.
Co-authored-by: Cursor <cursoragent@cursor.com>
Several issues found while reviewing the streaming JSON parsing API:
* Numeric cross-chunk splitting was silently truncating values.
yyjson with STOP_WHEN_DONE will happily parse "1" out of a buffer
whose true content is "12345" — there is no way for the parser to
tell from within yyjson that the number could be extended. The
stream parser now defers any value whose parse ends exactly at the
current buffer end and only commits it once the next chunk arrives
(or finalize() confirms it's the last token). Strings, objects,
arrays, and literals continue to work as before since yyjson
detects truncation for those itself.
* JSONIncrementalReader used substring matching on the error message
(e.g. error.message.contains("Unexpected end")) to detect
"need more data". This was fragile and could misclassify a real
syntactic error whose human-readable message coincidentally
contained those words. JSONError now carries the yyjson read error
code and the reader switches on the code directly.
* JSONIncrementalReader was declared @unchecked Sendable but was not
thread-safe — feed/finish mutated state without synchronization.
Added an internal LockedState so concurrent feed/finish calls are
serialized; added a concurrent-feed test.
* StreamingJSONLinesDecoder / StreamingJSONArrayDecoder were
serializing each parsed JSONValue back to JSON text via .data()
and then re-parsing it through ReerJSONDecoder, i.e. parsing each
value three times. JSONStreamParser now also exposes an internal
byte-slice API (parseSlices / finalizeSlices) that the streaming
decoders use directly, eliminating the round-trip.
* parseOneValue was appending YYJSON_PADDING_SIZE zero bytes to the
buffer and then removing them on every value. yyjson_read_opts in
non-INSITU mode allocates its own padded buffer internally, so this
was unnecessary churn — removed.
* JSONValueByteStream was building per-chunk Data by appending one
byte at a time inside withUnsafeMutableBytes (which couldn't even
cross await boundaries). Rewritten to read into a [UInt8] and
construct the Data once.
* Clarified docs: documented the cross-chunk numeric deferral rule,
the per-feed parse cost of JSONIncrementalReader, and the
thread-safety guarantees. Removed the redundant
options.contains(.json5) check on top of .allowTrailingCommas (the
former includes the latter) and kept the OR as a clarity comment.
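The cross-chunk deferral rule from the first bullet can be sketched with a toy parser. This is a minimal illustration, not the JSONStreamParser implementation: `DeferringIntParser` and its methods are hypothetical names, and it handles only runs of ASCII digits, treating every other byte as a separator. A token that ends exactly at the end of the buffer is held back until more data arrives or `finalize()` is called.

```swift
/// Toy push parser for non-negative integers separated by any non-digit byte.
/// Hypothetical illustration of the deferral rule; not the ReerJSON code.
struct DeferringIntParser {
    private var buffer: [UInt8] = []

    /// Appends a chunk and returns every integer known to be complete.
    mutating func parse(_ chunk: [UInt8]) -> [Int] {
        buffer.append(contentsOf: chunk)
        return drain(isFinal: false)
    }

    /// Signals end of stream, so a token touching the buffer end is committed.
    mutating func finalize() -> [Int] {
        return drain(isFinal: true)
    }

    private static func isDigit(_ byte: UInt8) -> Bool {
        return byte >= 48 && byte <= 57 // ASCII "0"..."9"
    }

    private mutating func drain(isFinal: Bool) -> [Int] {
        var values: [Int] = []
        var index = 0
        while index < buffer.count {
            // Anything that is not a digit is treated as a separator.
            guard Self.isDigit(buffer[index]) else { index += 1; continue }
            var end = index
            var value = 0
            while end < buffer.count, Self.isDigit(buffer[end]) {
                value = value * 10 + (Int(buffer[end]) - 48)
                end += 1
            }
            // The token reaches the buffer end: the next chunk might extend it,
            // so keep its bytes and wait unless the stream is finished.
            if end == buffer.count && !isFinal { break }
            values.append(value)
            index = end
        }
        buffer.removeFirst(index) // drop only the bytes that were committed
        return values
    }
}

var parser = DeferringIntParser()
let first = parser.parse(Array("12".utf8))       // [] ("12" may still grow)
let second = parser.parse(Array("345\n6".utf8))  // [12345] ("6" is deferred)
let last = parser.finalize()                     // [6]
```

Without the deferral check, the first call would have emitted 12 and the second 345, which is the silent truncation described above.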
13 new tests cover number/float boundary splitting in both modes,
finalize() flushing of values without trailing newline, structural
errors not being mistaken for needMore, and concurrent access.
All 768 tests pass (719 pre-existing + 49 streaming).
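The "structural errors not being mistaken for needMore" case can be illustrated with a small sketch. Everything here is a hypothetical stand-in: `ReadErrorCode`, `SketchJSONError`, `FeedOutcome`, and the sample message are assumptions made for illustration, not the JSONError type or yyjson's actual messages. The sketch only contrasts classification by message substring with classification by error code.

```swift
import Foundation

/// Hypothetical stand-in for a subset of parser read error codes.
enum ReadErrorCode {
    case unexpectedEnd
    case unexpectedCharacter
    case invalidString
}

/// Hypothetical error carrying a machine-readable code and a message.
struct SketchJSONError: Error {
    let code: ReadErrorCode
    let message: String
}

enum FeedOutcome: Equatable {
    case needMoreData
    case failed
}

/// Fragile: depends on the wording of a human-readable message.
func classifyByMessage(_ error: SketchJSONError) -> FeedOutcome {
    return error.message.contains("Unexpected end") ? .needMoreData : .failed
}

/// Robust: depends only on the error code.
func classifyByCode(_ error: SketchJSONError) -> FeedOutcome {
    switch error.code {
    case .unexpectedEnd:
        return .needMoreData
    case .unexpectedCharacter, .invalidString:
        return .failed
    }
}

// A real syntax error whose (made-up) message happens to contain the words.
let misleading = SketchJSONError(
    code: .unexpectedCharacter,
    message: "Unexpected end-of-line character inside string"
)
let byMessage = classifyByMessage(misleading) // .needMoreData (misclassified)
let byCode = classifyByCode(misleading)       // .failed
```

Because the switch is exhaustive, adding a new code to the enum forces a decision about how it should be classified.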
Co-authored-by: Cursor <cursoragent@cursor.com>
`JSONDocument` is `~Copyable`, and `Optional<~Copyable>` is not yet
supported on Swift 5.10 (the toolchain used by the Linux CI). The
`JSONIncrementalReader.feed(_:)` API previously returned
`JSONDocument?`, which compiled on macOS Swift 6 but failed on Linux
with:
    error: noncopyable type 'JSONDocument' cannot be used with generic
    type 'Optional<Wrapped>' yet
Replace the optional return with a dedicated non-copyable enum:
    public enum JSONIncrementalReadResult: ~Copyable {
        case ready(JSONDocument)
        case needMoreData
    }
This compiles cleanly on every supported toolchain and keeps the
call-site ergonomic (switch with a let-binding instead of optional
chaining).
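A call site using the enum might look like the sketch below. The `JSONDocument` stub, its `byteCount` property, `ToyReader`, and `describe(_:)` are hypothetical stand-ins so the example compiles on its own; only the two enum cases come from this commit. The subject is consumed explicitly with `switch consume`, which is assumed here to be the portable spelling across Swift 5.9 and later toolchains.

```swift
// Hypothetical stand-in for the real non-copyable document type.
struct JSONDocument: ~Copyable {
    let byteCount: Int
}

// Mirrors the enum introduced in this commit (without `public`).
enum JSONIncrementalReadResult: ~Copyable {
    case ready(JSONDocument)
    case needMoreData
}

/// Toy reader (hypothetical): reports `.ready` once the accumulated
/// bytes end with a closing brace. It does no real JSON parsing.
struct ToyReader {
    private var bytes: [UInt8] = []

    mutating func feed(_ chunk: [UInt8]) -> JSONIncrementalReadResult {
        bytes.append(contentsOf: chunk)
        if bytes.last == UInt8(ascii: "}") {
            return .ready(JSONDocument(byteCount: bytes.count))
        }
        return .needMoreData
    }
}

/// Takes ownership of the result and switches with a let-binding.
func describe(_ result: consuming JSONIncrementalReadResult) -> String {
    switch consume result {
    case .ready(let document):
        return "ready: \(document.byteCount) bytes"
    case .needMoreData:
        return "need more data"
    }
}

var reader = ToyReader()
let firstStatus = describe(reader.feed(Array("{\"a\":".utf8)))  // "need more data"
let secondStatus = describe(reader.feed(Array("1}".utf8)))      // "ready: 7 bytes"
```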
Updated tests and doc examples accordingly.
Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Add streaming/incremental JSON parsing support to ReerJSON.
New Types
Bottom layer (JSONValue):
- JSONStreamParser: push-based streaming parser with two modes:
  - .jsonLines: extract multiple JSON documents from a byte stream (NDJSON/SSE)
  - .jsonArray: parse elements from a large JSON array one by one, O(1) memory
- JSONIncrementalReader: accumulate chunks for large single-document parsing
Codable layer:
- StreamingJSONLinesDecoder<T> / StreamingJSONArrayDecoder<T>: typed streaming decoders
AsyncSequence adapters:
- JSONValueStream / JSONValueByteStream / DecodingStream
- AsyncSequence.jsonValues() / .decode() convenience extensions
Implementation
- YYJSON_READ_STOP_WHEN_DONE + yyjson_doc_get_read_size() for accurate byte-level buffer management
- JSON Array mode: a state machine tracks the array delimiters ([ , ]) and parses each element individually
Testing
Note on yyjson_incr_* API
yyjson 0.12.0's incremental API (yyjson_incr_new/read/free) requires all data to be pre-loaded in the buffer; len only controls how far each parse step reads. It cannot handle dynamically appended data between reads. Therefore, network streaming uses STOP_WHEN_DONE instead.