ADD streaming-json report style for line-delimited product validation #1636
ramesh-maddegoda wants to merge 4 commits
- Implement StreamingJsonReport class to output validation results as line-delimited JSON (NDJSON) to stdout or a file.
- Add error, fatal, warning, and info metrics tracking per product line to streamline AWS CloudWatch log filtering.
- Update ValidateLauncher to support the new 'streaming-json' style flag option and route report setup accordingly.
- Fix WSL2 file permission tracking clutter by configuring repository core.fileMode to false.

Related: #1352
nutjob4life
left a comment
Noticed a few things and left interspersed comments. What do you think? If these are all non-issues after all, by all means, let's move forward. Thanks in advance!
    @Override
    protected void summarizeProds(int failed, int passed, int skipped, int total) {
      JsonObject summary = new JsonObject();
      summary.addProperty("recordType", "summary");
I don't think I'm understanding: here, we add a recordType of "summary" to summary records, but we don't do something similar for product records, like "recordType": "product" or something?
    }

    @Override
    protected void summarizeProds(int failed, int passed, int skipped, int total) {
This method receives int skipped as a parameter but doesn't use it. Should it go into the summary?
    summary.addProperty("skipped", skipped);

      getWriter().println(gson.toJson(this.currentProduct));
      getWriter().flush();
    } else {
      LOG.info(gson.toJson(this.currentProduct));
I think if no writer is configured, LOG.info can add timestamps, log levels, class names, or other formatting cruft, which would make each line no longer NDJSON, right?
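To illustrate the concern, here is a minimal sketch comparing the two sinks. It assumes the logger is java.util.logging with the default SimpleFormatter; the PR's actual handler configuration is not shown in this thread, and the class and method names below are hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

// Hypothetical sketch, not code from the PR.
final class NdjsonSinkSketch {

  /** Writes one JSON record as exactly one line, then flushes. */
  static void emit(PrintWriter out, String json) {
    out.print(json);
    out.print('\n'); // fixed separator, independent of the platform line ending
    out.flush();
  }

  /** Returns what a consumer sees when the record goes through a plain writer. */
  static String viaWriter(String json) {
    StringWriter buffer = new StringWriter();
    emit(new PrintWriter(buffer), json);
    return buffer.toString();
  }

  /** Returns what a default JUL SimpleFormatter would produce for the same record. */
  static String viaJulFormatter(String json) {
    return new SimpleFormatter().format(new LogRecord(Level.INFO, json));
  }

  public static void main(String[] args) {
    String json = "{\"status\":\"PASS\"}";
    System.out.print(viaWriter(json));       // the JSON object alone on one line
    System.out.print(viaJulFormatter(json)); // the same text wrapped in logger decoration
  }
}
```

With the default format, the formatter output begins with a date and source line rather than with the opening brace, so a line-oriented JSON parser would reject it.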
        }
      }
      this.currentMessages.add(msg);
    }
Both append methods assume begin(Block.LABEL) has already initialized currentProduct and currentMessages. If the report lifecycle ever emits problems outside a label block, this could throw a NullPointerException. It may be worth guarding against that or making the lifecycle assumption explicit?
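One way to make the lifecycle assumption explicit is sketched below. The class and its methods are hypothetical stand-ins for the report's begin/append/end flow, not the PR's code, and failing fast is only one option; buffering or dropping out-of-block messages would be another.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not code from the PR.
final class LifecycleGuardSketch {

  // Null whenever no label block is open.
  private List<String> currentMessages;

  /** Opens a label block and starts collecting its messages. */
  void beginLabel() {
    currentMessages = new ArrayList<>();
  }

  /** Adds a message, failing with a descriptive error if no block is open. */
  void append(String msg) {
    if (currentMessages == null) {
      throw new IllegalStateException("append() called outside a label block: " + msg);
    }
    currentMessages.add(msg);
  }

  /** Closes the block and returns its messages (empty if no block was open). */
  List<String> endLabel() {
    List<String> done = (currentMessages == null) ? new ArrayList<>() : currentMessages;
    currentMessages = null;
    return done;
  }
}
```

The guard turns a bare NullPointerException into an error that names the violated assumption, which is easier to diagnose from a log stream.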
jordanpadams
left a comment
Code Review
Good approach overall. A few issues to address before merge:
StreamingJsonReport.java:122 — summarizeProds drops skipped
The skipped parameter is never written to the summary JSON. JSONReport correctly emits it. Add summary.addProperty("skipped", skipped).
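A minimal sketch of a summary record that carries all four counts follows. It builds the JSON by hand to stay dependency-free, whereas the PR uses Gson's JsonObject. Only "recordType" and "skipped" appear in this thread; the other field names are assumptions.

```java
// Hypothetical sketch, not code from the PR.
final class SummaryRecordSketch {

  /**
   * Builds a one-line summary record. The keys "failed", "passed", and
   * "total" are assumed names; the PR's real keys are not shown here.
   */
  static String summaryLine(int failed, int passed, int skipped, int total) {
    return "{\"recordType\":\"summary\""
        + ",\"failed\":" + failed
        + ",\"passed\":" + passed
        + ",\"skipped\":" + skipped
        + ",\"total\":" + total
        + "}";
  }

  public static void main(String[] args) {
    System.out.println(summaryLine(1, 4, 2, 7));
  }
}
```

Emitting "skipped" lets a consumer check that failed + passed + skipped equals total without consulting another report.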
ValidateLauncher.java:1055 — use InvalidOptionException, not Exception
The PR touches this line (error string changed). The project already has InvalidOptionException imported and used in this same class (lines 344, 347, 357, 364) for exactly this case. Replace throw new Exception(...) with throw new InvalidOptionException(...).
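The shape of that change might look like the sketch below. The nested InvalidOptionException is a local, hypothetical stand-in so the example compiles on its own; the project's real class, its constructor, and the exact error wording may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not code from the PR.
final class ReportStyleOptionSketch {

  /** Stand-in for the project's exception type; signature is an assumption. */
  static final class InvalidOptionException extends Exception {
    InvalidOptionException(String message) {
      super(message);
    }
  }

  static final List<String> VALID_STYLES =
      Arrays.asList("full", "json", "xml", "streaming-json");

  /** Returns the style unchanged if it is recognized, otherwise throws. */
  static String parseStyle(String value) throws InvalidOptionException {
    if (value == null || !VALID_STYLES.contains(value)) {
      throw new InvalidOptionException(
          "Invalid report style '" + value + "'. Valid values are " + VALID_STYLES + ".");
    }
    return value;
  }
}
```

A dedicated exception type lets callers separate bad user input from unexpected failures, which a bare Exception does not.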
StreamingJsonReport.java:72,129 — getWriter() != null is always true
Report.writer is initialized inline to new PrintWriter(new OutputStreamWriter(System.out)) — it's never null. The LOG.info(...) fallback branches are dead code. Either remove the null check or document the intent.
StreamingJsonReport.java:3 — use SLF4J, not java.util.logging
All other report classes use org.slf4j.Logger / LoggerFactory. Using JUL here bypasses the project's logging configuration and any MDC context.
No way to distinguish product validation from referential integrity validation in the log stream
When running on a bundle, two JSON records are emitted for the same LIDVID — one for product-level validation and one for referential integrity. Both look identical (same label, same lidvid, potentially both PASS). The full-text report separates these visually with section headers like "Product Level Validation Results" and "PDS4 Bundle Level Validation Results", but the streaming-json output has no equivalent. A consumer cannot tell which record is which.
Suggest adding a "validationType" field (e.g. "product" / "referential") to each product JSON record so log consumers can filter or route accordingly.
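A rough sketch of what such discriminator fields could look like on the producer side is shown below. The "recordType": "product" value and the "validationType" values come from the suggestions in this thread; the enum, the method names, and the hand-rolled JSON building are hypothetical, and the escaping covers only backslashes and double quotes.

```java
// Hypothetical sketch, not code from the PR.
final class ProductRecordSketch {

  /** The two kinds of validation mentioned in the review. */
  enum ValidationType {
    PRODUCT("product"),
    REFERENTIAL("referential");

    final String jsonValue;

    ValidationType(String jsonValue) {
      this.jsonValue = jsonValue;
    }
  }

  /** Minimal escaping for illustration only; control characters are not handled. */
  static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  /** Builds a one-line product record tagged with its validation type. */
  static String productLine(ValidationType type, String lidvid, String status) {
    return "{\"recordType\":\"product\""
        + ",\"validationType\":\"" + type.jsonValue + "\""
        + ",\"lidvid\":\"" + escape(lidvid) + "\""
        + ",\"status\":\"" + escape(status) + "\"}";
  }

  public static void main(String[] args) {
    String lidvid = "urn:nasa:pds:sample_bundle:sample_collection:product_data_id::1.0";
    System.out.println(productLine(ValidationType.PRODUCT, lidvid, "PASS"));
    System.out.println(productLine(ValidationType.REFERENTIAL, lidvid, "PASS"));
  }
}
```

With these fields, the two records for the same LIDVID differ in "validationType", so a log filter can route them separately.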
--help output does not list streaming-json as a valid option
The -s/--report-style help text currently reads:
-s,--report-style <full|json|xml> ...Valid values are 'full', 'json', and 'xml'...
This needs to be updated to include streaming-json, otherwise users have no way to discover the new flag.
🤖 Generated with Claude Code



Add streaming-json report style for line-delimited product validation
Related: #1352
🗒️ Summary
This PR adds a new streaming-json report style to the Validate Tool, designed explicitly for high-throughput stream ingestion (one JSON object per product).
Key Changes
- StreamingJsonReport.java outputs individual product validation results as single-line, line-delimited JSON objects rather than wrapping the entire run in one large JSON array.
- fatal, error, warning, and info count totals are included on each line to simplify filtering and dashboards.
- ValidateLauncher.java handles -s streaming-json and routes stream creation/formatting dynamically based on user choices.
🤖 AI Assistance Disclosure
Estimated % of code influenced by AI: 90%
⚙️ Test Data and/or Report
Manual Execution & CloudWatch Log Stream Verification
The updated binary was built into a local Docker image, pushed to an Amazon ECR registry, and executed inside an AWS ECS Task by a Nucleus Airflow workflow.
The tool successfully ran a product batch. The output stream captured directly within the AWS CloudWatch log viewer displays expected formatting per individual product line, concluding with a structural summary record:
{ "label": "file:/mnt/data/sample-staging-bucket/sample_dir/product_data_file.xml", "lidvid": "urn:nasa:pds:sample_bundle:sample_collection:product_data_id::1.0", "status": "PASS", "fatal": 0, "error": 0, "warning": 3, "info": 0, "messages": [ { "severity": "WARNING", "type": "warning.label.context_ref_mismatch", "message": "Context reference name mismatch. LIDVID: 'urn:nasa:pds:context:instrument:sample_node.sample_instrument'. Value: 'Sample Instrument Name' Expected one of: '[Expected Instrument Name]'", "line": 61 }, { "severity": "WARNING", "type": "warning.label.context_ref_mismatch", "message": "Context reference name mismatch. LIDVID: 'urn:nasa:pds:context:telescope:sample_node.sample_telescope'. Value: 'Sample Telescope Name' Expected one of: '[Expected Telescope Name 1, Expected Telescope Name 2]'", "line": 69 }, { "severity": "WARNING", "type": "warning.label.context_ref_mismatch", "message": "Context reference name mismatch. LIDVID: 'urn:nasa:pds:context:facility:observatory.sample_facility'. Value: 'Sample Observatory Facility Name' Expected one of: '[Expected Facility Name]'", "line": 77 } ] }