ADD streaming-json report style for line-delimited product validation #1636

Open
ramesh-maddegoda wants to merge 4 commits into main from logs-per-product

@ramesh-maddegoda

Contributor

Add streaming-json report style for line-delimited product validation

Related: #1352

🗒️ Summary

This PR adds a new streaming-json report style to the Validate Tool, designed for high-throughput stream ingestion: each product's validation result is emitted as one JSON object on its own line.

Key Changes

  • New Report Implementation: Created StreamingJsonReport.java to output individual product validation results as single-line, line-delimited JSON objects rather than wrapping the entire run in a massive JSON array structure.
  • Granular Log Stream Metrics: Problem counts are computed as each product record is built. Every product JSON line carries its own fatal, error, warning, and info totals to simplify filtering and dashboards.
  • CLI Flag Routing: Hooked up ValidateLauncher.java to handle -s streaming-json and route stream creation/formatting dynamically based on user choices.
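
A minimal sketch of what one product record might look like when written as a single line with its per-severity counts. The snippets in this PR use Gson; this sketch sticks to the JDK so it runs on its own. The class name `ProductLineSketch`, its helpers, and the PASS/FAIL rule are assumptions for illustration, not the PR's code, and the messages array is left out for brevity.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class ProductLineSketch {
    enum Severity { FATAL, ERROR, WARNING, INFO }

    // Escapes only the characters this sketch needs; a real implementation
    // would rely on a JSON library such as Gson.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    // Builds one NDJSON line: no embedded newlines, counts next to the status.
    static String toLine(String label, String lidvid, List<Severity> problems) {
        Map<Severity, Integer> counts = new EnumMap<>(Severity.class);
        for (Severity s : Severity.values()) {
            counts.put(s, 0);
        }
        for (Severity s : problems) {
            counts.merge(s, 1, Integer::sum);
        }

        // Assumption: a product fails when it has any FATAL or ERROR problem.
        boolean failed = counts.get(Severity.FATAL) + counts.get(Severity.ERROR) > 0;

        return "{\"label\":" + quote(label)
            + ",\"lidvid\":" + quote(lidvid)
            + ",\"status\":" + quote(failed ? "FAIL" : "PASS")
            + ",\"fatal\":" + counts.get(Severity.FATAL)
            + ",\"error\":" + counts.get(Severity.ERROR)
            + ",\"warning\":" + counts.get(Severity.WARNING)
            + ",\"info\":" + counts.get(Severity.INFO)
            + "}";
    }

    public static void main(String[] args) {
        // Hypothetical label and LIDVID values.
        System.out.println(toLine("file:/tmp/a.xml", "urn:nasa:pds:x:y:z::1.0",
            List.of(Severity.WARNING, Severity.WARNING, Severity.WARNING)));
    }
}
```

Keeping the counts at the top level of each line means a log filter can match on them without walking the messages array.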

🤖 AI Assistance Disclosure

  • [ ] No AI assistance used
  • [ ] AI used for light assistance (e.g., suggestions, refactoring, documentation help, minor edits)
  • [ ] AI used for moderate content generation (AI generated some code or logic, but the developer authored or heavily revised the majority)
  • [x] AI generated substantial portions of this code

Estimated % of code influenced by AI: 90%


⚙️ Test Data and/or Report

Manual Execution & CloudWatch Log Stream Verification

The updated binary was built into a local Docker image, pushed to an Amazon ECR registry, and executed inside an AWS ECS Task by a Nucleus Airflow workflow.

The tool successfully ran a product batch. The output stream captured directly within the AWS CloudWatch log viewer displays expected formatting per individual product line, concluding with a structural summary record:

{
    "label": "file:/mnt/data/sample-staging-bucket/sample_dir/product_data_file.xml",
    "lidvid": "urn:nasa:pds:sample_bundle:sample_collection:product_data_id::1.0",
    "status": "PASS",
    "fatal": 0,
    "error": 0,
    "warning": 3,
    "info": 0,
    "messages": [
        {
            "severity": "WARNING",
            "type": "warning.label.context_ref_mismatch",
            "message": "Context reference name mismatch. LIDVID: 'urn:nasa:pds:context:instrument:sample_node.sample_instrument'. Value: 'Sample Instrument Name' Expected one of: '[Expected Instrument Name]'",
            "line": 61
        },
        {
            "severity": "WARNING",
            "type": "warning.label.context_ref_mismatch",
            "message": "Context reference name mismatch. LIDVID: 'urn:nasa:pds:context:telescope:sample_node.sample_telescope'. Value: 'Sample Telescope Name' Expected one of: '[Expected Telescope Name 1, Expected Telescope Name 2]'",
            "line": 69
        },
        {
            "severity": "WARNING",
            "type": "warning.label.context_ref_mismatch",
            "message": "Context reference name mismatch. LIDVID: 'urn:nasa:pds:context:facility:observatory.sample_facility'. Value: 'Sample Observatory Facility Name' Expected one of: '[Expected Facility Name]'",
            "line": 77
        }
    ]
}

- Implement StreamingJsonReport class to output validation results as line-delimited JSON (NDJSON) to stdout or a file.
- Add error, fatal, warning, and info metrics tracking per product line to streamline AWS CloudWatch log filtering.
- Update ValidateLauncher to support the new 'streaming-json' style flag option and route report setup accordingly.
- Fix WSL2 file permission tracking clutter by configuring repository core.fileMode to false.

Related: #1352
@ramesh-maddegoda ramesh-maddegoda requested a review from a team as a code owner July 2, 2026 22:29
@sonarqubecloud

sonarqubecloud Bot commented Jul 2, 2026


@nutjob4life nutjob4life left a comment

Member

Noticed a few things and left interspersed comments. What do you think? If these are all non-issues after all, by all means, let's move forward. Thanks in advance!

@Override
protected void summarizeProds(int failed, int passed, int skipped, int total) {
JsonObject summary = new JsonObject();
summary.addProperty("recordType", "summary");

Member

I don't think I'm understanding: here, we add a recordType of summary to summaries but we don't do something similar to products, like

"recordType": "product"

or something?
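
To illustrate why a discriminator on every line could help, here is a consumer-side sketch. It assumes product lines carried "recordType": "product", which the PR does not currently emit. The regex lookup is a stand-in for a real JSON parser, and the class name and sample fields such as `totalProducts` are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecordTypeRouter {
    // Stand-in for a JSON parser: finds the first "recordType" string value.
    private static final Pattern RECORD_TYPE =
        Pattern.compile("\"recordType\"\\s*:\\s*\"([^\"]+)\"");

    // Returns the recordType value, or "unknown" when the field is absent.
    static String recordTypeOf(String ndjsonLine) {
        Matcher m = RECORD_TYPE.matcher(ndjsonLine);
        return m.find() ? m.group(1) : "unknown";
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "{\"recordType\":\"product\",\"status\":\"PASS\"}",
            "{\"recordType\":\"summary\",\"totalProducts\":1}",
            "{\"status\":\"PASS\"}"); // today's product line: no discriminator
        for (String line : lines) {
            System.out.println(recordTypeOf(line) + " <- " + line);
        }
    }
}
```

Without the field, a consumer has to infer "product" from the absence of "summary", which gets fragile if more record kinds are added later.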

}

@Override
protected void summarizeProds(int failed, int passed, int skipped, int total) {

Member

This method receives int skipped as a parameter but doesn't use it. Should it go into the summary?

summary.addProperty("skipped", skipped);
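
A small sketch of a summary line that includes the skipped count. The field names other than recordType are guesses, since the summary's actual keys are not shown here, and the JSON is built by hand with the JDK rather than with Gson.

```java
import java.util.Locale;

public class SummaryLineSketch {
    // Hypothetical field names; only "recordType":"summary" comes from the PR.
    static String summaryLine(int failed, int passed, int skipped, int total) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT,
            "{\"recordType\":\"summary\",\"failed\":%d,\"passed\":%d,\"skipped\":%d,\"total\":%d}",
            failed, passed, skipped, total);
    }

    public static void main(String[] args) {
        System.out.println(summaryLine(1, 2, 3, 6));
    }
}
```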

getWriter().println(gson.toJson(this.currentProduct));
getWriter().flush();
} else {
LOG.info(gson.toJson(this.currentProduct));

Member

I think if no writer is configured, LOG.info can add timestamps, log levels, class names, or other formatting cruft, which would make each line no longer NDJSON, right?
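
A quick way to see the concern, assuming the logger is java.util.logging with its default SimpleFormatter (the format a default ConsoleHandler applies). The class and method names below are illustrative only.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.SimpleFormatter;

public class LoggerVersusWriter {
    // What the default SimpleFormatter produces for an INFO record holding the JSON.
    static String viaLogger(String json) {
        return new SimpleFormatter().format(new LogRecord(Level.INFO, json));
    }

    // What a plain PrintWriter produces: the JSON followed by a line separator.
    static String viaWriter(String json) {
        StringWriter out = new StringWriter();
        PrintWriter writer = new PrintWriter(out);
        writer.println(json);
        writer.flush();
        return out.toString();
    }

    public static void main(String[] args) {
        String json = "{\"status\":\"PASS\"}";
        System.out.print("logger: " + viaLogger(json));
        System.out.print("writer: " + viaWriter(json));
    }
}
```

With the default format, the logger output begins with a timestamp and a level prefix, so a line-oriented JSON parser would reject it, while the writer output is the JSON text alone.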

}
}
this.currentMessages.add(msg);
}

Member

Both append methods assume begin(Block.LABEL) has already initialized currentProduct and currentMessages. If the report lifecycle ever emits problems outside a label block, this could throw a NullPointerException. It may be worth guarding against that or making the lifecycle assumption explicit.
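
One possible shape for such a guard, as a sketch only. The class below is a simplified stand-in for the report, with hypothetical names; it sets messages that arrive outside a block aside instead of dereferencing a null list.

```java
import java.util.ArrayList;
import java.util.List;

public class GuardedAppendSketch {
    private List<String> currentMessages;            // null until begin() is called
    private final List<String> orphaned = new ArrayList<>();

    void begin() {
        currentMessages = new ArrayList<>();
    }

    void append(String message) {
        if (currentMessages == null) {
            // Outside a label block: set the message aside rather than throw.
            orphaned.add(message);
            return;
        }
        currentMessages.add(message);
    }

    void end() {
        currentMessages = null;
    }

    int currentCount() {
        return currentMessages == null ? 0 : currentMessages.size();
    }

    List<String> orphaned() {
        return orphaned;
    }

    public static void main(String[] args) {
        GuardedAppendSketch report = new GuardedAppendSketch();
        report.append("emitted before any label block"); // no NullPointerException
        report.begin();
        report.append("emitted inside a label block");
        System.out.println(report.orphaned().size() + " orphaned, "
            + report.currentCount() + " current");
    }
}
```

The alternative is to fail fast with an IllegalStateException carrying a clear message, which would make the lifecycle assumption explicit rather than tolerated.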

@jordanpadams jordanpadams left a comment

Member

Code Review

Good approach overall. A few issues to address before merge:


StreamingJsonReport.java:122: summarizeProds drops skipped
The skipped parameter is never written to the summary JSON. JSONReport correctly emits it. Add summary.addProperty("skipped", skipped).


ValidateLauncher.java:1055 — use InvalidOptionException, not Exception
The PR touches this line (error string changed). The project already has InvalidOptionException imported and used in this same class (lines 344, 347, 357, 364) for exactly this case. Replace throw new Exception(...) with throw new InvalidOptionException(...).


StreamingJsonReport.java:72,129: getWriter() != null is always true
Report.writer is initialized inline to new PrintWriter(new OutputStreamWriter(System.out)), so it is never null. The LOG.info(...) fallback branches are dead code. Either remove the null check or document the intent.


StreamingJsonReport.java:3 — use SLF4J, not java.util.logging
All other report classes use org.slf4j.Logger / LoggerFactory. Using JUL here bypasses the project's logging configuration and any MDC context.


No way to distinguish product validation from referential integrity validation in the log stream
When running on a bundle, two JSON records are emitted for the same LIDVID — one for product-level validation and one for referential integrity. Both look identical (same label, same lidvid, potentially both PASS). The full-text report separates these visually with section headers like "Product Level Validation Results" and "PDS4 Bundle Level Validation Results", but the streaming-json output has no equivalent. A consumer cannot tell which record is which.

Suggest adding a "validationType" field (e.g. "product" / "referential") to each product JSON record so log consumers can filter or route accordingly.


--help output does not list streaming-json as a valid option
The -s/--report-style help text currently reads:

-s,--report-style <full|json|xml>   ...Valid values are 'full', 'json', and 'xml'...

This needs to be updated to include streaming-json, otherwise users have no way to discover the new flag.


🤖 Generated with Claude Code
