Skip to content

Perfmaxx json2xml#303

Merged
vinitkumar merged 10 commits into
masterfrom
perfmaxx-json2xml
May 27, 2026
Merged

Perfmaxx json2xml#303
vinitkumar merged 10 commits into
masterfrom
perfmaxx-json2xml

Conversation

@vinitkumar
Copy link
Copy Markdown
Owner

@vinitkumar vinitkumar commented May 27, 2026

Summary by Sourcery

Optimize JSON-to-XML conversion performance and backend selection while preserving existing output semantics and improving lazy imports.

New Features:

  • Add cached fast-path validation for common ASCII XML element names and reuse validated names across scalar conversions.
  • Introduce specialized conversion helpers for already-validated keys to avoid repeated normalization and attribute work.

Bug Fixes:

  • Ensure the fast backend selector falls back to the Python serializer for root scalar payloads to preserve legacy <item> wrapping semantics.
  • Align Json2xml to delegate through the fast backend selector so library and CLI callers benefit from the Rust accelerator when available.

Enhancements:

  • Speed up XML type detection and primitive checks by tightening type logic and avoiding unnecessary introspection.
  • Short-circuit XML escaping when strings contain no escapable characters and simplify attribute string building for common cases.
  • Lazy-load defusedxml and urllib3 so XML pretty-printing and URL reads only pay import and initialization costs when used.
  • Clarify backend selection and conversion behavior in architecture and behavior documentation, including rules around root scalars and pretty-print validation.

Tests:

  • Add tests verifying Json2xml delegates through the fast backend selector and that root scalar payloads continue to use the Python fallback.

@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented May 27, 2026

Reviewer's Guide

Optimize JSON-to-XML conversion performance and startup by adding lazy imports, cached XML name/type handling, and fast-path helpers, while tightening the Rust fast-backend selection semantics and updating behavior docs/tests accordingly.

Sequence diagram for readfromurl lazy HTTP client initialization

sequenceDiagram
    participant Caller
    participant Utils as utils.readfromurl
    participant HTTPInit as utils._get_http_client
    participant Urllib3 as urllib3

    Caller->>Utils: readfromurl(url, params)
    Utils->>HTTPInit: _get_http_client()
    alt first call
        HTTPInit->>Urllib3: import urllib3
        HTTPInit->>Urllib3: Timeout(connect, read)
        HTTPInit->>Urllib3: PoolManager()
    end
    HTTPInit-->>Utils: (urllib3, http, timeout)
    Utils->>Urllib3: http.request("GET", url, fields=params, timeout=timeout, retries=False)
    Urllib3-->>Utils: response
    Utils-->>Caller: parsed JSON or URLReadError
Loading

Flow diagram for fast backend selection in dicttoxml_fast.dicttoxml

flowchart TD
    A[dicttoxml_fast.dicttoxml called
    with obj and options] --> B{Any unsupported
    feature enabled?}

    B -->|ids or item_func or
    xml_namespaces or xpath_format| D[Use Python
    dicttoxml]
    B -->|obj is not dict or list| D
    B -->|no unsupported
    features and obj is
    dict or list| C[Use Rust
    backend]

    C --> E[Return XML bytes]
    D --> E
Loading

File-Level Changes

Change Details Files
Speed up XML type detection, escaping, and attribute rendering while reusing validated element names for common scalar paths.
  • Refactor get_xml_type to use direct type checks and a fast path for None, primitives, numbers, dicts, and sequences.
  • Short-circuit escape_xml when the input string contains no escapable characters, using a precomputed escape-character set.
  • Optimize make_attrstring for empty and single-key attribute dicts, specializing the common "type" attribute case and simplifying spacing logic.
  • Introduce convert_kv_valid_name/convert_bool_valid_name/convert_none_valid_name helpers that assume prevalidated keys and reuse type/attribute logic without revalidation or extra conversions.
  • Adjust is_primitive_type to use isinstance checks over get_xml_type string comparisons for primitives and None.
json2xml/dicttoxml.py
Introduce a cached fast XML-name validator and integrate it into dict/list conversion paths to minimize parser invocations.
  • Add _is_fast_valid_xml_name to cheaply validate common ASCII XML names against legacy rules, rejecting empty, colon, and invalid-start characters.
  • Wrap key_is_valid_xml with lru_cache and a fast path that accepts common ASCII names, normalizes non-string keys via str(), and eagerly rejects purely numeric or colon-containing keys before parser use.
  • Move the defusedxml.minidom.parseString import inside key_is_valid_xml so the XML parser is only loaded when needed for unusual names.
  • Update convert_dict and convert_list scalar/boolean/None branches to call the new *_valid_name helpers after name validation, and to propagate any name-derived attributes returned by make_valid_xml_name into element attrs.
json2xml/dicttoxml.py
Refine list conversion behavior around item naming and scalar keys while preserving existing wrapper semantics.
  • Change convert_list to normalize item_name via make_valid_xml_name (including "@Flat" handling) and to compute a scalar_key that respects item_wrap and parent naming, along with any generated attributes.
  • Ensure scalar list items (numbers/strings) and datetimes use scalar_key/item_name consistently in convert_kv_valid_name calls so validated names and attributes are reused.
  • Propagate item_name_attr/scalar_key_attr into the per-item attr dict for bool, scalar, datetime, and None branches, avoiding duplicate name normalization work.
  • Keep TypeError behavior for unsupported list item types while routing supported ones through the faster helpers.
json2xml/dicttoxml.py
Make XML-related and HTTP-related imports lazy to improve import-time performance and avoid unnecessary optional dependencies.
  • Replace the top-level defusedxml.minidom import in dicttoxml.py with a lazy_modules declaration and move parseString imports inside key_is_valid_xml and pretty-printing in Json2xml.to_xml.
  • In json2xml/utils.py, introduce a lazy_modules list for urllib3 and replace eager DEFAULT_URL_TIMEOUT/_HTTP initialization with a lazily-initialized _get_http_client helper.
  • Update readfromurl to obtain urllib3, the HTTP client, and timeout from _get_http_client, preserving behavior while deferring urllib3 import until URL use.
  • Adjust JSON2XML pretty-printing path to import defusedxml.minidom.parseString and pyexpat.ExpatError only when pretty output is requested.
json2xml/dicttoxml.py
json2xml/utils.py
json2xml/json2xml.py
Tighten fast-backend selection semantics and preserve legacy behavior for root scalar payloads, with new tests and documentation updates.
  • Extend dicttoxml_fast.dicttoxml to force Python fallback when the input object is not a dict or list, ensuring root scalars bypass the Rust backend.
  • Add a test that verifies the fast wrapper falls back to Python for root scalars and preserves the legacy <item type="int">0</item> structure under the custom root.
  • Add a test that Json2xml uses the fast backend selector (dicttoxml_fast.dicttoxml) and passes through the expected keyword arguments.
  • Update architecture.md, behavior.md, and tests.md to describe the fast-backend selector as the primary serializer entrypoint, document the lazy pretty-print import behavior, and note the root-scalar Python fallback requirement.
json2xml/dicttoxml_fast.py
tests/test_dicttoxml_fast_fallback.py
tests/test_json2xml.py
lat.md/architecture.md
lat.md/behavior.md
lat.md/tests.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@codecov
Copy link
Copy Markdown

codecov Bot commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (f366781) to head (5f3f6d5).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master      #303      +/-   ##
===========================================
+ Coverage   98.56%   100.00%   +1.43%     
===========================================
  Files           6         6              
  Lines         487       552      +65     
===========================================
+ Hits          480       552      +72     
+ Misses          7         0       -7     
Flag Coverage Δ
unittests 100.00% <100.00%> (+1.43%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • In make_attrstring, the single-attribute fast path skips escape_xml for type values, which is a behavior and safety change from the previous implementation; consider still escaping val (or restricting the optimization to known-safe internal type strings) so user-supplied type attributes cannot inject unescaped XML.
  • The lazy initialization in _get_http_client mutates the module-level _HTTP and DEFAULT_URL_TIMEOUT without synchronization; if this module can be imported and used from multiple threads, consider adding a simple lock or using an initialization-once pattern to avoid potential race conditions.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `make_attrstring`, the single-attribute fast path skips `escape_xml` for `type` values, which is a behavior and safety change from the previous implementation; consider still escaping `val` (or restricting the optimization to known-safe internal type strings) so user-supplied `type` attributes cannot inject unescaped XML.
- The lazy initialization in `_get_http_client` mutates the module-level `_HTTP` and `DEFAULT_URL_TIMEOUT` without synchronization; if this module can be imported and used from multiple threads, consider adding a simple lock or using an initialization-once pattern to avoid potential race conditions.

## Individual Comments

### Comment 1
<location path="json2xml/dicttoxml.py" line_range="140-142" />
<code_context>
     """
+    if not attr:
+        return ""
+    if len(attr) == 1:
+        key, val = next(iter(attr.items()))
+        if key == "type":
+            return f' type="{val}"'
+        return f' {key}="{escape_xml(val)}"'
</code_context>
<issue_to_address>
**🚨 issue (security):** Avoid skipping XML escaping for the special-cased `type` attribute

In the `len(attr) == 1` path, `type` is interpolated with `f' type="{val}"'` instead of using `escape_xml`. Since this function accepts arbitrary `attr` dicts, callers could pass `type` values containing `&`, `<`, or quotes, leading to malformed XML or injection issues. Please escape `type` here as well (e.g., `escape_xml(val)` or by reusing the general `make_attrstring` logic) so the fast path doesn’t change escaping behavior.
</issue_to_address>

### Comment 2
<location path="tests/test_dicttoxml_fast_fallback.py" line_range="83" />
<code_context>
 def test_fast_wrapper_falls_back_to_python_for_special_keys(
</code_context>
<issue_to_address>
**suggestion (testing):** Add focused unit tests for new XML helper behavior (valid-name helpers, escape, attrs, type detection)

This test covers backend selection logic well, but the PR also changes several low‑level XML helpers that currently lack direct tests:

* `get_xml_type` now uses direct `type` checks and treats `bool` specially.
* `escape_xml` adds a fast path when no escapable characters are present.
* `make_attrstring` has new behavior for `{}` and single `{"type": ...}` attrs.
* New `*_valid_name` helpers are now on the hot path for `convert_dict`/`convert_list`.
* `key_is_valid_xml` now has a fast ASCII path and `lru_cache`.
* `is_primitive_type` changed what it considers primitive.

Please add a focused unit test module (e.g. `tests/test_dicttoxml_unit.py`) that:

* Parametrizes `get_xml_type` / `is_primitive_type` over `None`, `bool`, `int`, `float`, `Decimal`, `Fraction`, custom numerics to lock in type strings and primitive classification.
* Verifies `escape_xml` behaves identically for strings with and without escapable characters.
* Asserts `make_attrstring` output for `{}`, `{"a": 1}`, and `{ "type": "str", "id": 1 }` to pin spacing and ordering.
* Exercises the `*_valid_name` helpers, checking they set `type` correctly and don’t mutate shared `attr` dicts.
* Covers `key_is_valid_xml` fast path (`foo`, `_bar-1`) vs parse-based path (non‑ASCII/edge cases), ensuring caching doesn’t change behavior.

This will guard the refactors in `dicttoxml.py` and ensure performance optimizations don’t change serialized XML output.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread json2xml/dicttoxml.py
Comment thread tests/test_dicttoxml_fast_fallback.py
Comment thread json2xml/json2xml.py Fixed
@vinitkumar vinitkumar merged commit 32e7b87 into master May 27, 2026
62 checks passed
@vinitkumar vinitkumar deleted the perfmaxx-json2xml branch May 27, 2026 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants