Perfmaxx json2xml#303
Merged
Merged
Conversation
Contributor
Reviewer's GuideOptimize JSON-to-XML conversion performance and startup by adding lazy imports, cached XML name/type handling, and fast-path helpers, while tightening the Rust fast-backend selection semantics and updating behavior docs/tests accordingly. Sequence diagram for readfromurl lazy HTTP client initializationsequenceDiagram
participant Caller
participant Utils as utils.readfromurl
participant HTTPInit as utils._get_http_client
participant Urllib3 as urllib3
Caller->>Utils: readfromurl(url, params)
Utils->>HTTPInit: _get_http_client()
alt first call
HTTPInit->>Urllib3: import urllib3
HTTPInit->>Urllib3: Timeout(connect, read)
HTTPInit->>Urllib3: PoolManager()
end
HTTPInit-->>Utils: (urllib3, http, timeout)
Utils->>Urllib3: http.request("GET", url, fields=params, timeout=timeout, retries=False)
Urllib3-->>Utils: response
Utils-->>Caller: parsed JSON or URLReadError
Flow diagram for fast backend selection in dicttoxml_fast.dicttoxmlflowchart TD
A[dicttoxml_fast.dicttoxml called
with obj and options] --> B{Any unsupported
feature enabled?}
B -->|ids or item_func or
xml_namespaces or xpath_format| D[Use Python
dicttoxml]
B -->|obj is not dict or list| D
B -->|no unsupported
features and obj is
dict or list| C[Use Rust
backend]
C --> E[Return XML bytes]
D --> E
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #303 +/- ##
===========================================
+ Coverage 98.56% 100.00% +1.43%
===========================================
Files 6 6
Lines 487 552 +65
===========================================
+ Hits 480 552 +72
+ Misses 7 0 -7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Contributor
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
make_attrstring, the single-attribute fast path skipsescape_xmlfortypevalues, which is a behavior and safety change from the previous implementation; consider still escapingval(or restricting the optimization to known-safe internal type strings) so user-suppliedtypeattributes cannot inject unescaped XML. - The lazy initialization in
_get_http_clientmutates the module-level_HTTPandDEFAULT_URL_TIMEOUTwithout synchronization; if this module can be imported and used from multiple threads, consider adding a simple lock or using an initialization-once pattern to avoid potential race conditions.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `make_attrstring`, the single-attribute fast path skips `escape_xml` for `type` values, which is a behavior and safety change from the previous implementation; consider still escaping `val` (or restricting the optimization to known-safe internal type strings) so user-supplied `type` attributes cannot inject unescaped XML.
- The lazy initialization in `_get_http_client` mutates the module-level `_HTTP` and `DEFAULT_URL_TIMEOUT` without synchronization; if this module can be imported and used from multiple threads, consider adding a simple lock or using an initialization-once pattern to avoid potential race conditions.
## Individual Comments
### Comment 1
<location path="json2xml/dicttoxml.py" line_range="140-142" />
<code_context>
"""
+ if not attr:
+ return ""
+ if len(attr) == 1:
+ key, val = next(iter(attr.items()))
+ if key == "type":
+ return f' type="{val}"'
+ return f' {key}="{escape_xml(val)}"'
</code_context>
<issue_to_address>
**🚨 issue (security):** Avoid skipping XML escaping for the special-cased `type` attribute
In the `len(attr) == 1` path, `type` is interpolated with `f' type="{val}"'` instead of using `escape_xml`. Since this function accepts arbitrary `attr` dicts, callers could pass `type` values containing `&`, `<`, or quotes, leading to malformed XML or injection issues. Please escape `type` here as well (e.g., `escape_xml(val)` or by reusing the general `make_attrstring` logic) so the fast path doesn’t change escaping behavior.
</issue_to_address>
### Comment 2
<location path="tests/test_dicttoxml_fast_fallback.py" line_range="83" />
<code_context>
def test_fast_wrapper_falls_back_to_python_for_special_keys(
</code_context>
<issue_to_address>
**suggestion (testing):** Add focused unit tests for new XML helper behavior (valid-name helpers, escape, attrs, type detection)
This test covers backend selection logic well, but the PR also changes several low‑level XML helpers that currently lack direct tests:
* `get_xml_type` now uses direct `type` checks and treats `bool` specially.
* `escape_xml` adds a fast path when no escapable characters are present.
* `make_attrstring` has new behavior for `{}` and single `{"type": ...}` attrs.
* New `*_valid_name` helpers are now on the hot path for `convert_dict`/`convert_list`.
* `key_is_valid_xml` now has a fast ASCII path and `lru_cache`.
* `is_primitive_type` changed what it considers primitive.
Please add a focused unit test module (e.g. `tests/test_dicttoxml_unit.py`) that:
* Parametrizes `get_xml_type` / `is_primitive_type` over `None`, `bool`, `int`, `float`, `Decimal`, `Fraction`, custom numerics to lock in type strings and primitive classification.
* Verifies `escape_xml` behaves identically for strings with and without escapable characters.
* Asserts `make_attrstring` output for `{}`, `{"a": 1}`, and `{ "type": "str", "id": 1 }` to pin spacing and ordering.
* Exercises the `*_valid_name` helpers, checking they set `type` correctly and don’t mutate shared `attr` dicts.
* Covers `key_is_valid_xml` fast path (`foo`, `_bar-1`) vs parse-based path (non‑ASCII/edge cases), ensuring caching doesn’t change behavior.
This will guard the refactors in `dicttoxml.py` and ensure performance optimizations don’t change serialized XML output.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary by Sourcery
Optimize JSON-to-XML conversion performance and backend selection while preserving existing output semantics and improving lazy imports.
New Features:
Bug Fixes:
<item>wrapping semantics.Enhancements:
Tests: