Change namespace cache strategy#333
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR refactors REXML namespace resolution to incorporate ATTLIST-declared xmlns values (and adds caching in the XPath parser), along with a new regression test that validates namespace precedence.
Changes:
- Add support for applying ATTLIST-declared xmlns values into element/attribute namespace resolution.
- Introduce per-parser caching for element namespace computations in
XPathParser. - Add a new test for ATTLIST vs element-declared namespace precedence; remove the previous namespaces cache regression test.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| test/xpath/test_base.rb | Removes a regression test that previously exercised namespace cache invalidation behavior. |
| test/test_core.rb | Adds a test asserting ATTLIST-declared xmlns precedence vs inherited/locally-declared namespaces. |
| lib/rexml/xpath_parser.rb | Adds namespace caching and routes namespace comparisons through cached lookup helpers. |
| lib/rexml/element.rb | Refactors namespace calculation/lookup internals and integrates ATTLIST-declared namespaces. |
| lib/rexml/document.rb | Adds extraction of ATTLIST-declared xmlns mappings per element name. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @attrlist_per_element_namespaces = nil | ||
| @document = nil | ||
| @element_namespaces_cache = {} |
| @document = node.document | ||
| match( path_stack, node ) |
| def element_namespaces(element) | ||
| @attrlist_per_element_namespaces ||= @document&.send(:attrlist_per_element_namespaces) || {} | ||
| element.send(:calculate_namespaces, @element_namespaces_cache, @attrlist_per_element_namespaces) | ||
| end |
| def attribute_namespace(attribute) | ||
| attribute.prefix == '' ? '' : element_namespace_lookup(attribute.element, attribute.prefix) | ||
| end |
| def calculate_namespaces(cache_hash = nil, attrlist_element_namespaces = nil) | ||
| return cache_hash[self] if cache_hash && cache_hash.key?(self) | ||
|
|
||
| inherited_namespaces = parent ? parent.send(:calculate_namespaces, cache_hash, attrlist_element_namespaces) : {} |
| # Inherited namespaces can be overridden by attribute list declaration, and both can be overridden by its own attributes | ||
| namespaces = inherited_namespaces | ||
| namespaces = namespaces.merge(attrlist_namespaces) if attrlist_namespaces | ||
| namespaces = namespaces.merge(own_namespaces) if own_namespaces.any? | ||
| cache_hash[self] = namespaces if cache_hash | ||
| namespaces |
| assert_equal correct, prefixes | ||
| end | ||
|
|
||
| def test_attrlist_namespace_priority |
| def element_namespaces(element) | ||
| @attrlist_per_element_namespaces ||= @document&.send(:attrlist_per_element_namespaces) || {} | ||
| element.send(:calculate_namespaces, @element_namespaces_cache, @attrlist_per_element_namespaces) | ||
| end |
Namespace calculation for each node cant' be cached to document because document lookup is slow for deeply nested nodes. Change namespace cache strategy, inject cached hash as an argument to retrieve namespace/namespaces from XPath match operation.
8d141ec to
8a7840a
Compare
| @@ -2424,19 +2447,12 @@ def prefixes | |||
| # d.root.attributes.namespaces # => {"xmlns"=>"foo", "x"=>"bar", "y"=>"twee"} | |||
| # | |||
| def namespaces | |||
There was a problem hiding this comment.
This method is no longer called except from the added test.
We need to keep this if it's a public API, but if not, we can remove it (perhaps with deprecation first)
| namespaces = namespaces.merge(attlist_namespaces) if attlist_namespaces | ||
| namespaces = namespaces.merge(own_namespaces) if own_namespaces.any? |
There was a problem hiding this comment.
There was a small bug in master: precedence of attlist attributes and own attirbutes was reversed. Test is added for this
Even if namespace lookup is cached, namespace lookup was slow for deeply nested XML nodes.
Namespaces are cached in
document, butdocumentlookup from a deep node costsO(n).We can't cache it in a document or root of the node, because we also need to cache document lookup result.
It's not good to cache to node itself, for maintainability reason, cache invalidation, etc.
In this PR, Hash for caching node's namespace/namespaces, and cached namespace attributes defined in attrlist decl can be injected through method arguments.
Benchmark
Time consumption was: Scan: 0.0046s (1.8%), Sort: 0.17s (60%), Namespace lookup: 0.11s (38%). This PR removes the 38% cost.