Skip to content

Change namespace cache strategy#333

Open
tompng wants to merge 1 commit into
ruby:masterfrom
tompng:xpath_namespace_lookup_cache
Open

Change namespace cache strategy#333
tompng wants to merge 1 commit into
ruby:masterfrom
tompng:xpath_namespace_lookup_cache

Conversation

@tompng

@tompng tompng commented Jun 14, 2026

Copy link
Copy Markdown
Member

Even if namespace lookup is cached, namespace lookup was slow for deeply nested XML nodes.
Namespaces are cached in document, but document lookup from a deep node costs O(n).

We can't cache it in a document or root of the node, because we also need to cache document lookup result.
It's not good to cache to node itself, for maintainability reason, cache invalidation, etc.
In this PR, Hash for caching node's namespace/namespaces, and cached namespace attributes defined in attrlist decl can be injected through method arguments.

Benchmark

irb> doc = REXML::Document.new('<a attr="1">'*1000+'</a>'*1000)
irb> measure
irb> REXML::XPath.match(doc, '//a');
# Master: 0.2791398s
# Master(XPathParser#sort disabled): 0.1138055s
# This PR: 0.1743505s
# This PR(XPathParser#sort disabled): 0.0046388s

Time consumption was: Scan: 0.0046s (1.8%), Sort: 0.17s (60%), Namespace lookup: 0.11s (38%). This PR removes the 38% cost.

Copilot AI review requested due to automatic review settings June 14, 2026 17:45

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR refactors REXML namespace resolution to incorporate ATTLIST-declared xmlns values (and adds caching in the XPath parser), along with a new regression test that validates namespace precedence.

Changes:

  • Add support for applying ATTLIST-declared xmlns values into element/attribute namespace resolution.
  • Introduce per-parser caching for element namespace computations in XPathParser.
  • Add a new test for ATTLIST vs element-declared namespace precedence; remove the previous namespaces cache regression test.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
test/xpath/test_base.rb Removes a regression test that previously exercised namespace cache invalidation behavior.
test/test_core.rb Adds a test asserting ATTLIST-declared xmlns precedence vs inherited/locally-declared namespaces.
lib/rexml/xpath_parser.rb Adds namespace caching and routes namespace comparisons through cached lookup helpers.
lib/rexml/element.rb Refactors namespace calculation/lookup internals and integrates ATTLIST-declared namespaces.
lib/rexml/document.rb Adds extraction of ATTLIST-declared xmlns mappings per element name.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/rexml/xpath_parser.rb Outdated
Comment on lines +66 to +68
@attrlist_per_element_namespaces = nil
@document = nil
@element_namespaces_cache = {}
Comment thread lib/rexml/xpath_parser.rb
Comment on lines +91 to +92
@document = node.document
match( path_stack, node )
Comment thread lib/rexml/xpath_parser.rb
Comment on lines +195 to +198
def element_namespaces(element)
@attrlist_per_element_namespaces ||= @document&.send(:attrlist_per_element_namespaces) || {}
element.send(:calculate_namespaces, @element_namespaces_cache, @attrlist_per_element_namespaces)
end
Comment thread lib/rexml/xpath_parser.rb
Comment on lines +183 to +185
def attribute_namespace(attribute)
attribute.prefix == '' ? '' : element_namespace_lookup(attribute.element, attribute.prefix)
end
Comment thread lib/rexml/element.rb Outdated
def calculate_namespaces(cache_hash = nil, attrlist_element_namespaces = nil)
return cache_hash[self] if cache_hash && cache_hash.key?(self)

inherited_namespaces = parent ? parent.send(:calculate_namespaces, cache_hash, attrlist_element_namespaces) : {}
Comment thread lib/rexml/element.rb
Comment on lines +1525 to +1530
# Inherited namespaces can be overridden by attribute list declaration, and both can be overridden by its own attributes
namespaces = inherited_namespaces
namespaces = namespaces.merge(attrlist_namespaces) if attrlist_namespaces
namespaces = namespaces.merge(own_namespaces) if own_namespaces.any?
cache_hash[self] = namespaces if cache_hash
namespaces
Comment thread test/test_core.rb Outdated
assert_equal correct, prefixes
end

def test_attrlist_namespace_priority
Comment thread lib/rexml/xpath_parser.rb
Comment on lines +195 to +198
def element_namespaces(element)
@attrlist_per_element_namespaces ||= @document&.send(:attrlist_per_element_namespaces) || {}
element.send(:calculate_namespaces, @element_namespaces_cache, @attrlist_per_element_namespaces)
end
Namespace calculation for each node cant' be cached to document because document lookup is slow for deeply nested nodes.
Change namespace cache strategy, inject cached hash as an argument to retrieve namespace/namespaces from XPath match operation.
@tompng tompng force-pushed the xpath_namespace_lookup_cache branch from 8d141ec to 8a7840a Compare June 14, 2026 17:58
Comment thread lib/rexml/element.rb
@@ -2424,19 +2447,12 @@ def prefixes
# d.root.attributes.namespaces # => {"xmlns"=>"foo", "x"=>"bar", "y"=>"twee"}
#
def namespaces

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is no longer called except from the added test.
We need to keep this if it's a public API, but if not, we can remove it (perhaps with deprecation first)

Comment thread lib/rexml/element.rb
Comment on lines +1527 to +1528
namespaces = namespaces.merge(attlist_namespaces) if attlist_namespaces
namespaces = namespaces.merge(own_namespaces) if own_namespaces.any?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a small bug in master: precedence of attlist attributes and own attirbutes was reversed. Test is added for this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants