Skip to content

AbstractDataTreeNode.indexOfChild() is called excessively on large resource tree size #2567

@laeubi

Description

@laeubi

During analysis of two heapdumps shared here the following issue was discovered as a hotspot that can benefit from optimization:

Performance Data

Metric WITH transitive WITHOUT transitive Ratio
indexOfChild() self time (µs) 70,405,568 35,739,594 2.0×
DeltaDataTree.lookup() (µs) 55,328,734 24,397,133 2.3×

Description

indexOfChild() performs a binary search on a sorted array of child nodes:

protected int indexOfChild(String localName) {
    AbstractDataTreeNode[] nodes = this.children;
    int left = 0, right = nodes.length - 1;
    while (left <= right) {
        int mid = (left + right) / 2;
        int compare = localName.compareTo(nodes[mid].name);
        // ...
    }
    return -1;
}

While the binary search itself is O(log n), DeltaDataTree.lookup() calls indexOfChild() at each path segment level:

public DataTreeLookup lookup(IPath key) {
    int keyLength = key.segmentCount();
    for (DeltaDataTree tree = this; tree != null; tree = tree.parent) {
        AbstractDataTreeNode node = tree.rootNode;
        for (int i = 0; i < keyLength; i++) {
            node = node.childAtOrNull(key.segment(i)); // calls indexOfChild
        }
    }
}

The delta tree chain (tree.parent) can be deep, and each delta layer requires re-traversing the path from root.

The combined indexOfChild() time (70.4 seconds) represents the single largest CPU consumer in the entire build.

Suggested Fix

  1. Compact delta chains more aggressively: The parent chain walk is expensive. More frequent immobilization of the delta tree would reduce the number of layers to traverse.
  2. Cache frequent lookups: If the same resource paths are looked up repeatedly (e.g., project roots, source folders), a small LRU cache on DeltaDataTree.lookup() could avoid repeated traversals.
  3. Consider hybrid data structure: For nodes with many children (> 100), consider switching from sorted array + binary search to a HashMap<String, AbstractDataTreeNode> for O(1) child lookup. This is particularly relevant for directories containing many files (e.g., classpath container folders).
  4. Reduce key.segment(i) overhead: Each call to IPath.segment(i) on Path objects does array access. Ensure paths are not being reconstructed or re-parsed in hot loops.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions