
Commit 132391f

Mark freed addresses for gc/reuse (#2)
* Add auto-removal interface for obsolete addresses

  Add three new methods to IStorage interface (all with default no-op):
  - delete(Collection<Address>) - batch delete addresses from storage
  - markFreed(Address) - mark an address as obsolete during modifications
  - deleteFreed() - delete all marked addresses at end of batch operation

  This enables storage implementations to track and reclaim storage space during large batch operations, preventing garbage accumulation.

  Inspired by tonsky#14 (CLJS durability PR).

* Mark obsolete addresses during tree modifications

  Call storage.markFreed() when nodes are replaced during cons, disjoin, and replace operations. This allows storage implementations to track obsolete addresses for later deletion, preventing garbage accumulation during batch operations.

  - Branch.java: Mark child addresses when replaced in add/remove/replace
  - PersistentSortedSet.java: Mark root address when tree structure changes
  - Tests: Add automatic marking tests for conj and disj operations

* Fix auto-removal in editable (transient) mode

  Two bugs were causing addresses to not be marked as freed in transient mode:
  1. add() editable case: No markFreed call at all when replacing a child
  2. replace() editable case: markFreed was called AFTER child() which already nullified the address, so the check always failed

  Both fixes: Mark the old address BEFORE calling child(idx, node) since child() internally sets _addresses[idx] = null.

  This fixes "Node not found" errors during batch indexing where addresses were incorrectly deleted because they weren't marked as freed during transient tree modifications.

* Track freed addresses for optional immediate gc.

* Fix markFreed to work in both persistent and transient modes

  This completes the auto-removal implementation for both CLJ and CLJS:

  Java (PersistentSortedSet.java):
  - Move markFreed calls BEFORE editable checks in cons/disjoin/replace
  - This enables marking in both persistent and transient modes
  - Pattern: check storage and address exist, then call markFreed

  CLJS (btset.cljs):
  - Add markFreed calls to $conjoin, $disjoin, $replace
  - Mirror Java implementation pattern
  - Add markFreed to IStorage protocol (storage.cljs)

  Tests:
  - Fix auto_removal.clj: convert delete/deleteFreed from defrecord methods to standalone functions (Java interfaces don't allow extras)
  - Add auto_removal.cljs with matching test coverage
  - Add debug output to both CLJ and CLJS tests
  - All 8 core tests pass in CLJ (26 assertions)

  Test infrastructure:
  - Add generative.cljc: cross-platform generative tests
  - Add structural_invariants.cljc: tree property verification
  - Add ref_stress.clj: SoftReference/WeakReference eviction tests

  Ignore .cljs_node_repl/ build artifacts

* Add test.check.
* Factor out conjAll tests, unify marking protocols.
* Factor out ref test.
* Complete cljs mark free implementation.
* Add missing protocols.
* Remove debug instrumentation.
* Update README.
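The mark-then-delete workflow described in the commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: `FreedTrackingStorage`, `put`, and `contains` are hypothetical names, the class does not implement the library's real `IStorage` interface, and the method names `markFreed`, `isFreed`, `delete`, and `deleteFreed` are borrowed from the commit message without claiming to match the actual signatures.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a storage backend. It does NOT implement the
// library's real IStorage interface; it only mirrors the method names
// mentioned in the commit message.
final class FreedTrackingStorage<A> {
    private final Map<A, String> nodes = new HashMap<>();  // address -> serialized node
    private final Set<A> freed = new LinkedHashSet<>();    // addresses marked obsolete

    // Hypothetical helper: store a serialized node under an address.
    void put(A address, String serialized) {
        nodes.put(address, serialized);
    }

    // Hypothetical helper: is a node still stored under this address?
    boolean contains(A address) {
        return nodes.containsKey(address);
    }

    // Called when a node is replaced: only records the address, deletes nothing.
    void markFreed(A address) {
        if (address != null) {
            freed.add(address);
        }
    }

    boolean isFreed(A address) {
        return freed.contains(address);
    }

    // Batch delete of the given addresses.
    void delete(Collection<A> addresses) {
        for (A address : addresses) {
            nodes.remove(address);
        }
    }

    // End of a batch: delete everything that was marked, then reset the marks.
    // Returns the number of addresses that had been marked.
    int deleteFreed() {
        int count = freed.size();
        delete(new ArrayList<>(freed));
        freed.clear();
        return count;
    }
}
```

Keeping marking and deletion as separate steps means a batch can finish all of its tree modifications before any node is physically removed.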
1 parent 8164843 commit 132391f

15 files changed

Lines changed: 1351 additions & 79 deletions


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ node_modules
 .calva/
 .clj-kondo/
 .lsp/
+.cljs_node_repl/

 # Internal docs and scratch work
 .internal/

README.md

Lines changed: 29 additions & 9 deletions
@@ -5,8 +5,11 @@ PersistentSortedSet supports:
 - transients,
 - custom comparators,
 - fast iteration,
-- efficient slices (iterator over a part of the set)
-- efficient `rseq` on slices.
+- efficient slices (iterator over a part of the set),
+- efficient `rseq` on slices,
+- `lookup` to retrieve actual stored keys,
+- `replace` for single-traversal key updates at same logical position,
+- durable storage with automatic garbage collection via `markFreed`.

 Almost a drop-in replacement for `clojure.core/sorted-set`, the only difference being this one can’t store `nil`.

@@ -19,10 +22,6 @@ export JAVA8_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/H
 lein jar
 ```

-## Support us
-
-<a href="https://www.patreon.com/bePatron?u=4230547"><img src="./extras/become_a_patron_button@2x.png" alt="Become a Patron!" width="217" height="51"></a>
-
 ## Usage

 Dependency:
@@ -139,13 +138,31 @@ To do that, implement `IStorage` interface:
              :addresses (.addresses ^Branch node)}
             (.keys ^Leaf node))))
       address))
-
+
   (restore [_ address]
     (let [value (-> (get @*storage address)
                   (edn/read-string))]
       (if (map? value)
         (Branch. (int (:level value)) ^java.util.List (:keys value) ^java.util.List (:addresses value))
-        (Leaf. ^java.util.List value)))))
+        (Leaf. ^java.util.List value))))
+
+  (markFreed [_ address]
+    ;; Optional: track addresses that become obsolete during modifications.
+    ;; Called automatically when tree nodes are replaced during conj/disj.
+    ;; Enables garbage collection of unreachable nodes.
+    nil)
+
+  (accessed [_ address]
+    ;; Optional: track node access for cache management (e.g., LRU).
+    nil)
+
+  (isFreed [_ address]
+    ;; Optional: check if address has been marked as freed.
+    false)
+
+  (freedInfo [_ address]
+    ;; Optional: return debug information about freed addresses.
+    nil))
 ```

 Storing Persistent Sorted Set works per node. This will save each node once:
@@ -225,7 +242,9 @@ Last piece of the puzzle: `set/walk-addresses`. Use it to check which nodes are

 See [test_storage.clj](test-clojure/me/tonsky/persistent_sorted_set/test_storage.clj) for more examples.

-Durability for ClojureScript is not yet supported.
+### ClojureScript Durability
+
+ClojureScript also supports durable storage with async operations. The `IStorage` interface works the same way, but `store` and `restore` methods return promises/async values instead of direct values. This allows integration with IndexedDB, remote storage APIs, and other async storage backends.

 ## Performance

@@ -299,5 +318,6 @@ PersistentSortedSet (transient) 47..50ms
 ## License

 Copyright © 2019 Nikita Prokopov
+Copyright © 2024 Christian Weilbach

 Licensed under MIT (see [LICENSE](LICENSE)).

deps.edn

Lines changed: 8 additions & 5 deletions
@@ -1,7 +1,7 @@
 {:paths ["src-clojure" "target/classes"]
  :deps
  {org.clojure/clojure {:mvn/version "1.12.0"}
-  is.simm/partial-cps {:mvn/version "0.1.50"}}
+  is.simm/partial-cps {:mvn/version "0.1.51"}}
  :deps/prep-lib {:ensure "target/classes"
                  :alias :build
                  :fn java}
@@ -17,11 +17,13 @@

   :node-tests
   {:extra-paths ["test-clojure"]
-   :extra-deps {thheller/shadow-cljs {:mvn/version "3.2.0"}}}
+   :extra-deps {thheller/shadow-cljs {:mvn/version "3.2.0"}
+                org.clojure/test.check {:mvn/version "1.1.1"}}}

   :node-stress
   {:extra-paths ["test-clojure"]
-   :extra-deps {thheller/shadow-cljs {:mvn/version "3.2.0"}}}
+   :extra-deps {thheller/shadow-cljs {:mvn/version "3.2.0"}
+                org.clojure/test.check {:mvn/version "1.1.1"}}}

   :build
   {:extra-deps {io.github.clojure/tools.build {:git/tag "v0.8.5" :git/sha "9c738da" #_#_:exclusions [org.slf4j/slf4j-nop]}
@@ -31,7 +33,8 @@

   :test
   {:extra-paths ["test-clojure"]
-   :extra-deps {io.github.cognitect-labs/test-runner {:git/tag "v0.5.1" :git/sha "dfb30dd"}}
+   :extra-deps {io.github.cognitect-labs/test-runner {:git/tag "v0.5.1" :git/sha "dfb30dd"}
+                org.clojure/test.check {:mvn/version "1.1.1"}}
    :main-opts ["-m" "cognitect.test-runner"]
    :exec-fn cognitect.test-runner.api/test
    :exec-args {:dirs ["test-clojure"]
@@ -56,4 +59,4 @@

   :ffix
   {:extra-deps {cljfmt/cljfmt {:mvn/version "0.9.2"}}
-   :main-opts ["-m" "cljfmt.main" "fix" "src-clojure" "test-clojure"]}}}
+   :main-opts ["-m" "cljfmt.main" "fix" "src-clojure" "test-clojure"]}}}

shadow-cljs.edn

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@
  {:node-tests
   {:target :node-test ;shadow-cljs release node-tests && node target/pss/tests.min.js
    :output-to "target/pss/tests.min.js"
-   :ns-regexp "^(?!me.tonsky.persistent-sorted-set.test.stress)"
+   :ns-regexp "^(?!me.tonsky.persistent-sorted-set.test.stress|me.tonsky.persistent-sorted-set.test.structural-invariants|me.tonsky.persistent-sorted-set.test.generative)"
    :compiler-options {:infer-externs true
                       :parallel-build true
                       :static-fns true
@@ -24,7 +24,7 @@
   :ci ;shadow-cljs compile ci && node target/pss/ci-tests.js
   {:target :node-test
    :output-to "target/pss/ci-tests.js"
-   :ns-regexp "^(?!me.tonsky.persistent-sorted-set.test.stress|me.tonsky.persistent-sorted-set.test.storage)"
+   :ns-regexp "^(?!me.tonsky.persistent-sorted-set.test.stress|me.tonsky.persistent-sorted-set.test.storage|me.tonsky.persistent-sorted-set.test.structural-invariants|me.tonsky.persistent-sorted-set.test.generative)"
    :compiler-options {:infer-externs true
                       :parallel-build false
                       :static-fns true

src-clojure/me/tonsky/persistent_sorted_set/branch.cljs

Lines changed: 45 additions & 9 deletions
@@ -118,13 +118,24 @@
                                 addrs
                                 (let [na (arrays/make-array (arrays/alength addrs))]
                                   (arrays/acopy addrs 0 (arrays/alength addrs) na 0)
+                                  ;; Mark old child address as freed before clearing
+                                  (when (and storage (aget addrs idx))
+                                    (storage/markFreed storage (aget addrs idx)))
                                   (aset na idx nil)
                                   na)))
-                            (util/splice addrs idx (inc idx) (arrays/array nil nil))))]
+                            (let [old-addr (aget addrs idx)]
+                              ;; Mark old child address as freed before clearing
+                              (when (and storage old-addr)
+                                (storage/markFreed storage old-addr))
+                              (util/splice addrs idx (inc idx) (arrays/array nil nil)))))]
           (arrays/array (Branch. (.-level this) new-keys new-children new-addrs (.-settings this))))
         (let [middle (arrays/half (arrays/alength new-children))
               tmp-addrs (when addrs
-                          (util/splice addrs idx (inc idx) (arrays/array nil nil)))
+                          (let [old-addr (aget addrs idx)]
+                            ;; Mark old child address as freed before clearing
+                            (when (and storage old-addr)
+                              (storage/markFreed storage old-addr))
+                            (util/splice addrs idx (inc idx) (arrays/array nil nil))))
               left-addrs (when tmp-addrs (.slice tmp-addrs 0 middle))
               right-addrs (when tmp-addrs (.slice tmp-addrs middle))]
           (arrays/array
@@ -166,12 +177,23 @@
                           (let [alen (arrays/alength disjoined)
                                 repl (arrays/make-array alen)
                                 laddr (when left-child (arrays/aget addrs left-idx))
-                                raddr (when right-child (arrays/aget addrs (dec right-idx)))]
-                            (when (and left-child (> alen 1)
-                                       (identical? (arrays/aget disjoined 0) left-child))
+                                raddr (when right-child (arrays/aget addrs (dec right-idx)))
+                                left-unchanged (and left-child (> alen 1)
+                                                    (identical? (arrays/aget disjoined 0) left-child))
+                                right-unchanged (and right-child (> alen 1)
+                                                     (identical? (arrays/aget disjoined (dec alen)) right-child))]
+                            ;; Mark freed addresses before clearing
+                            (when storage
+                              (dotimes [i (- right-idx left-idx)]
+                                (let [addr-idx (+ left-idx i)
+                                      old-addr (arrays/aget addrs addr-idx)]
+                                  (when (and old-addr
+                                             (not (and (= addr-idx left-idx) left-unchanged))
+                                             (not (and (= addr-idx (dec right-idx)) right-unchanged)))
+                                    (storage/markFreed storage old-addr)))))
+                            (when left-unchanged
                               (aset repl 0 laddr))
-                            (when (and right-child (> alen 1)
-                                       (identical? (arrays/aget disjoined (dec alen)) right-child))
+                            (when right-unchanged
                               (aset repl (dec alen) raddr))
                             (util/splice addrs left-idx right-idx repl)))]
           (util/rotate (Branch. (.-level this) new-keys new-kids new-addrs (.-settings this))
@@ -218,13 +240,20 @@
             (do
               (aset keys idx new-max-key)
               (aset children idx new-node)
-              (when addrs (aset addrs idx nil))
+              (when addrs
+                ;; Mark old child address as freed before clearing
+                (when (and storage (aget addrs idx))
+                  (storage/markFreed storage (aget addrs idx)))
+                (aset addrs idx nil))
               (arrays/array this))
             ;; Persistent: clone arrays
             (let [new-keys (arrays/aclone keys)
                   new-children (arrays/aclone children)
                   new-addrs (when addrs
                               (let [na (arrays/aclone addrs)]
+                                ;; Mark old child address as freed before clearing
+                                (when (and storage (aget addrs idx))
+                                  (storage/markFreed storage (aget addrs idx)))
                                 (aset na idx nil)
                                 na))]
               (aset new-keys idx new-max-key)
@@ -235,14 +264,21 @@
             ;; Transient: mutate in place
             (do
               (aset children idx new-node)
-              (when addrs (aset addrs idx nil))
+              (when addrs
+                ;; Mark old child address as freed before clearing
+                (when (and storage (aget addrs idx))
+                  (storage/markFreed storage (aget addrs idx)))
+                (aset addrs idx nil))
               (if last-child?
                 (arrays/array this) ; Last child, need to propagate
                 :early-exit)) ; Not last child, early exit
             ;; Persistent: clone children array
             (let [new-children (arrays/aclone children)
                   new-addrs (when addrs
                               (let [na (arrays/aclone addrs)]
+                                ;; Mark old child address as freed before clearing
+                                (when (and storage (aget addrs idx))
+                                  (storage/markFreed storage (aget addrs idx)))
                                 (aset na idx nil)
                                 na))]
               (aset new-children idx new-node)
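
The "mark before clearing" comments in the hunks above correspond to the ordering bug described in the commit message for the Java side: `child(idx, node)` nulls the stored address, so a check made afterwards never fires. The following is a minimal Java sketch of that ordering issue, not the library's code. `MiniBranch`, `replaceBuggy`, `replaceFixed`, and the `freed` list (which stands in for calls to `storage.markFreed`) are all hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of a branch node. Names are illustrative and only
// loosely follow the commit message; this is not the library's Branch class.
final class MiniBranch {
    final String[] addresses;
    final Object[] children;
    final List<String> freed = new ArrayList<>(); // stands in for storage.markFreed

    MiniBranch(String[] addresses) {
        this.addresses = addresses.clone();
        this.children = new Object[addresses.length];
    }

    // Installing a new in-memory child invalidates the stored address.
    void child(int idx, Object node) {
        children[idx] = node;
        addresses[idx] = null;
    }

    // Buggy order: the address is already null by the time it is checked,
    // so nothing is ever recorded as freed.
    void replaceBuggy(int idx, Object node) {
        child(idx, node);
        if (addresses[idx] != null) {
            freed.add(addresses[idx]);
        }
    }

    // Fixed order: record the old address first, then clear it.
    void replaceFixed(int idx, Object node) {
        if (addresses[idx] != null) {
            freed.add(addresses[idx]);
        }
        child(idx, node);
    }
}
```

With the buggy order the old address is silently lost; with the fixed order it reaches the freed list before the slot is cleared.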

src-clojure/me/tonsky/persistent_sorted_set/btset.cljs

Lines changed: 46 additions & 34 deletions
@@ -56,25 +56,29 @@
            roots (await (node/$add root (.-storage set) key cmp opts))]
        (if (nil? roots)
          set
-         (if (== (arrays/alength roots) 1)
-           (BTSet. (arrays/aget roots 0)
-                   (inc (.-cnt set))
-                   (.-comparator set)
-                   (.-meta set)
-                   UNINITIALIZED_HASH
-                   (.-storage set)
-                   nil
-                   (.-settings set))
-           (let [child0 (arrays/aget roots 0)
-                 lvl (inc (node/level child0))]
-             (BTSet. (Branch. lvl (arrays/amap node/max-key roots) roots nil (.-settings set))
+         (do
+           ;; Mark old root address as freed if it exists
+           (when (and (.-storage set) (.-address set))
+             (storage/markFreed (.-storage set) (.-address set)))
+           (if (== (arrays/alength roots) 1)
+             (BTSet. (arrays/aget roots 0)
                      (inc (.-cnt set))
                      (.-comparator set)
                      (.-meta set)
                      UNINITIALIZED_HASH
                      (.-storage set)
                      nil
-                     (.-settings set))))))))))
+                     (.-settings set))
+             (let [child0 (arrays/aget roots 0)
+                   lvl (inc (node/level child0))]
+               (BTSet. (Branch. lvl (arrays/amap node/max-key roots) roots nil (.-settings set))
+                       (inc (.-cnt set))
+                       (.-comparator set)
+                       (.-meta set)
+                       UNINITIALIZED_HASH
+                       (.-storage set)
+                       nil
+                       (.-settings set)))))))))))

 (defn $replace
   ([^BTSet set old-key new-key]
@@ -90,14 +94,18 @@
            nodes (await (node/$replace root (.-storage set) old-key new-key cmp opts))]
        (if (nil? nodes)
          set
-         (BTSet. (arrays/aget nodes 0)
-                 (.-cnt set)
-                 (.-comparator set)
-                 (.-meta set)
-                 UNINITIALIZED_HASH
-                 (.-storage set)
-                 nil
-                 (.-settings set))))))))
+         (do
+           ;; Mark old root address as freed if it exists
+           (when (and (.-storage set) (.-address set))
+             (storage/markFreed (.-storage set) (.-address set)))
+           (BTSet. (arrays/aget nodes 0)
+                   (.-cnt set)
+                   (.-comparator set)
+                   (.-meta set)
+                   UNINITIALIZED_HASH
+                   (.-storage set)
+                   nil
+                   (.-settings set)))))))))

 (defn $disjoin
   ([^BTSet set key]
@@ -113,19 +121,23 @@
            new-roots (await (node/$remove root (.-storage set) key nil nil cmp opts))]
        (if (nil? new-roots)
          set
-         (let [new-root (arrays/aget new-roots 0)
-               new-root (if (and (instance? Branch new-root)
-                                 (== 1 (arrays/alength (.-children new-root))))
-                          (await (branch/$child new-root (.-storage set) 0 opts))
-                          new-root)]
-           (BTSet. new-root
-                   (dec (.-cnt set))
-                   (.-comparator set)
-                   (.-meta set)
-                   UNINITIALIZED_HASH
-                   (.-storage set)
-                   nil
-                   (.-settings set)))))))))
+         (do
+           ;; Mark old root address as freed if it exists
+           (when (and (.-storage set) (.-address set))
+             (storage/markFreed (.-storage set) (.-address set)))
+           (let [new-root (arrays/aget new-roots 0)
+                 new-root (if (and (instance? Branch new-root)
+                                   (== 1 (arrays/alength (.-children new-root))))
+                            (await (branch/$child new-root (.-storage set) 0 opts))
+                            new-root)]
+             (BTSet. new-root
+                     (dec (.-cnt set))
+                     (.-comparator set)
+                     (.-meta set)
+                     UNINITIALIZED_HASH
+                     (.-storage set)
+                     nil
+                     (.-settings set))))))))))

 (defn $store
   ([^BTSet set arg]

src-clojure/me/tonsky/persistent_sorted_set/impl/storage.cljs

Lines changed: 4 additions & 1 deletion
@@ -4,4 +4,7 @@
   (store [this node opts])
   (restore [this address opts])
   (accessed [this address])
-  (delete [this addresses]))
+  (delete [this addresses])
+  (markFreed [this address])
+  (isFreed [this address])
+  (freedInfo [this address]))
