feat: Splay Tree Formalisation #568
Link to discussion thread: https://leanprover.zulipchat.com/#narrow/channel/513188-CSLib/topic/Splay.20tree.20PR/with/595534315
variable {α : Type}
inductive BinaryTree (α : Type) where
It seems this is the same as Mathlib.Data.Tree.Basic?
/-! ### BST Structure -/
section BSTStructure
structure BST (α : Type) [LinearOrder α] where
I'm not convinced by the bundling here.
Can you elaborate? What is bad? What should be a better choice?
Chris is right. This is an unnecessary bundling. You can operate solely on BinaryTree and insert IsBST as a hypothesis in those theorems that need it.
Most of the proofs in the PR were based on a separate tree and IsBST property.
We create an API specifically for people who can use BST. In some use cases, it is more convenient to refer to a BST as a type rather than as a single tree with its properties. Sometimes you don't want to carry these properties around all the time.
Please do ask on Zulip if you'd like to hear other opinions, but unbundling this sort of proposition is a well-established best practice. I think this should be a Prop on Mathlib's (nearly identical) definition of trees.
> Most of the proofs in the PR were based on a separate tree and IsBST property.
> We create an API specifically for people who can use BST. In some use cases, it is more convenient to refer to a BST as a type rather than as a single tree with its properties. Sometimes you don't want to carry these properties around all the time.
Consider that a library PR must be built with future reuse in mind, beyond just this PR. First, if you create a plain tree definition, it can be reused in places where the tree need not be a BST. Second, this means you can actually work with trees from Mathlib.
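To make the unbundled suggestion concrete, here is a minimal Lean sketch. It is not the PR's code: `All`, `rotateRight`, the constructor shapes, and the use of `LT` in place of `LinearOrder` (to stay independent of Mathlib) are all assumptions chosen for illustration.

```lean
-- Sketch only: every name below is an assumption, not the PR's code.
inductive BinaryTree (α : Type) where
  | leaf : BinaryTree α
  | node : BinaryTree α → α → BinaryTree α → BinaryTree α

namespace BinaryTree

variable {α : Type}

/-- Every key in the tree satisfies `p`. -/
def All (p : α → Prop) : BinaryTree α → Prop
  | .leaf => True
  | .node l k r => All p l ∧ p k ∧ All p r

/-- The search-tree invariant as a plain `Prop` on trees (unbundled). -/
inductive IsBST [LT α] : BinaryTree α → Prop
  | leaf : IsBST .leaf
  | node {l r : BinaryTree α} {k : α} :
      All (· < k) l → All (k < ·) r → IsBST l → IsBST r → IsBST (.node l k r)

/-- Operations are defined on all trees and carry no proofs. -/
def rotateRight : BinaryTree α → BinaryTree α
  | .node (.node a x b) y c => .node a x (.node b y c)
  | t => t

/-- The invariant shows up only as a hypothesis of the theorems that need it. -/
theorem IsBST.left [LT α] {l r : BinaryTree α} {k : α}
    (h : IsBST (.node l k r)) : IsBST l := by
  cases h with
  | node _ _ hl _ => exact hl

end BinaryTree
```

With this shape, `rotateRight` can be reused on trees that are not search trees, and a bundled type can still be layered on top later if users want one.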
These files need to be modules.
Shreyas4991 left a comment
Summary: When writing functional data structures, it is good practice to provide the standard functional data structure API, such as maps and folds, and then API lemmas over them. Also, you are restating an induction principle of BinaryTree, which is redundant.
/-! ### Tree Invariants and BST Properties -/
section Invariants
inductive ForallTree (p : α → Prop) : BinaryTree α → Prop
This is just the induction principle of BinaryTree stated in a convoluted way.
I don't think that ForallTree p _ is the induction principle of BinaryTree. It is the tree analogue of List.Forall p _ (for example, see ForallTree_iff_toKeyList in Correctness.lean for the equivalence with the list-based characterisation).
Keeping it makes pattern matching on the tree constructors easier and works well with the cases/induction tactics in the rotation and BST-preservation proofs.
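As an illustration of the point about `cases`, here is a small Lean sketch of an inductive predicate of this shape and a rotation lemma proved by case analysis on it. The constructor names and argument order are assumptions (the diff only shows the last two lines of the `node` constructor), and the lemma `ForallTree.rotateRight` is hypothetical.

```lean
-- Sketch only: constructor shapes and the lemma name are assumptions.
inductive BinaryTree (α : Type) where
  | leaf : BinaryTree α
  | node : BinaryTree α → α → BinaryTree α → BinaryTree α

variable {α : Type}

/-- `p` holds at every key of the tree (tree analogue of a "for all" on lists). -/
inductive ForallTree (p : α → Prop) : BinaryTree α → Prop
  | leaf : ForallTree p .leaf
  | node {l r : BinaryTree α} {key : α} :
      ForallTree p l → p key → ForallTree p r → ForallTree p (.node l key r)

/-- A right rotation keeps the same keys, so the predicate is preserved. -/
theorem ForallTree.rotateRight {p : α → Prop} {a b c : BinaryTree α} {x y : α}
    (h : ForallTree p (.node (.node a x b) y c)) :
    ForallTree p (.node a x (.node b y c)) := by
  cases h with
  | node hl hy hc =>
    cases hl with
    | node ha hx hb => exact .node ha hx (.node hb hy hc)
```

Each `cases` exposes exactly the sub-proofs for the subtrees named in the rotation, and those are then reassembled in the new shape.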
ForallTree p r →
ForallTree p (.node l key r)
inductive IsBST [LinearOrder α] : BinaryTree α → Prop
See the design of RBMap in Batteries for how to define these kinds of functions. Ideally this should be defined through a fold function. The first step is, of course, to write the map and fold functions and API lemmas for them. See RBMap in Batteries and Lean core for examples.
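A rough sketch of what "defined through a fold" could look like is below. It is not taken from Batteries or from the PR: `fold`, `toList`, `All`, and `all_node` are hypothetical names, and the fold shown is the simplest one-pass catamorphism rather than any particular library's signature.

```lean
-- Sketch only: a generic fold with derived functions and one API lemma.
inductive BinaryTree (α : Type) where
  | leaf : BinaryTree α
  | node : BinaryTree α → α → BinaryTree α → BinaryTree α

namespace BinaryTree

variable {α : Type}

/-- Replace `leaf` by `leafCase` and every `node` by `nodeCase`. -/
def fold {β : Type} (leafCase : β) (nodeCase : β → α → β → β) : BinaryTree α → β
  | .leaf => leafCase
  | .node l k r => nodeCase (fold leafCase nodeCase l) k (fold leafCase nodeCase r)

/-- In-order list of keys, obtained from `fold`. -/
def toList (t : BinaryTree α) : List α :=
  t.fold (β := List α) [] (fun l k r => l ++ k :: r)

/-- "`p` holds at every key", obtained from the same `fold`. -/
def All (p : α → Prop) (t : BinaryTree α) : Prop :=
  t.fold (β := Prop) True (fun l k r => l ∧ p k ∧ r)

/-- API lemma: `All` unfolds at a node by definition. -/
theorem all_node (p : α → Prop) (l r : BinaryTree α) (k : α) :
    All p (.node l k r) ↔ All p l ∧ p k ∧ All p r :=
  Iff.rfl

end BinaryTree
```

Lemmas such as `all_node` are the kind of accessor facts that then come from the fold API instead of being proved by hand for each predicate.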
end Invariants
/-! ### Accessor Lemmas for ForallTree -/
This entire section should fall out from API lemmas for fold and map.
This PR introduces splay trees to CSLib: the algorithmic definitions, the correctness proofs, and the (amortised) complexity analysis.
Design & Architecture: The implementation is partitioned into four modules to isolate dependencies.
Basic: Core definitions (splay, splayUp, descend, Frame). Primitive rotations are upstreamed to BinaryTree.
Correctness:
Complexity: We formalise the Sleator-Tarjan potential method.
BSTAPI: A user-facing wrapper providing a bundled BST API. Users can splay binary search trees naturally without having to supply invariant proofs manually (for example, that splaying a binary search tree again returns a binary search tree).
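For readers unfamiliar with the potential method mentioned under Complexity: the Sleator-Tarjan analysis assigns each node a rank, the base-2 logarithm of the size of its subtree, and takes the potential of a tree to be the sum of all ranks. The Lean sketch below is only an integer stand-in that uses the floor logarithm `Nat.log2` from Lean core. The PR's actual definitions are not shown in this thread, so `size`, `rank`, and `potential` are hypothetical names.

```lean
-- Sketch only: integer approximation of the rank-sum potential.
inductive BinaryTree (α : Type) where
  | leaf : BinaryTree α
  | node : BinaryTree α → α → BinaryTree α → BinaryTree α

namespace BinaryTree

variable {α : Type}

/-- Number of keys stored in the tree. -/
def size : BinaryTree α → Nat
  | .leaf => 0
  | .node l _ r => size l + 1 + size r

/-- Floor of `log₂ (size t)`, standing in for the real-valued rank. -/
def rank (t : BinaryTree α) : Nat := Nat.log2 (size t)

/-- Sum of the ranks of all subtrees rooted at a node. -/
def potential : BinaryTree α → Nat
  | .leaf => 0
  | .node l k r => potential l + rank (.node l k r) + potential r

end BinaryTree
```

Because `potential` is defined on a single whole tree, it applies directly to every intermediate state of a bottom-up splay.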
Why Bottom-Up? (Comparison with Top-Down):
There is a complementary top-down implementation available for reference here. This PR utilises a bottom-up approach because it reduces the length of the formalisation:
No "Broken" Trees: Top-down splaying partitions the tree into three disconnected pieces (Left, Right, Middle) while searching. This makes tracking the mathematical potential function more difficult, as the potential function φ expects a whole tree. Our bottom-up approach leaves the tree intact: it just records the search path on the way down and applies local rotations on the way up. The tree is always whole.
Odd vs. Even Paths: Splaying works by rotating edges in pairs (e.g., zig-zig). If a path has an odd number of edges, top-down requires asymmetrical edge-case code to handle the leftover rotation while stitching the tree back together. By modelling the path as a list of Frames, our bottom-up approach processes pairs natively via list induction.
Search first & Rotate after: Top-down tries to search and restructure at the exact same time. Bottom-up strictly separates the logic: descend purely finds the node, and splayUp purely rotates it. This allows us to prove things about path lengths and node existence completely independently of the rotation proofs.
Symmetry Exploitation: The proofs utilise formalised mirror symmetry (mirror, flip). This allows left/right symmetric double rotations (like zig-zig vs. zag-zag) to be proven using generic transformations rather than by duplicating code with mirrored logic.
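The search-then-rotate structure described above can be sketched in Lean roughly as follows. The names `Frame`, `descend`, `splayUp`, and `splay` come from the PR description, but their definitions here are guesses written for illustration. The `Dir` type, the field names, the use of `Ord`, and the handling of absent keys are all assumptions.

```lean
-- Sketch only: an assumed shape for the bottom-up splay, not the PR's code.
inductive BinaryTree (α : Type) where
  | leaf : BinaryTree α
  | node : BinaryTree α → α → BinaryTree α → BinaryTree α

/-- Direction taken from a parent towards the searched key. -/
inductive Dir where
  | L | R

/-- One recorded step: direction taken, the parent's key, and the subtree not entered. -/
structure Frame (α : Type) where
  dir : Dir
  key : α
  sibling : BinaryTree α

variable {α : Type}

/-- Pure search: returns the subtree reached and the path (innermost frame first). -/
def descend [Ord α] (x : α) : BinaryTree α → List (Frame α) → BinaryTree α × List (Frame α)
  | .leaf, path => (.leaf, path)
  | .node l k r, path =>
    match compare x k with
    | .lt => descend x l (⟨.L, k, r⟩ :: path)
    | .gt => descend x r (⟨.R, k, l⟩ :: path)
    | .eq => (.node l k r, path)

/-- Pure rotation: lifts the focused node `a k b` to the root, two frames at a time. -/
def splayUp : BinaryTree α → α → BinaryTree α → List (Frame α) → BinaryTree α
  | a, k, b, [] => .node a k b
  | a, k, b, [f] =>                      -- odd leftover edge: a single zig
    match f.dir with
    | .L => .node a k (.node b f.key f.sibling)
    | .R => .node (.node f.sibling f.key a) k b
  | a, k, b, f :: g :: rest =>           -- f is the parent, g the grandparent
    match f.dir, g.dir with
    | .L, .L => splayUp a k (.node b f.key (.node f.sibling g.key g.sibling)) rest
    | .R, .R => splayUp (.node (.node g.sibling g.key f.sibling) f.key a) k b rest
    | .L, .R => splayUp (.node g.sibling g.key a) k (.node b f.key f.sibling) rest
    | .R, .L => splayUp (.node f.sibling f.key a) k (.node b g.key g.sibling) rest

/-- Search first, rotate after. If the key is absent, splay the last node visited. -/
def splay [Ord α] (x : α) (t : BinaryTree α) : BinaryTree α :=
  match descend x t [] with
  | (.node a k b, path) => splayUp a k b path
  | (.leaf, []) => .leaf
  | (.leaf, f :: path) =>
    match f.dir with
    | .L => splayUp .leaf f.key f.sibling path
    | .R => splayUp f.sibling f.key .leaf path
```

In this shape the odd-length case is the single `[f]` clause of `splayUp`, and each double rotation consumes two frames, so proofs about it can follow the list structure of the path.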
Co-authored-by: Anton Kovsharov <antonkov@google.com>
Co-authored-by: Antoine du Fresne von Hohenesche <antoine@du-fresne.ch>
Co-authored-by: Sorrachai Yingchareonthawornchai <sorrachai.cp@gmail.com>