Add Association (<| ... |>) data structure #14
Open
msollami wants to merge 22 commits into
Introduce a first-class, hash-backed Association data structure modelled
on the Wolfram Language, together with its family of builtins and full
end-to-end coverage.
Representation
- Associations are ordinary expressions: Association[Rule[k,v], ...] with
unique, insertion-ordered keys (first occurrence fixes position, last
fixes value). This keeps the generic toolchain (Length, Map, ReplaceAll,
===, FullForm) working unchanged.
- Bulk operations are driven by a transient open-addressing hash index
(keyed by expr_hash / expr_eq) for amortised O(n) construction,
de-duplication, grouping and lookup.
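The construction rule above (first occurrence fixes position, last occurrence fixes value) can be sketched with a transient linear-probing index that is freed once the entries are built. This is a minimal illustration under stated assumptions, not the code in src/assoc.c: it uses int keys and values and a multiplicative hash in place of Mathilda's expr_hash/expr_eq, and the names Entry, hash_int and assoc_build are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in: int keys/values instead of Mathilda expressions. */
typedef struct { int key, val; } Entry;

static size_t hash_int(int k) { return (size_t)k * 2654435761u; }

/* Builds unique, insertion-ordered entries from n input pairs into `out`
   (which must hold n entries) and returns how many unique entries remain. */
size_t assoc_build(const Entry *in, size_t n, Entry *out) {
    size_t cap = 8;
    while (cap < 2 * n) cap <<= 1;             /* power of two, load factor <= 0.5 */
    long *slot = malloc(cap * sizeof *slot);   /* index into out; -1 = empty */
    assert(slot != NULL);
    for (size_t i = 0; i < cap; i++) slot[i] = -1;
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        size_t h = hash_int(in[i].key) & (cap - 1);
        while (slot[h] != -1 && out[slot[h]].key != in[i].key)
            h = (h + 1) & (cap - 1);           /* linear probing */
        if (slot[h] == -1) {                   /* first occurrence fixes position */
            slot[h] = (long)len;
            out[len++] = in[i];
        } else {
            out[slot[h]].val = in[i].val;      /* repeated key: last value wins */
        }
    }
    free(slot);                                /* the index is transient */
    return len;
}
```

For input pairs 1 -> 10, 2 -> 20, 1 -> 11, the result keeps key 1 in first position with value 11, mirroring how a duplicated key behaves in an association literal.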
Builtins (src/assoc.c), all Protected; AssociateTo is HoldFirst
- Association, AssociationQ, Keys, Values
- Lookup (a single key, giving Missing["KeyAbsent", key] when absent unless a
default is supplied; or a list of keys resolved with one index build, O(n+m))
- KeyExistsQ, KeyDrop, KeyTake, KeyValueMap
- AssociationThread, Counts, GroupBy, Merge, AssociateTo
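The O(n+m) cost of a list-of-keys Lookup comes from building the index over the n entries once and then probing it once per requested key. A minimal sketch, assuming int keys that are already unique; LookupResult, hash_int and lookup_many are hypothetical names, and `present == false` stands in for Missing["KeyAbsent", key].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct { int key, val; } Entry;
/* present == false plays the role of Missing["KeyAbsent", key]. */
typedef struct { bool present; int val; } LookupResult;

static size_t hash_int(int k) { return (size_t)k * 2654435761u; }

/* Resolves m keys against n unique entries with a single index build. */
void lookup_many(const Entry *e, size_t n, const int *keys, size_t m,
                 LookupResult *res) {
    size_t cap = 8;
    while (cap < 2 * n) cap <<= 1;             /* keeps empty slots available */
    long *slot = malloc(cap * sizeof *slot);
    assert(slot != NULL);
    for (size_t i = 0; i < cap; i++) slot[i] = -1;
    for (size_t i = 0; i < n; i++) {           /* O(n): index every entry once */
        size_t h = hash_int(e[i].key) & (cap - 1);
        while (slot[h] != -1) h = (h + 1) & (cap - 1);
        slot[h] = (long)i;
    }
    for (size_t j = 0; j < m; j++) {           /* O(m): one probe sequence per key */
        size_t h = hash_int(keys[j]) & (cap - 1);
        while (slot[h] != -1 && e[slot[h]].key != keys[j]) h = (h + 1) & (cap - 1);
        res[j].present = slot[h] != -1;
        res[j].val = res[j].present ? e[slot[h]].val : 0;
    }
    free(slot);
}
```

Calling Lookup once per key instead would rebuild or rescan per call; amortising one build across all m keys is what makes the batched form linear.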
Language integration
- Parser: <| ... |> literal syntax (src/parse.c), including postfix forms
<|...|>[[key]] and <|...|>[args].
- Part: assoc[[key]] / assoc[[Key[k]]] / positional assoc[[i]]; assoc[[0]]
gives the head; a missing key gives Missing["KeyAbsent", key] (src/part.c).
- Normal[assoc] -> list of rules (src/calculus/series.c).
- Printing: <| ... |> StandardForm and TeXForm (src/print.c) and KaTeX
notebook output (src/print_latex.c), falling back to head[args] form for
ill-formed associations.
- Symbols registered in src/sym_names.{c,h}; assoc_init wired into core_init.
Tests, docs, examples
- tests/test_association.c: 43 end-to-end tests (parser, builtins, printing,
Part, in-place AssociateTo, generic-tool interaction); 0 leaks under the macOS leaks tool.
- docs/spec/builtins/data-structures.md (new category) with verified
examples; changelog entry; Mathilda_spec.md index row.
- frontend/src/lib/canvas.ts: an Associations demo notebook in the gallery.
- examples/: association_bench.m, association_bench.py, and
association-benchmarks.md documenting measured O(n) scaling — Counts keeps
pace with CPython's Counter and is ~2,500x faster than naive O(n^2)
accumulation at N=2000.
Extend the Association data structure so it flows through the functional
toolchain the Wolfram way, and round out the key-operation family.
- Map and Select thread over association values, preserving keys:
Map[f, <|k -> v|>] -> <|k -> f[v]|>; Select[assoc, p] filters by value.
New assoc_map_values / assoc_select_values (src/assoc.c) dispatched from
builtin_map / builtin_select (src/funcprog.c) for the default level.
- New key operations (Protected): KeySort, KeySortBy (stable), KeyMap,
KeySelect.
- New aggregation builtins: CountsBy[list, f], PositionIndex[list]
(<|value -> {positions}|>, hash-indexed O(n)), AssociationMap[f, {keys}].
- Symbols registered in src/sym_names.{c,h}; all wired into assoc_init.
Tests/docs: +10 end-to-end tests (53 total in tests/test_association.c),
new sections in docs/spec/builtins/data-structures.md, changelog entry, and
extra demo cells in the frontend Associations notebook. Clean -Wall -Wextra
build, 0 leaks under the macOS leaks tool, no regressions in funcprog/list/core/eval/regression.
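C's qsort is not stable, so one way to obtain the stable ordering that KeySortBy promises is to evaluate f once per key and break ties on each entry's original position. A sketch under stated assumptions: int keys and values, and the hypothetical names Keyed, key_sort_by and key_mod3 (an example key function), none of which are taken from src/assoc.c.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int key, val; } Entry;
/* Each entry paired with its precomputed sort key and original position. */
typedef struct { int sortkey; size_t pos; Entry e; } Keyed;

static int cmp_keyed(const void *a, const void *b) {
    const Keyed *x = a, *y = b;
    if (x->sortkey != y->sortkey) return x->sortkey < y->sortkey ? -1 : 1;
    return x->pos < y->pos ? -1 : (x->pos > y->pos);  /* tie: keep original order */
}

/* Stable KeySortBy sketch: f runs once per key and values follow their keys. */
void key_sort_by(Entry *e, size_t n, int (*f)(int)) {
    if (n < 2) return;
    Keyed *k = malloc(n * sizeof *k);
    assert(k != NULL);
    for (size_t i = 0; i < n; i++) k[i] = (Keyed){ f(e[i].key), i, e[i] };
    qsort(k, n, sizeof *k, cmp_keyed);
    for (size_t i = 0; i < n; i++) e[i] = k[i].e;
    free(k);
}

/* Example key function (hypothetical): order keys by their remainder mod 3. */
int key_mod3(int k) { return k % 3; }
```

Because the position tie-break makes every comparison strict, the unstable qsort yields exactly the stable order.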
Ordering and aggregation now act on an association's values (Wolfram
semantics), consistent with the earlier Map/Select value threading.
- Sort[assoc] orders entries by value (keys follow); Total/Min/Max reduce over
the values. New assoc_sort_by_value and reusable assoc_apply_over_values
(src/assoc.c), dispatched from builtin_sort (src/sort.c), builtin_total
(src/list/total.c), and builtin_min/builtin_max (src/list/minmax.c).
- Join[assoc1, assoc2, ...] merges associations (later value wins); this
already worked via key de-duplication and is now documented and covered by a
test.
Tests/docs: +5 e2e tests (58 total), new spec section, changelog entry. Clean
-Wall -Wextra build; 0 leaks; no regressions in sort/list/core/eval/regression.
Make associations mutable through Part, and round out value aggregation.
- Part assignment on associations (src/part.c, expr_part_assign_rec):
a[[key]] = val updates an existing key or appends a new key -> val entry;
a[[Key[k]]] = val targets a key explicitly; a[[i]] = val updates the i-th
value positionally; multi-index a[[k1, k2]] = val descends into nested
associations and lists. Read-modify-write (a[[k]] = a[[k]] + 1) works.
- Mean[assoc] averages the values (via assoc_apply_over_values, src/stats.c).
Tests/docs: +8 e2e tests (66 total), new spec sections, changelog entry, and an
in-place-mutation demo in the frontend Associations notebook. Clean -Wall
-Wextra build; 0 leaks; no regressions in
list_set/stats/core/eval/regression/list suites.
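The update-or-append rule for a[[key]] = val, and the positional a[[i]] = val form, reduce to the following over an insertion-ordered entry array. This is a sketch only: it assumes int keys and values and uses a linear key search for brevity, whereas the PR compares keys with expr_eq; Assoc, assoc_set_key and assoc_set_pos are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int key, val; } Entry;
typedef struct { Entry *e; size_t len, cap; } Assoc;   /* insertion-ordered */

/* a[[key]] = val: update in place if the key exists, else append key -> val. */
void assoc_set_key(Assoc *a, int key, int val) {
    for (size_t i = 0; i < a->len; i++)
        if (a->e[i].key == key) { a->e[i].val = val; return; }
    if (a->len == a->cap) {                            /* grow geometrically */
        a->cap = a->cap ? 2 * a->cap : 4;
        Entry *grown = realloc(a->e, a->cap * sizeof *grown);
        assert(grown != NULL);
        a->e = grown;
    }
    a->e[a->len++] = (Entry){ key, val };
}

/* a[[i]] = val: 1-based positional update of the i-th value; the key is kept.
   Returns 0 when i is out of range. */
int assoc_set_pos(Assoc *a, size_t i, int val) {
    if (i < 1 || i > a->len) return 0;
    a->e[i - 1].val = val;
    return 1;
}
```

A read-modify-write such as a[[1]] = a[[1]] + 1 is just a read followed by assoc_set_key on the same key, so the entry keeps its original position.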
Extend the pattern-matching toolchain to associations (Wolfram semantics).
- Cases[assoc, patt] and Count[assoc, patt] match against the association's
values, delegating through Values[assoc] via the shared
assoc_apply_over_values (src/patterns.c).
- DeleteCases[assoc, patt] removes entries whose value matches the pattern and
returns an association; the new assoc_delete_cases (src/assoc.c) tests each
value with MatchQ.
Tests/docs: +4 e2e tests (70 total), new spec section, changelog entry, and two
demo cells in the frontend Associations notebook. Clean -Wall -Wextra build;
0 leaks; no regressions in patterns/core/eval/regression suites.
New general predicate builtins, plus association value-threading.
- AllTrue, AnyTrue, NoneTrue (src/funcprog.c, Protected): test a predicate
across a list's elements, short-circuiting and left unevaluated when a test
result is neither True nor False (Wolfram semantics). Over an association they
test the values.
- MemberQ[assoc, form] now tests the association's values (src/patterns.c).
Tests/docs: +6 e2e tests (82 total), new functional-programming spec section,
value-threading note in data-structures, changelog, and a frontend demo cell.
Clean -Wall -Wextra build; 0 leaks; no regressions in
funcprog/patterns/core/eval/regression/list suites.
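The three-way outcome of AllTrue/AnyTrue (True, False, or left unevaluated) can be modelled with two small enums. This sketch assumes a predicate that reports a Truth value per element and treats a non-Boolean result as ending the scan, which is one reading of "short-circuiting and left unevaluated"; the types, all_true, any_true and the sign_test predicate are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element test outcome: True, False, or some other value. */
typedef enum { TV_FALSE, TV_TRUE, TV_OTHER } Truth;
/* Outcome of the builtin call as a whole. */
typedef enum { R_FALSE, R_TRUE, R_UNEVALUATED } Result;

/* AllTrue sketch: the first False decides the answer; a non-Boolean result
   leaves the call unevaluated. Either way, later elements are not tested. */
Result all_true(const int *xs, size_t n, Truth (*pred)(int)) {
    for (size_t i = 0; i < n; i++) {
        Truth t = pred(xs[i]);
        if (t == TV_FALSE) return R_FALSE;
        if (t == TV_OTHER) return R_UNEVALUATED;
    }
    return R_TRUE;
}

/* AnyTrue sketch: symmetric, with the first True deciding the answer. */
Result any_true(const int *xs, size_t n, Truth (*pred)(int)) {
    for (size_t i = 0; i < n; i++) {
        Truth t = pred(xs[i]);
        if (t == TV_TRUE) return R_TRUE;
        if (t == TV_OTHER) return R_UNEVALUATED;
    }
    return R_FALSE;
}

/* Example predicate: positive -> True, negative -> False, zero -> neither. */
Truth sign_test(int x) { return x > 0 ? TV_TRUE : x < 0 ? TV_FALSE : TV_OTHER; }
```

NoneTrue would follow any_true with the True and False results swapped; over an association, the same loops would simply run over the values.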
New general SortBy, filling a gap and rounding out association ordering.
- SortBy[list, f] sorts a list by canonical order of f[element], evaluating
the key once per element via a paired-key qsort (src/sort.c).
- SortBy[assoc, f] sorts an association by f applied to each value; keys
follow their values.
- SortBy[f] is the operator form: SortBy[f][expr] == SortBy[expr, f].
- SYM_SortBy registered in src/sym_names.{c,h}.
Tests/docs: +6 e2e tests (88 total), new spec sections in
functional-programming and data-structures, changelog, frontend demo cell.
Clean -Wall -Wextra build; 0 leaks; no regressions in
sort/list/core/eval/regression suites.
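The paired-key idea (evaluate the sort key once per element, then sort (key, element) pairs) looks like this in plain C for a SortBy[list, StringLength]-style ordering. It is a sketch, not src/sort.c: strlen stands in for f, and ties are broken by the elements themselves (strcmp as a stand-in for canonical order, following Wolfram's SortBy convention; whether Mathilda breaks ties the same way is an assumption). Pair and sort_by_length are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A precomputed key paired with the element it was computed from. */
typedef struct { size_t key; const char *elem; } Pair;

static int cmp_pair(const void *a, const void *b) {
    const Pair *x = a, *y = b;
    if (x->key != y->key) return x->key < y->key ? -1 : 1;
    return strcmp(x->elem, y->elem);   /* equal keys: order by the elements */
}

/* Computes f (here strlen) once per element, sorts the pairs, and writes
   the elements back in sorted order. */
void sort_by_length(const char **xs, size_t n) {
    if (n < 2) return;
    Pair *p = malloc(n * sizeof *p);
    assert(p != NULL);
    for (size_t i = 0; i < n; i++) p[i] = (Pair){ strlen(xs[i]), xs[i] };
    qsort(p, n, sizeof *p, cmp_pair);
    for (size_t i = 0; i < n; i++) xs[i] = p[i].elem;
    free(p);
}
```

Caching the key matters because a comparison sort calls the comparator O(n log n) times; evaluating f inside the comparator would repeat each user-level evaluation many times.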
New general extreme-selection builtins, idiomatic with associations.
- MaximalBy[list, f] / MinimalBy[list, f] give the element(s) maximising /
minimising f (all ties, in order); the key is evaluated once per element
(src/sort.c). Over an association they return the entries whose value is
extremal, as an association. MaximalBy[f] / MinimalBy[f] are operator forms.
- SYM_MaximalBy / SYM_MinimalBy registered in src/sym_names.{c,h}.
Tests/docs: +5 e2e tests (93 total), new spec sections in
functional-programming and data-structures, changelog, frontend demo cell.
Clean -Wall -Wextra build; 0 leaks; no regressions in
sort/core/eval/regression suites.
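Returning all ties in order while evaluating the key once per element needs only one pass: keep the best key so far and restart the result whenever a strictly better key appears. A minimal sketch over ints; maximal_by, which reports indices rather than elements, and the abs_key example function are hypothetical, not taken from src/sort.c.

```c
#include <assert.h>
#include <stddef.h>

/* MaximalBy sketch: writes the indices of all elements whose key f(x) is
   maximal, in original order, into `out`; returns how many there are.
   f is evaluated exactly once per element. */
size_t maximal_by(const int *xs, size_t n, int (*f)(int), size_t *out) {
    size_t count = 0;
    int best = 0;
    for (size_t i = 0; i < n; i++) {
        int k = f(xs[i]);
        if (count == 0 || k > best) { best = k; count = 0; }  /* new maximum */
        if (k == best) out[count++] = i;                        /* keep ties */
    }
    return count;
}

/* Example key function: absolute value (not defined for INT_MIN). */
int abs_key(int x) { return x < 0 ? -x : x; }
```

MinimalBy is the same loop with `k < best`; over an association, the indices would select whole entries so the result can be rebuilt as an association.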
Ranked top-N selection over lists and association values.
- TakeLargest[list, n] / TakeSmallest[list, n] give the n ranked-extreme
elements (descending / ascending); TakeLargestBy / TakeSmallestBy rank by
f[element] (src/sort.c, shared take_extreme helper over SortBy's paired-key
machinery). Over an association they rank by value (or f of value) and return
an association. n beyond the length returns all elements, ranked.
Tests/docs: +7 e2e tests (100 total), new spec sections in
functional-programming and data-structures, changelog, frontend demo cell.
Clean -Wall -Wextra build; 0 leaks; no regressions in
sort/core/eval/regression suites.
Turn GroupBy into a full aggregation primitive and add its list-returning
sibling.
- GroupBy[list, f, g] applies reducer g to each group, giving
<|f[x] -> g[{group}], ...|> (e.g. GroupBy[Range[10], EvenQ, Total] ->
<|False -> 25, True -> 30|>). Extends builtin_groupby (src/assoc.c).
- GatherBy[list, f] gathers elements with equal f[element] into a list of
sublists in first-appearance order (hash-indexed, O(n)).
Tests/docs: +5 e2e tests (105 total), updated GroupBy + new GatherBy spec
sections, changelog, frontend demo cell. Clean -Wall -Wextra build; 0 leaks;
no regressions in core/eval/regression suites.
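The reducer form can be pictured as a single pass that keeps groups in first-appearance order and folds each element into its group's running result. This sketch specialises the reducer to Total over ints and, for brevity, finds each group by linear scan (O(n·g)) rather than with the hash index the PR describes; group_by_total and even_q are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* GroupBy[list, f, Total] sketch: gkeys/gsums (each sized >= n) receive the
   group keys in first-appearance order and each group's sum; returns the
   number of groups. */
size_t group_by_total(const int *xs, size_t n, int (*f)(int),
                      int *gkeys, long *gsums) {
    size_t g = 0;
    for (size_t i = 0; i < n; i++) {
        int k = f(xs[i]);
        size_t j = 0;
        while (j < g && gkeys[j] != k) j++;                  /* find the group */
        if (j == g) { gkeys[g] = k; gsums[g] = 0; g++; }     /* new group */
        gsums[j] += xs[i];                                   /* fold into it */
    }
    return g;
}

/* Example key function mirroring EvenQ: 1 for even, 0 for odd. */
int even_q(int x) { return x % 2 == 0; }
```

With the elements 1..10 and even_q, the odd group (key 0) appears first with sum 25 and the even group (key 1) follows with sum 30, matching GroupBy[Range[10], EvenQ, Total] -> <|False -> 25, True -> 30|>.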
- ReverseSort[coll] / ReverseSortBy[coll, f]: descending Sort / SortBy
(src/sort.c), thin wrappers that reverse the ascending result; over an
association they sort the entries by value (or f of value), descending.
- examples/association_showcase.m + association-showcase.md: a 100k-record
split-apply-combine pipeline (Counts, GroupBy + reducer, ReverseSort,
TakeLargest, Mean per group) with measured timings, showing the toolchain
end-to-end at scale.
Tests/docs: +4 e2e tests (109 total), functional-programming spec section,
changelog, frontend demo cell. Clean -Wall -Wextra build; 0 leaks; no
regressions in sort/core/regression suites.
First-match search with a Missing["NotFound"] fallback, over lists and
association values.
- SelectFirst[list, pred[, default]] (src/funcprog.c): the first element for
which pred is True (short-circuits); Missing["NotFound"] or default otherwise.
Over an association, it tests the values and returns the first matching value.
- FirstCase[expr, patt[, default]] (src/patterns.c): the first element matching
patt, reusing Cases so it inherits the pattern and association-value
semantics; Missing["NotFound"] or default otherwise.
Tests/docs: +8 e2e tests (117 total), functional-programming spec section,
changelog, frontend demo cell. Clean -Wall -Wextra build; 0 leaks; no
regressions in patterns/funcprog/core/eval/regression suites.
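SelectFirst's contract (stop at the first element satisfying the predicate, otherwise fall back to Missing["NotFound"] or a supplied default) can be sketched with a small result type. This is an illustration over ints only; Found, select_first, select_first_or and is_negative are hypothetical names, and `found == false` stands in for Missing["NotFound"].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* found == false plays the role of Missing["NotFound"]. */
typedef struct { bool found; int value; } Found;

/* SelectFirst[list, pred] sketch: stops at the first element with pred true. */
Found select_first(const int *xs, size_t n, bool (*pred)(int)) {
    for (size_t i = 0; i < n; i++)
        if (pred(xs[i])) return (Found){ true, xs[i] };
    return (Found){ false, 0 };
}

/* SelectFirst[list, pred, default]: the default replaces the not-found result. */
int select_first_or(const int *xs, size_t n, bool (*pred)(int), int dflt) {
    Found r = select_first(xs, n, pred);
    return r.found ? r.value : dflt;
}

/* Example predicate. */
bool is_negative(int x) { return x < 0; }
```

Over an association the same loop would run over the values, returning the first matching value rather than its key -> value entry.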
- DeleteMissing[expr] removes all Missing[...] elements, delegating to
DeleteCases[expr, _Missing] (src/patterns.c) so it inherits list and
association-value handling. It is the natural cleanup after a multi-key
Lookup; over an association it drops entries whose value is Missing[...].
Tests/docs: +4 e2e tests (121 total), data-structures spec section, changelog,
frontend demo cell. Clean -Wall -Wextra build; 0 leaks; no regressions in
patterns/core/eval/regression suites.
Hardening iteration (no new builtins).
- Add multi-builtin integration tests exercising realistic pipelines
(word-frequency top-N, group-reduce-rank, Lookup -> DeleteMissing -> Total,
Merge of Counts) plus empty-collection edge tests confirming association
reductions match the underlying list behaviour (Total[<||>] as Total[{}],
etc.). 129 e2e tests total in tests/test_association.c.
- Document the value-threading design principle (element-wise, reductions,
ordering and pattern/predicate ops all thread over values, key-aligned)
in docs/spec/builtins/data-structures.md, and add a composed-pipeline demo
to the frontend Associations notebook.
All association tests pass; there are no source changes, so no build or leak impact.
Make associations destructurable in patterns and rules.
- KeyValuePattern[{k1 -> p1, ...}] (or a single k -> p) matches an association
or list of rules containing the given keys with matching values; value
patterns bind, so associations can be destructured (Replace with v_) and used
to filter records (Cases[records, KeyValuePattern[{"t" -> _}]]). Implemented
as a self-contained branch in the matcher (src/match.c) keyed on a new pattern
head, so existing matching is unaffected; registered Protected with docstring.
- Lock in that Append/Prepend already extend associations (update-or-add,
order-preserving) with tests.
Tests/docs: +11 e2e tests (140 total), data-structures spec sections, changelog,
frontend pattern-matching demo. Clean -Wall -Wextra build; 0 leaks; no
regressions in match/match_extensive/patterns/replace/core/eval/regression.
- Applying an association as a function now looks up the key: <|...|>[key]
gives the value or Missing["KeyAbsent", key]; assoc[Key[k]] is the explicit
form. Implemented as a compound-head application branch in the evaluator
(src/eval.c), alongside pure-function application. This makes
Map[#[key] &, records] work over a list of associations. Normal function
application is unaffected.
Tests/docs: +5 e2e tests (145 total), data-structures spec update, changelog,
frontend demo cell. Clean -Wall -Wextra build; 0 leaks; no regressions in
eval/purefunc/funcprog/match/core/regression suites.
- Extend the association accessor (src/eval.c) to multi-key nested lookup:
assoc[k1, k2, ...] looks up k1 and applies the value to the remaining keys, so
<|"a" -> <|"b" -> 5|>|>["a", "b"] is 5 and tab["row", "col"] reads a cell of
an association-of-associations. A missing intermediate key propagates
Missing["KeyAbsent", ...].
Tests/docs: +4 e2e tests (149 total), data-structures spec update, changelog,
frontend demo cell. Clean -Wall -Wextra build; 0 leaks; no regressions in
eval/purefunc/core/regression suites.
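Multi-key access is a loop that looks up one key and feeds the result to the next step, so a missing key anywhere short-circuits to Missing. A sketch under a hypothetical value model (numbers or nested associations with string keys); Value, Entry, Assoc, lookup1 and lookup_path are illustrative names. As a simplification, this sketch also reports V_MISSING when an intermediate value is not an association, which the PR text does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical value model: a number, a nested association, or Missing. */
typedef struct Assoc Assoc;
typedef struct { enum { V_NUM, V_ASSOC, V_MISSING } tag; double num; const Assoc *sub; } Value;
typedef struct { const char *key; Value val; } Entry;
struct Assoc { const Entry *e; size_t len; };

/* Single-key lookup; an absent key stands in for Missing["KeyAbsent", key]. */
static Value lookup1(const Assoc *a, const char *key) {
    for (size_t i = 0; i < a->len; i++)
        if (strcmp(a->e[i].key, key) == 0) return a->e[i].val;
    return (Value){ V_MISSING, 0, NULL };
}

/* assoc[k1, k2, ...] sketch: look up k1, then apply the result to the rest. */
Value lookup_path(const Assoc *a, const char **keys, size_t nkeys) {
    Value v = { V_ASSOC, 0, a };
    for (size_t i = 0; i < nkeys; i++) {
        if (v.tag != V_ASSOC) return (Value){ V_MISSING, 0, NULL };  /* propagate */
        v = lookup1(v.sub, keys[i]);
    }
    return v;
}
```

With inner = <|"b" -> 5|> and outer = <|"a" -> inner|>, the path {"a", "b"} yields 5, while {"x", "b"} fails at the first step and the Missing result flows through unchanged.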
Associations can now be destructured directly in a function definition:
area[KeyValuePattern[{"w" -> w_, "h" -> h_}]] := w h
area[<|"w" -> 3, "h" -> 4|>] (* -> 12 *)
Root cause: the DownValue dispatch fast-path filter (src/symtab.c,
pattern_arg_head_canon) treated a first-arg pattern head of KeyValuePattern or
Except as a literal head, so it skipped the rule whenever the input's first-arg
head differed (e.g. Association). Both are now treated as wildcards (return
NULL), so the filter never skips them -- strictly more conservative, cannot
cause missed matches. Matching always worked via MatchQ/Cases; only this
fast-path filter was wrong. Fix also repairs Except in a DownValue LHS.
Tests/docs: +4 e2e tests (153 total), data-structures spec update, changelog,
frontend demo. Clean -Wall -Wextra build; 0 leaks; no regressions in
symtab/match/match_extensive/patterns/replace/eval/core/regression suites.
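The reasoning behind the fix is that a dispatch pre-filter may only skip a rule when it can prove the rule cannot match; returning NULL for non-literal pattern heads makes the filter fall through to full matching. A deliberately simplified sketch that models heads as strings: canon_first_arg_head and may_match are hypothetical names, not the real pattern_arg_head_canon signature, and only KeyValuePattern and Except come from this fix (a real filter would recognise its other non-literal heads too).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Pattern heads whose matches do not require the input to carry that literal
   head. Only the two heads from this fix are listed in this sketch. */
static const char *const wildcard_heads[] = { "KeyValuePattern", "Except" };

/* Returns the literal head a first-argument pattern requires, or NULL when the
   head is a wildcard and the rule must never be skipped. */
const char *canon_first_arg_head(const char *pattern_head) {
    for (size_t i = 0; i < sizeof wildcard_heads / sizeof *wildcard_heads; i++)
        if (strcmp(pattern_head, wildcard_heads[i]) == 0) return NULL;
    return pattern_head;
}

/* Pre-filter: skipping is safe only when a literal head is known and differs
   from the argument's head; otherwise the full matcher decides. */
bool may_match(const char *pattern_head, const char *arg_head) {
    const char *canon = canon_first_arg_head(pattern_head);
    return canon == NULL || strcmp(canon, arg_head) == 0;
}
```

Under this model, area[KeyValuePattern[...]] applied to an Association is no longer filtered out, while a rule whose first argument is a literal g[...] can still be skipped for an input with a different head.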
- GroupBy[list, keyfn -> valfn] groups by keyfn[x] but collects valfn[x] in
each group; GroupBy[list, keyfn -> valfn, g] then reduces each group by g.
This completes the Wolfram GroupBy signature and expresses the whole
group / extract-field / reduce pipeline in one call, e.g.
GroupBy[txns, First -> Last, Total]. Extends builtin_groupby (src/assoc.c).
- Simplified examples/association_showcase.m and the frontend demo to use the
cleaner one-call form (output unchanged).
Tests/docs: +3 e2e tests (156 total), data-structures spec update, changelog.
Clean -Wall -Wextra build; 0 leaks; no regressions in core/eval/regression.
- SortBy[list, {f1, f2, ...}] sorts by f1, breaking ties with f2, and so on
(also over associations, by value). Implemented by building each element's
sort key as the tuple {f1[e], ...}; expr_compare already orders equal-length
lists lexicographically, giving exact multi-criteria ordering (src/sort.c).
Single-criterion and operator forms are unchanged.
Tests/docs: +2 e2e tests (158 total), functional-programming spec update,
changelog, frontend demo. Clean -Wall -Wextra build; 0 leaks; no regressions
in sort/core/regression suites.
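Sorting by the tuple {f1[e], f2[e], ...} works because lexicographic comparison lets the first differing criterion decide. A sketch over ints under stated assumptions: at most MAX_CRITERIA criteria, unused tuple slots left at 0, and full ties broken by the element value as a stand-in for canonical order. Keyed, sort_by_criteria, last_digit and tens are hypothetical names, not src/sort.c.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CRITERIA 4

/* Each element with its key tuple {f1(e), f2(e), ...}; unused slots stay 0. */
typedef struct { int key[MAX_CRITERIA]; int elem; } Keyed;

static int cmp_lex(const void *a, const void *b) {
    const Keyed *x = a, *y = b;
    for (size_t c = 0; c < MAX_CRITERIA; c++)   /* first differing criterion decides */
        if (x->key[c] != y->key[c]) return x->key[c] < y->key[c] ? -1 : 1;
    return (x->elem > y->elem) - (x->elem < y->elem);  /* full tie: element order */
}

/* SortBy[list, {f1, f2, ...}] sketch: each key tuple is computed once. */
void sort_by_criteria(int *xs, size_t n, int (*const *fs)(int), size_t nf) {
    assert(nf >= 1 && nf <= MAX_CRITERIA);
    if (n < 2) return;
    Keyed *k = malloc(n * sizeof *k);
    assert(k != NULL);
    for (size_t i = 0; i < n; i++) {
        k[i] = (Keyed){ {0}, xs[i] };
        for (size_t c = 0; c < nf; c++) k[i].key[c] = fs[c](xs[i]);
    }
    qsort(k, n, sizeof *k, cmp_lex);
    for (size_t i = 0; i < n; i++) xs[i] = k[i].elem;
    free(k);
}

/* Example criteria (hypothetical): last decimal digit, then the tens part. */
int last_digit(int x) { return x % 10; }
int tens(int x) { return x / 10; }
```

Sorting {21, 12, 11, 22} by {last_digit, tens} gives {11, 21, 12, 22}: the last digit groups 1s before 2s, and the tens part orders within each group.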
- Fold[f, seed, assoc] / FoldList[...] now fold over the association's values
in entry (insertion) order, rebuilding the call over Values[assoc] at the top
of fold_impl (src/funcprog.c). Fold over a plain list is unchanged.
- New Scan[f, expr] (src/funcprog.c, Protected): applies f to each element for
side effects and returns Null; over an association it scans the values.
Tests/docs: +5 e2e tests (163 total), functional-programming spec sections,
changelog, frontend demo. Clean -Wall -Wextra build; 0 leaks; no regressions
in fold/foldlist/core/eval/regression suites.
- First[assoc] / Last[assoc] now give the first / last value (matching
Wolfram) instead of the whole key -> value rule (src/part.c). Lists and
general expressions are unchanged.
- Add tests covering First/Last plus Rest/Most/Take/Drop (which already slice
entries and return an association, order preserved).
Tests/docs: +6 e2e tests (169 total), data-structures spec section, changelog,
frontend demo. Clean -Wall -Wextra build; 0 leaks; no regressions in
core/eval/regression/list suites.
Summary
Adds a first-class, hash-backed Association (<| … |>) data structure to
Mathilda, modelled on the Wolfram Language, with its full family of builtins,
parser/printer/Part integration, tests, docs, a frontend demo, and benchmarks.
Associations are ordinary Association[Rule[k,v], …] expressions with unique,
insertion-ordered keys, so the generic toolchain (Length, Map, ReplaceAll, ===,
FullForm) works unchanged. Bulk operations use a transient open-addressing hash
index (keyed by expr_hash/expr_eq) for amortised O(n) construction, grouping
and lookup.
Changes
- Builtins (src/assoc.c, Protected; AssociateTo is HoldFirst): Association,
AssociationQ, Keys, Values, Lookup (single/default/list-of-keys), KeyExistsQ,
KeyDrop, KeyTake, KeyValueMap, AssociationThread, Counts, GroupBy, Merge,
AssociateTo.
- Parser (src/parse.c): <| … |> literal syntax, including <|…|>[[key]] and
<|…|>[args].
- Part (src/part.c): assoc[[key]], assoc[[Key[k]]], positional assoc[[i]],
assoc[[0]] → head; missing key → Missing["KeyAbsent", key].
- Normal (src/calculus/series.c): Normal[assoc] → list of rules.
- Printing: <| … |> StandardForm/TeXForm (src/print.c) and KaTeX notebook
output (src/print_latex.c); ill-formed associations fall back to head[args].
- Symbols registered in src/sym_names.{c,h}; assoc_init wired into core_init.
- Docs: docs/spec/builtins/data-structures.md, changelog entry, Mathilda_spec.md
index row.
- Frontend: Associations demo notebook in frontend/src/lib/canvas.ts.
- Examples: examples/association_bench.m, examples/association_bench.py,
examples/association-benchmarks.md.
Testing
- tests/test_association.c: 43 end-to-end tests (parser, all builtins,
printing, Part, in-place AssociateTo, generic-tool interaction); all pass.
- No regressions in parse_tests, eval_tests, core_tests, list_tests,
regression_tests.
- 0 leaks for 0 total leaked bytes under macOS leaks while exercising every
builtin (including the mutating and bulk paths).
- Clean build with -std=c99 -Wall -Wextra (no new warnings).
- Counts scales linearly and keeps pace with CPython's collections.Counter;
~2,500× faster than naive O(n²) accumulation at N=2000.
Note: the frontend npm run check was not run (node_modules not installed in
this environment); the canvas.ts change mirrors the existing gallery-card
pattern exactly.
JIRA Ticket
N/A