
Add Association (<| ... |>) data structure #14

Open
msollami wants to merge 22 commits into stblake:main from msollami:feature/association-data-structure


Conversation


msollami (Contributor) commented Jul 5, 2026

Summary

Adds a first-class, hash-backed Association (<| … |>) data structure to Mathilda, modelled on the Wolfram Language, with its full family of builtins, parser/printer/Part integration, tests, docs, a frontend demo, and benchmarks. Associations are ordinary Association[Rule[k,v], …] expressions with unique, insertion-ordered keys, so the generic toolchain (Length, Map, ReplaceAll, ===, FullForm) works unchanged. Bulk operations use a transient open-addressing hash index (keyed by expr_hash/expr_eq) for amortised O(n) construction, grouping and lookup.

Changes

  • Builtins (src/assoc.c, Protected; AssociateTo is HoldFirst): Association, AssociationQ, Keys, Values, Lookup (single/default/list-of-keys), KeyExistsQ, KeyDrop, KeyTake, KeyValueMap, AssociationThread, Counts, GroupBy, Merge, AssociateTo.
  • Parser (src/parse.c): <| … |> literal syntax, including <|…|>[[key]] and <|…|>[args].
  • Part (src/part.c): assoc[[key]], assoc[[Key[k]]], positional assoc[[i]], assoc[[0]] → head; missing key → Missing["KeyAbsent", key].
  • Normal (src/calculus/series.c): Normal[assoc] → list of rules.
  • Printing: <| … |> StandardForm/TeXForm (src/print.c) and KaTeX notebook output (src/print_latex.c); ill-formed associations fall back to head[args].
  • Symbols in src/sym_names.{c,h}; assoc_init wired into core_init.
  • Docs: new docs/spec/builtins/data-structures.md, changelog entry, Mathilda_spec.md index row.
  • Frontend: an "Associations" demo notebook in frontend/src/lib/canvas.ts.
  • Examples/benchmarks: examples/association_bench.m, examples/association_bench.py, examples/association-benchmarks.md.

Testing

  • tests/test_association.c: 43 end-to-end tests (parser, all builtins, printing, Part, in-place AssociateTo, generic-tool interaction) — all pass.
  • Existing suites re-run with no regressions: parse_tests, eval_tests, core_tests, list_tests, regression_tests.
  • 0 leaks (0 total leaked bytes) reported by the macOS leaks tool while exercising every builtin (including the mutating and bulk paths).
  • Clean build under -std=c99 -Wall -Wextra (no new warnings).
  • Benchmarks (measured): Counts scales linearly and keeps pace with CPython's collections.Counter; ~2,500× faster than naive O(n²) accumulation at N=2000.

Note: the frontend npm run check was not run (node_modules not installed in this environment); the canvas.ts change mirrors the existing gallery-card pattern exactly.

JIRA Ticket

N/A

msollami added 22 commits July 5, 2026 00:53
Introduce a first-class, hash-backed Association data structure modelled
on the Wolfram Language, together with its family of builtins and full
end-to-end coverage.

Representation
- Associations are ordinary expressions: Association[Rule[k,v], ...] with
  unique, insertion-ordered keys (first occurrence fixes position, last
  fixes value). This keeps the generic toolchain (Length, Map, ReplaceAll,
  ===, FullForm) working unchanged.
- Bulk operations are driven by a transient open-addressing hash index
  (keyed by expr_hash / expr_eq) for amortised O(n) construction,
  de-duplication, grouping and lookup.
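
A minimal sketch of this representation, assuming string keys, int values, a fixed capacity, FNV-1a hashing and linear probing (all illustrative choices, not taken from src/assoc.c, which hashes whole expressions with expr_hash/expr_eq). Entries live in an insertion-ordered array and the hash slots store only entry indices, so a repeated key overwrites its value but keeps its first position. Assoc, assoc_put and assoc_get are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: string keys, int values, fixed capacity. Mathilda's
   real entries are Rule[k, v] expressions indexed with expr_hash/expr_eq. */
#define MAX_ENTRIES 64
#define INDEX_SLOTS 128 /* power of two and > MAX_ENTRIES, so probing ends */

typedef struct {
    const char *keys[MAX_ENTRIES]; /* insertion order; keys are borrowed */
    int vals[MAX_ENTRIES];
    int len;
    int slot[INDEX_SLOTS];         /* 0 = empty, otherwise entry index + 1 */
} Assoc;

static uint32_t hash_str(const char *s) { /* FNV-1a */
    uint32_t h = 2166136261u;
    while (*s) { h ^= (unsigned char)*s++; h *= 16777619u; }
    return h;
}

/* Slot holding key, or the empty slot where key would be placed. */
static int find_slot(const Assoc *a, const char *key) {
    uint32_t i = hash_str(key) & (INDEX_SLOTS - 1);
    while (a->slot[i] != 0 && strcmp(a->keys[a->slot[i] - 1], key) != 0)
        i = (i + 1) & (INDEX_SLOTS - 1); /* linear probing */
    return (int)i;
}

/* First occurrence fixes the position; a repeated key only updates the value. */
static void assoc_put(Assoc *a, const char *key, int val) {
    int s = find_slot(a, key);
    if (a->slot[s] != 0) { a->vals[a->slot[s] - 1] = val; return; }
    assert(a->len < MAX_ENTRIES);
    a->keys[a->len] = key;
    a->vals[a->len] = val;
    a->slot[s] = ++a->len;
}

/* Returns 0 when absent (where the builtin gives Missing["KeyAbsent", key]). */
static int assoc_get(const Assoc *a, const char *key, int *out) {
    int s = find_slot(a, key);
    if (a->slot[s] == 0) return 0;
    *out = a->vals[a->slot[s] - 1];
    return 1;
}
```

Inserting x -> 1, y -> 2, x -> 3 leaves two entries, x -> 3 then y -> 2, which is what the stated rule predicts for <|"x" -> 1, "y" -> 2, "x" -> 3|>.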

Builtins (src/assoc.c), all Protected; AssociateTo is HoldFirst
- Association, AssociationQ, Keys, Values
- Lookup (single key with optional default -> Missing["KeyAbsent", key],
  or a list of keys resolved with one index build, O(n+m))
- KeyExistsQ, KeyDrop, KeyTake, KeyValueMap
- AssociationThread, Counts, GroupBy, Merge, AssociateTo

Language integration
- Parser: <| ... |> literal syntax (src/parse.c), including postfix forms
  <|...|>[[key]] and <|...|>[args].
- Part: assoc[[key]] / assoc[[Key[k]]] / positional assoc[[i]]; assoc[[0]]
  gives the head; a missing key gives Missing["KeyAbsent", key] (src/part.c).
- Normal[assoc] -> list of rules (src/calculus/series.c).
- Printing: <| ... |> StandardForm and TeXForm (src/print.c) and KaTeX
  notebook output (src/print_latex.c), falling back to head[args] form for
  ill-formed associations.
- Symbols registered in src/sym_names.{c,h}; assoc_init wired into core_init.

Tests, docs, examples
- tests/test_association.c: 43 end-to-end tests (parser, builtins, printing,
  Part, in-place AssociateTo, generic-tool interaction); 0 leaks under the macOS leaks tool.
- docs/spec/builtins/data-structures.md (new category) with verified
  examples; changelog entry; Mathilda_spec.md index row.
- frontend/src/lib/canvas.ts: an Associations demo notebook in the gallery.
- examples/: association_bench.m, association_bench.py, and
  association-benchmarks.md documenting measured O(n) scaling — Counts keeps
  pace with CPython's Counter and is ~2,500x faster than naive O(n^2)
  accumulation at N=2000.
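
Why the hash index matters for Counts: a naive accumulator rescans the distinct keys seen so far for every element, which is quadratic when most keys are distinct, whereas one hash probe per element keeps the pass linear in expectation. The sketch below assumes int elements and an arbitrary integer mixing function; IntCounts, counts_int and mix32 are hypothetical names, not the PR's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of Counts[list] for int elements. */
typedef struct { int *keys; int *counts; int len; } IntCounts;

static uint32_t mix32(uint32_t x) { /* integer hash finalizer (arbitrary choice) */
    x ^= x >> 16; x *= 0x7feb352du;
    x ^= x >> 15; x *= 0x846ca68bu;
    x ^= x >> 16;
    return x;
}

/* One hash probe per element: O(n) expected, versus rescanning the distinct
   keys seen so far for each element. Allocation failures are not handled. */
static IntCounts counts_int(const int *xs, int n) {
    assert(n >= 0);
    IntCounts c = {NULL, NULL, 0};
    size_t cap = 2;
    while (cap < (size_t)n * 2) cap <<= 1;        /* load factor <= 1/2 */
    int *slot = calloc(cap, sizeof *slot);        /* 0 = empty, else index + 1 */
    c.keys = malloc(((size_t)n + 1) * sizeof *c.keys);
    c.counts = malloc(((size_t)n + 1) * sizeof *c.counts);
    for (int i = 0; i < n; i++) {
        size_t s = mix32((uint32_t)xs[i]) & (cap - 1);
        while (slot[s] != 0 && c.keys[slot[s] - 1] != xs[i])
            s = (s + 1) & (cap - 1);              /* linear probing */
        if (slot[s] == 0) {                       /* first appearance fixes order */
            c.keys[c.len] = xs[i];
            c.counts[c.len] = 0;
            slot[s] = ++c.len;
        }
        c.counts[slot[s] - 1]++;
    }
    free(slot);
    return c;
}

static void counts_free(IntCounts *c) { free(c->keys); free(c->counts); }
```

For {3, 1, 3, 2, 1, 3} this yields keys 3, 1, 2 with counts 3, 2, 1, in first-appearance order.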

Extend the Association data structure so it flows through the functional
toolchain the Wolfram way, and round out the key-operation family.

- Map and Select thread over association values, preserving keys:
  Map[f, <|k -> v|>] -> <|k -> f[v]|>; Select[assoc, p] filters by value.
  New assoc_map_values / assoc_select_values (src/assoc.c) dispatched from
  builtin_map / builtin_select (src/funcprog.c) for the default level.
- New key operations (Protected): KeySort, KeySortBy (stable), KeyMap,
  KeySelect.
- New aggregation builtins: CountsBy[list, f], PositionIndex[list]
  (<|value -> {positions}|>, hash-indexed O(n)), AssociationMap[f, {keys}].
- Symbols registered in src/sym_names.{c,h}; all wired into assoc_init.

Tests/docs: +10 end-to-end tests (53 total in tests/test_association.c),
new sections in docs/spec/builtins/data-structures.md, changelog entry, and
extra demo cells in the frontend Associations notebook. Clean -Wall -Wextra
build, 0 leaks under the leaks tool, no regressions in funcprog/list/core/eval/regression.

Ordering and aggregation now act on an association's values (Wolfram
semantics), consistent with the earlier Map/Select value threading.

- Sort[assoc] orders entries by value (keys follow); Total/Min/Max reduce
  over the values. New assoc_sort_by_value and reusable assoc_apply_over_values
  (src/assoc.c), dispatched from builtin_sort (src/sort.c), builtin_total
  (src/list/total.c), and builtin_min/builtin_max (src/list/minmax.c).
- Join[assoc1, assoc2, ...] merges associations (later value wins); this
  already worked via key de-duplication and is now documented and covered by a test.

Tests/docs: +5 e2e tests (58 total), new spec section, changelog entry.
Clean -Wall -Wextra build; 0 leaks; no regressions in sort/list/core/eval/regression.

Make associations mutable through Part, and round out value aggregation.

- Part assignment on associations (src/part.c, expr_part_assign_rec):
  a[[key]] = val updates an existing key or appends a new key -> val entry;
  a[[Key[k]]] = val targets a key explicitly; a[[i]] = val updates the i-th
  value positionally; multi-index a[[k1, k2]] = val descends into nested
  associations and lists. Read-modify-write (a[[k]] = a[[k]] + 1) works.
- Mean[assoc] averages the values (via assoc_apply_over_values, src/stats.c).

Tests/docs: +8 e2e tests (66 total), new spec sections, changelog entry, and
an in-place-mutation demo in the frontend Associations notebook. Clean
-Wall -Wextra build; 0 leaks; no regressions in list_set/stats/core/eval/
regression/list suites.

Extend the pattern-matching toolchain to associations (Wolfram semantics).

- Cases[assoc, patt] and Count[assoc, patt] match against the association's
  values, delegating through Values[assoc] via the shared
  assoc_apply_over_values (src/patterns.c).
- DeleteCases[assoc, patt] removes entries whose value matches the pattern,
  returning an association — new assoc_delete_cases (src/assoc.c) tests each
  value with MatchQ.

Tests/docs: +4 e2e tests (70 total), new spec section, changelog entry, and
two demo cells in the frontend Associations notebook. Clean -Wall -Wextra
build; 0 leaks; no regressions in patterns/core/eval/regression suites.

New general predicate builtins, plus association value-threading.

- AllTrue, AnyTrue, NoneTrue (src/funcprog.c, Protected): test a predicate
  across a list's elements, short-circuiting and left unevaluated when a test
  result is neither True nor False (Wolfram semantics). Over an association
  they test the values.
- MemberQ[assoc, form] now tests the association's values (src/patterns.c).
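
A sketch of the three-valued logic described for AllTrue, using a hypothetical TruthValue enum for what pred[x] evaluates to. It scans left to right and stops at the first result that settles the answer (False) or blocks it (anything non-Boolean); whether the real builtin checks in exactly this order is an assumption. all_true, even_q and even_or_undecided are hypothetical; AnyTrue and NoneTrue follow by changing which result short-circuits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: what evaluating pred[x] produced. */
typedef enum { RES_FALSE, RES_TRUE, RES_OTHER } TruthValue;
/* What AllTrue[list, pred] evaluates to. */
typedef enum { ALL_FALSE, ALL_TRUE, ALL_UNEVALUATED } AllTrueResult;

/* Scan left to right: a False decides the answer, a non-Boolean result
   leaves the call unevaluated, and an exhausted list gives True. */
static AllTrueResult all_true(const int *xs, size_t n, TruthValue (*pred)(int)) {
    for (size_t i = 0; i < n; i++) {
        TruthValue t = pred(xs[i]);
        if (t == RES_FALSE) return ALL_FALSE;
        if (t == RES_OTHER) return ALL_UNEVALUATED;
    }
    return ALL_TRUE; /* vacuously true for an empty list */
}

/* Example tests: EvenQ, and one that cannot decide negative inputs
   (standing in for a symbolic argument where pred[x] stays unevaluated). */
static TruthValue even_q(int x) { return x % 2 == 0 ? RES_TRUE : RES_FALSE; }
static TruthValue even_or_undecided(int x) { return x < 0 ? RES_OTHER : even_q(x); }
```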

Tests/docs: +6 e2e tests (82 total), new functional-programming spec section,
value-threading note in data-structures, changelog, and a frontend demo cell.
Clean -Wall -Wextra build; 0 leaks; no regressions in
funcprog/patterns/core/eval/regression/list suites.

New general SortBy, filling a gap and rounding out association ordering.

- SortBy[list, f] sorts a list by the canonical order of f[element], evaluating
  the key once per element via a paired-key qsort (src/sort.c).
- SortBy[assoc, f] sorts an association by f applied to each value; keys
  follow their values.
- SortBy[f] is the operator form: SortBy[f][expr] == SortBy[expr, f].
- SYM_SortBy registered in src/sym_names.{c,h}.
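
The paired-key technique (decorate, sort, undecorate) can be sketched over ints: f is called once per element, and qsort orders (key, index) pairs instead of calling f inside the comparator. Ties are broken here by original position, which is an assumption of this sketch; the Wolfram Language documents canonical order of the elements as its tie-breaker, and the PR does not say which rule Mathilda uses. KeyedIdx, sort_by and abs_key are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch of SortBy[list, f] over ints. */
typedef struct { int key; size_t idx; } KeyedIdx;

static int cmp_keyed(const void *pa, const void *pb) {
    const KeyedIdx *a = pa, *b = pb;
    if (a->key != b->key) return (a->key > b->key) - (a->key < b->key);
    return (a->idx > b->idx) - (a->idx < b->idx); /* tie: original position */
}

/* Decorate with f once per element, sort the pairs, then undecorate.
   Allocation failures are not handled, for brevity. */
static void sort_by(int *xs, size_t n, int (*f)(int)) {
    KeyedIdx *pairs = malloc((n + 1) * sizeof *pairs);
    int *copy = malloc((n + 1) * sizeof *copy);
    for (size_t i = 0; i < n; i++) {
        pairs[i].key = f(xs[i]);
        pairs[i].idx = i;
        copy[i] = xs[i];
    }
    qsort(pairs, n, sizeof *pairs, cmp_keyed);
    for (size_t i = 0; i < n; i++) xs[i] = copy[pairs[i].idx];
    free(pairs);
    free(copy);
}

static int f_calls; /* counts key evaluations, for illustration */
static int abs_key(int x) { f_calls++; return x < 0 ? -x : x; }
```

With {3, -1, -3, 2, 1} and the absolute-value key, f runs five times and the result is {-1, 1, 2, 3, -3}.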

Tests/docs: +6 e2e tests (88 total), new spec sections in
functional-programming and data-structures, changelog, frontend demo cell.
Clean -Wall -Wextra build; 0 leaks; no regressions in
sort/list/core/eval/regression suites.

New general extreme-selection builtins, idiomatic with associations.

- MaximalBy[list, f] / MinimalBy[list, f] give the element(s) maximising /
  minimising f (all ties, in order); the key is evaluated once per element
  (src/sort.c). Over an association they return the entries whose value is
  extremal, as an association. MaximalBy[f] / MinimalBy[f] are operator forms.
- SYM_MaximalBy / SYM_MinimalBy registered in src/sym_names.{c,h}.
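
A single-pass sketch of the tie-keeping rule for MaximalBy over ints (maximal_by and mod3 are hypothetical): whenever a strictly larger key appears the result restarts, and every element whose key equals the current best is appended, so ties come back in list order with f evaluated once per element. MinimalBy is the same loop with the comparison flipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of MaximalBy[list, f] over ints. out must have room for
   n elements; the number of results is returned. */
static size_t maximal_by(const int *xs, size_t n, int (*f)(int), int *out) {
    size_t m = 0;
    int best = 0;
    for (size_t i = 0; i < n; i++) {
        int k = f(xs[i]);                /* key evaluated once per element */
        if (m == 0 || k > best) {        /* first element or a new maximum */
            best = k;
            m = 0;
        }
        if (k == best) out[m++] = xs[i]; /* keep every tie, in list order */
    }
    return m;
}

static int mod3(int x) { return x % 3; }
```

For {4, 5, 2, 8, 3} the keys are 1, 2, 2, 2, 0, so the result is {5, 2, 8}.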

Tests/docs: +5 e2e tests (93 total), new spec sections in
functional-programming and data-structures, changelog, frontend demo cell.
Clean -Wall -Wextra build; 0 leaks; no regressions in
sort/core/eval/regression suites.

Ranked top-N selection over lists and association values.

- TakeLargest[list, n] / TakeSmallest[list, n] give the n ranked-extreme
  elements (descending / ascending); TakeLargestBy / TakeSmallestBy rank by
  f[element] (src/sort.c, shared take_extreme helper over SortBy's paired-key
  machinery). Over an association they rank by value (or f of value) and
  return an association. n beyond the length returns all elements, ranked.
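
A sketch of the TakeLargest contract over ints, including the rule that asking for more elements than exist returns all of them, ranked. It fully sorts a copy, in the spirit of reusing the sorting machinery as described above; take_largest and cmp_desc are hypothetical names, and a bounded heap is the usual alternative when n is much smaller than the list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_desc(const void *pa, const void *pb) {
    int a = *(const int *)pa, b = *(const int *)pb;
    return (a < b) - (a > b); /* larger values first */
}

/* Hypothetical sketch of TakeLargest[list, n]: copy, sort descending, keep
   min(n, len) elements. out needs room for min(n, len) ints. */
static size_t take_largest(const int *xs, size_t len, size_t n, int *out) {
    size_t k = n < len ? n : len;
    int *tmp = malloc((len + 1) * sizeof *tmp); /* failure not handled */
    if (len > 0) memcpy(tmp, xs, len * sizeof *xs);
    qsort(tmp, len, sizeof *tmp, cmp_desc);
    if (k > 0) memcpy(out, tmp, k * sizeof *out);
    free(tmp);
    return k;
}
```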

Tests/docs: +7 e2e tests (100 total), new spec sections in
functional-programming and data-structures, changelog, frontend demo cell.
Clean -Wall -Wextra build; 0 leaks; no regressions in
sort/core/eval/regression suites.

Turn GroupBy into a full aggregation primitive and add its list-returning
sibling.

- GroupBy[list, f, g] applies reducer g to each group, giving
  <|f[x] -> g[{group}], ...|> (e.g. GroupBy[Range[10], EvenQ, Total] ->
  <|False -> 25, True -> 30|>). Extends builtin_groupby (src/assoc.c).
- GatherBy[list, f] gathers elements with equal f[element] into a list of
  sublists in first-appearance order (hash-indexed, O(n)).
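
The reducer form can be sketched for the EvenQ/Total example above, assuming int elements and keys. Groups are kept in first-appearance order; to stay short, the sketch finds a group by linear scan (the PR's builtin_groupby uses its hash index) and accumulates Total in place, which is equivalent for sums but not for a general g, which would receive the collected sublist. Group, group_by_total, even_key and MAX_GROUPS are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GROUPS 16

/* Hypothetical sketch of GroupBy[list, f, Total] over ints. out must have
   room for MAX_GROUPS groups; the number of groups is returned. */
typedef struct { int key; long total; } Group;

static size_t group_by_total(const int *xs, size_t n, int (*f)(int), Group *out) {
    size_t g = 0;
    for (size_t i = 0; i < n; i++) {
        int k = f(xs[i]);
        size_t j = 0;
        while (j < g && out[j].key != k) j++; /* linear scan, for brevity */
        if (j == g) {                         /* first appearance: new group */
            assert(g < MAX_GROUPS);
            out[g].key = k;
            out[g].total = 0;
            g++;
        }
        out[j].total += xs[i];                /* Total, accumulated in place */
    }
    return g;
}

static int even_key(int x) { return x % 2 == 0; } /* 1 = True, 0 = False */
```

Over 1..10 this yields key 0 (False) with total 25 followed by key 1 (True) with total 30, matching <|False -> 25, True -> 30|>.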

Tests/docs: +5 e2e tests (105 total), updated GroupBy + new GatherBy spec
sections, changelog, frontend demo cell. Clean -Wall -Wextra build; 0 leaks;
no regressions in core/eval/regression suites.

- ReverseSort[coll] / ReverseSortBy[coll, f]: descending Sort / SortBy
  (src/sort.c), thin wrappers that reverse the ascending result; over an
  association they sort the entries by value (or f of value), descending.
- examples/association_showcase.m + association-showcase.md: a 100k-record
  split-apply-combine pipeline (Counts, GroupBy + reducer, ReverseSort,
  TakeLargest, Mean per group) with measured timings, showing the toolchain
  end-to-end at scale.

Tests/docs: +4 e2e tests (109 total), functional-programming spec section,
changelog, frontend demo cell. Clean -Wall -Wextra build; 0 leaks; no
regressions in sort/core/regression suites.

First-match search with Missing["NotFound"] fallback, over lists and
association values.

- SelectFirst[list, pred[, default]] (src/funcprog.c): first element for which
  pred is True (short-circuits); Missing["NotFound"] or default otherwise.
  Over an association, tests the values and returns the first matching value.
- FirstCase[expr, patt[, default]] (src/patterns.c): first element matching
  patt, reusing Cases so it inherits the pattern and association-value
  semantics; Missing["NotFound"] or default otherwise.
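
A sketch of SelectFirst's short-circuit and fallback over ints, where a false return stands in for Missing["NotFound"] and a wrapper supplies the three-argument default; select_first, select_first_or and gt10 are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of SelectFirst[list, pred] over ints: stops at the first
   match. Returning false stands for Missing["NotFound"]. */
static bool select_first(const int *xs, size_t n, bool (*pred)(int), int *out) {
    for (size_t i = 0; i < n; i++) {
        if (pred(xs[i])) {
            *out = xs[i];
            return true;
        }
    }
    return false;
}

/* SelectFirst[list, pred, default]: the default replaces Missing["NotFound"]. */
static int select_first_or(const int *xs, size_t n, bool (*pred)(int), int dflt) {
    int v = 0;
    return select_first(xs, n, pred, &v) ? v : dflt;
}

static bool gt10(int x) { return x > 10; }
```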

Tests/docs: +8 e2e tests (117 total), functional-programming spec section,
changelog, frontend demo cell. Clean -Wall -Wextra build; 0 leaks; no
regressions in patterns/funcprog/core/eval/regression suites.

- DeleteMissing[expr] removes all Missing[...] elements, delegating to
  DeleteCases[expr, _Missing] (src/patterns.c) so it inherits list and
  association-value handling. Natural cleanup after a multi-key Lookup; over an
  association it drops entries whose value is Missing[...].

Tests/docs: +4 e2e tests (121 total), data-structures spec section, changelog,
frontend demo cell. Clean -Wall -Wextra build; 0 leaks; no regressions in
patterns/core/eval/regression suites.

Hardening iteration (no new builtins).

- Add multi-builtin integration tests exercising realistic pipelines
  (word-frequency top-N, group-reduce-rank, Lookup -> DeleteMissing -> Total,
  Merge of Counts) plus empty-collection edge tests confirming association
  reductions match the underlying list behaviour (Total[<||>] as Total[{}],
  etc.). 129 e2e tests total in tests/test_association.c.
- Document the value-threading design principle (element-wise, reductions,
  ordering and pattern/predicate ops all thread over values, key-aligned)
  in docs/spec/builtins/data-structures.md, and add a composed-pipeline demo
  to the frontend Associations notebook.

All association tests pass; there are no source changes, so no build or leak impact.

Make associations destructurable in patterns and rules.

- KeyValuePattern[{k1 -> p1, ...}] (or a single k -> p) matches an association
  or list of rules containing the given keys with matching values; value
  patterns bind, so associations can be destructured (Replace with v_) and used
  to filter records (Cases[records, KeyValuePattern[{"t" -> _}]]). Implemented
  as a self-contained branch in the matcher (src/match.c) keyed on a new pattern
  head, so existing matching is unaffected; registered Protected with docstring.
- Add tests locking in that Append/Prepend already extend associations
  (update-or-add, order-preserving).
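
How KeyValuePattern's subset semantics can be checked, sketched over string keys and int values: each required key must be present and its value must pass its test, while extra keys and key order are ignored. Value binding (v_) is not modelled, and a NULL test stands in for a bare blank. Entry, KeyReq, find_entry and kvp_match are hypothetical names, and the key lookup is a linear scan for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch types: one entry, and one k -> p requirement. */
typedef struct { const char *key; int val; } Entry;
typedef struct { const char *key; bool (*test)(int); } KeyReq; /* NULL test = _ */

static const Entry *find_entry(const Entry *a, size_t n, const char *key) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(a[i].key, key) == 0) return &a[i];
    return NULL;
}

/* KeyValuePattern[{k1 -> p1, ...}]: every listed key present, value passing. */
static bool kvp_match(const Entry *a, size_t n, const KeyReq *reqs, size_t m) {
    for (size_t j = 0; j < m; j++) {
        const Entry *e = find_entry(a, n, reqs[j].key);
        if (e == NULL) return false;
        if (reqs[j].test != NULL && !reqs[j].test(e->val)) return false;
    }
    return true;
}

static bool positive(int x) { return x > 0; }
```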

Tests/docs: +11 e2e tests (140 total), data-structures spec sections, changelog,
frontend pattern-matching demo. Clean -Wall -Wextra build; 0 leaks; no
regressions in match/match_extensive/patterns/replace/core/eval/regression.

- Applying an association as a function now looks the key up: <|...|>[key]
  gives the value or Missing["KeyAbsent", key]; assoc[Key[k]] is the explicit
  form. Implemented as a compound-head application branch in the evaluator
  (src/eval.c), alongside pure-function application. Makes Map[#[key] &, records]
  work over a list of associations. Normal function application is unaffected.

Tests/docs: +5 e2e tests (145 total), data-structures spec update, changelog,
frontend demo cell. Clean -Wall -Wextra build; 0 leaks; no regressions in
eval/purefunc/funcprog/match/core/regression suites.

- Extend the association accessor (src/eval.c) to multi-key nested lookup:
  assoc[k1, k2, ...] looks up k1 and applies the value to the remaining keys,
  so <|"a" -> <|"b" -> 5|>|>["a", "b"] is 5 and tab["row", "col"] reads a cell
  of an association-of-associations. A missing intermediate key propagates
  Missing["KeyAbsent", ...].
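
A sketch of the multi-key accessor's descent, with a hypothetical Value type that is either an int leaf or a nested association. A NULL result plays the role of Missing["KeyAbsent", ...]; treating a non-association intermediate value as absent is this sketch's own assumption, since the description above only covers the missing-key case. Assoc, Entry, Value and lookup_path are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical value model: a leaf int or a nested association. */
typedef struct Assoc Assoc;
typedef struct { bool is_assoc; int leaf; const Assoc *sub; } Value;
typedef struct { const char *key; Value val; } Entry;
struct Assoc { const Entry *entries; size_t len; };

/* assoc[k1, k2, ...] (nkeys >= 1): look up k1, then apply the result to the
   remaining keys. NULL stands for Missing["KeyAbsent", ...]. */
static const Value *lookup_path(const Assoc *a, const char *const *keys, size_t nkeys) {
    const Value *v = NULL;
    for (size_t i = 0; i < nkeys; i++) {
        if (a == NULL) return NULL;      /* previous value was not an association */
        v = NULL;
        for (size_t j = 0; j < a->len; j++)
            if (strcmp(a->entries[j].key, keys[i]) == 0) { v = &a->entries[j].val; break; }
        if (v == NULL) return NULL;      /* missing key propagates */
        a = v->is_assoc ? v->sub : NULL; /* descend for the next key */
    }
    return v;
}
```

For <|"a" -> <|"b" -> 5|>|>, the path "a", "b" reaches the leaf 5, while "a", "x" and "z", "b" both come back absent.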

Tests/docs: +4 e2e tests (149 total), data-structures spec update, changelog,
frontend demo cell. Clean -Wall -Wextra build; 0 leaks; no regressions in
eval/purefunc/core/regression suites.

Associations can now be destructured directly in a function definition:
  area[KeyValuePattern[{"w" -> w_, "h" -> h_}]] := w h
  area[<|"w" -> 3, "h" -> 4|>]   (* -> 12 *)

Root cause: the DownValue dispatch fast-path filter (src/symtab.c,
pattern_arg_head_canon) treated a first-arg pattern head of KeyValuePattern or
Except as a literal head, so it skipped the rule whenever the input's first-arg
head differed (e.g. Association). Both are now treated as wildcards (return
NULL), so the filter never skips them -- strictly more conservative, cannot
cause missed matches. Matching always worked via MatchQ/Cases; only this
fast-path filter was wrong. Fix also repairs Except in a DownValue LHS.
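
The fix can be illustrated with a simplified, hypothetical version of such a head filter (not the real pattern_arg_head_canon): it returns the head a matching first argument must have, or NULL for pattern constructs that can match arguments with other heads, and a rule is skipped only when a non-NULL head differs from the argument's head. The wildcard list and string-based heads are illustrative, and this sketch treats every Blank as a wildcard, typed blanks such as _Integer included.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch. Pattern heads whose instances can match arguments with
   other heads; the fix amounts to KeyValuePattern and Except joining this set.
   The list is illustrative, not Mathilda's. */
static const char *const wildcard_heads[] = {
    "Blank", "Pattern", "Alternatives", "KeyValuePattern", "Except"
};

/* Head a matching first argument must have, or NULL for "could be anything". */
static const char *pattern_arg_head(const char *pattern_head) {
    size_t n = sizeof wildcard_heads / sizeof wildcard_heads[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(pattern_head, wildcard_heads[i]) == 0) return NULL;
    return pattern_head; /* literal head: f[...] only matches f[...] */
}

/* Fast-path filter: skip a rule only when it provably cannot match. */
static bool rule_may_apply(const char *pattern_head, const char *arg_head) {
    const char *h = pattern_arg_head(pattern_head);
    return h == NULL || strcmp(h, arg_head) == 0;
}
```

Returning NULL for more heads can only make the filter skip fewer rules, which is why the change cannot cause missed matches.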

Tests/docs: +4 e2e tests (153 total), data-structures spec update, changelog,
frontend demo. Clean -Wall -Wextra build; 0 leaks; no regressions in
symtab/match/match_extensive/patterns/replace/eval/core/regression suites.

- GroupBy[list, keyfn -> valfn] groups by keyfn[x] but collects valfn[x] in
  each group; GroupBy[list, keyfn -> valfn, g] then reduces each group by g.
  Completes the Wolfram GroupBy signature and expresses the whole
  group / extract-field / reduce pipeline in one call, e.g.
  GroupBy[txns, First -> Last, Total]. Extends builtin_groupby (src/assoc.c).
- Simplified examples/association_showcase.m and the frontend demo to use the
  cleaner one-call form (output unchanged).

Tests/docs: +3 e2e tests (156 total), data-structures spec update, changelog.
Clean -Wall -Wextra build; 0 leaks; no regressions in core/eval/regression.

- SortBy[list, {f1, f2, ...}] sorts by f1, breaking ties with f2, and so on
  (also over associations, by value). Implemented by building each element's
  sort key as the tuple {f1[e], ...}; expr_compare already orders equal-length
  lists lexicographically, giving exact multi-criteria ordering (src/sort.c).
  Single-criterion and operator forms are unchanged.
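
The tuple-key idea can be sketched for strings with f1 = length and f2 = the string itself (compared with strcmp, an assumption): each key is computed once, and the comparator consults f2 only when f1 ties, which is exactly lexicographic comparison of {f1[e], f2[e]}. Key2, cmp_key2 and sort_by_len_then_alpha are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: the key tuple {StringLength[e], e} for each string. */
typedef struct { size_t len; const char *s; } Key2;

static int cmp_key2(const void *pa, const void *pb) {
    const Key2 *a = pa, *b = pb;
    if (a->len != b->len) return (a->len > b->len) - (a->len < b->len); /* f1 */
    return strcmp(a->s, b->s);                                          /* f2 on tie */
}

/* Build each tuple once, sort the tuples, then write the strings back.
   Allocation failure is not handled, for brevity. */
static void sort_by_len_then_alpha(const char **xs, size_t n) {
    Key2 *keys = malloc((n + 1) * sizeof *keys);
    for (size_t i = 0; i < n; i++) { keys[i].len = strlen(xs[i]); keys[i].s = xs[i]; }
    qsort(keys, n, sizeof *keys, cmp_key2);
    for (size_t i = 0; i < n; i++) xs[i] = keys[i].s;
    free(keys);
}
```

{"pear", "fig", "apple", "kiwi", "date"} sorts to {"fig", "date", "kiwi", "pear", "apple"}.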

Tests/docs: +2 e2e tests (158 total), functional-programming spec update,
changelog, frontend demo. Clean -Wall -Wextra build; 0 leaks; no regressions
in sort/core/regression suites.

- Fold[f, seed, assoc] / FoldList[...] now fold over the association's values
  in key order (rebuild over Values[assoc] at the top of fold_impl,
  src/funcprog.c). Fold over a plain list is unchanged.
- New Scan[f, expr] (src/funcprog.c, Protected): applies f to each element for
  side effects and returns Null; over an association it scans the values.

Tests/docs: +5 e2e tests (163 total), functional-programming spec sections,
changelog, frontend demo. Clean -Wall -Wextra build; 0 leaks; no regressions
in fold/foldlist/core/eval/regression suites.

- First[assoc] / Last[assoc] now give the first / last value (matching
  Wolfram) instead of the whole key -> value rule (src/part.c). Lists and
  general expressions are unchanged.
- Add tests covering First/Last plus Rest/Most/Take/Drop (which already slice
  entries and return an association, order preserved).

Tests/docs: +6 e2e tests (169 total), data-structures spec section, changelog,
frontend demo. Clean -Wall -Wextra build; 0 leaks; no regressions in
core/eval/regression/list suites.