Skip to content

Commit 8a706b9

Browse files
mivertowskiclaude
andcommitted
Add executive overview document and update academic paper
Add ringkernel-executive-overview.tex/pdf with branded title page, architecture diagrams, performance benchmarks, Python wrapper content, and links & resources. Update academic paper sections with expanded content on system design, implementation, evaluation, and discussion. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent a996d86 commit 8a706b9

12 files changed

Lines changed: 1034 additions & 21 deletions

docs/paper/main.pdf

19.1 KB
Binary file not shown.

docs/paper/main.tex

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,7 @@
4141
\usepackage{subcaption}
4242
\usepackage{graphicx}
4343
\usepackage{float}
44+
\usepackage{tabularx}
4445

4546
% Code listings
4647
\usepackage{listings}
@@ -163,7 +164,7 @@
163164
\end{abstract}
164165

165166
\vspace{1em}
166-
\noindent\textbf{Keywords:} Actor Model, GPU Computing, Persistent Kernels, Message Passing, Hybrid Logical Clocks, Lock-Free Algorithms, CUDA, WebGPU, Distributed Systems, Graph Analytics
167+
\noindent\textbf{Keywords:} Actor Model, GPU Computing, Persistent Kernels, Message Passing, Hybrid Logical Clocks, Lock-Free Algorithms, CUDA, WebGPU, Distributed Systems, Graph Analytics, Digital Twin, Bi-Temporal Model, Audit Intelligence
167168

168169
\vspace{0.5em}
169170
\noindent\textbf{ACM CCS:} Computer systems organization $\rightarrow$ Parallel architectures; Software and its engineering $\rightarrow$ Concurrent programming structures

docs/paper/sections/00-abstract.tex

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,8 +9,11 @@
99

1010
This paper describes \textbf{RingKernel}, the Rust implementation of this paradigm,
1111
alongside three companion frameworks: \textbf{DotCompute} (.NET), \textbf{Orleans.GpuBridge}
12-
(Microsoft Orleans integration), and \textbf{RustGraph} (living graph database). Together,
13-
these systems demonstrate the broad applicability of GPU-native actors.
12+
(Microsoft Orleans integration), and \textbf{RustGraph} (living graph database). A fifth
13+
system, \textbf{RustAssureTwin}, demonstrates the full application stack: a native desktop
14+
audit intelligence platform consuming GPU-native actor state through bi-temporal analytics,
15+
AI-assisted reasoning, and ISA/PCAOB-compliant workflows. Together, these systems demonstrate
16+
the broad applicability of GPU-native actors from kernel to end-user.
1417

1518
Our key contributions are: (1) formalization of GPU actor semantics with Host-to-Kernel (H2K),
1619
Kernel-to-Host (K2H), and Kernel-to-Kernel (K2K) messaging channels; (2) a 128-byte

docs/paper/sections/01-introduction.tex

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -78,6 +78,15 @@ \subsection{Our Contribution: GPU-Native Actors}
7878
Together, these systems demonstrate that GPU-native actors are a universal paradigm
7979
applicable across languages, frameworks, and domains.
8080

81+
A fifth system, \textbf{RustAssureTwin}, demonstrates the end-to-end application of this
82+
paradigm: a native desktop audit intelligence platform (Tauri~2 + Svelte~5 + Rust) that
83+
consumes RustGraph's living analytics to provide professional auditors with a
84+
\emph{digital twin} of the organization under audit. By surfacing GPU-computed O(1) queries
85+
through an interactive UI---complete with temporal playback, AI-assisted analysis, and
86+
cross-module drill-down---RustAssureTwin validates that GPU-native actors can serve
87+
non-technical domain experts without sacrificing the sub-microsecond responsiveness that
88+
the paradigm enables.
89+
8190
\subsection{Contributions}
8291

8392
This paper makes the following contributions:
@@ -104,6 +113,12 @@ \subsection{Contributions}
104113
simulation (RingKernel), enterprise accounting (DotCompute), distributed virtual
105114
actors (Orleans.GpuBridge), and living graph analytics (RustGraph).
106115

116+
\item \textbf{Application-Layer Digital Twin} (\S\ref{sec:implementation}): We
117+
present RustAssureTwin, a production desktop application that consumes GPU-native
118+
actor state via RustGraph to provide audit intelligence with bi-temporal analytics,
119+
tiered AI agents, and ISA/PCAOB-compliant workflows---demonstrating the full stack
120+
from GPU kernel to professional end-user.
121+
107122
\item \textbf{Comprehensive Evaluation} (\S\ref{sec:evaluation}): We demonstrate
108123
11,327$\times$ lower command latency and 2.7$\times$ higher mixed-workload
109124
throughput compared to traditional GPU programming on NVIDIA RTX Ada.

docs/paper/sections/03-related-work.tex

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -240,17 +240,17 @@ \subsection{Comparison Summary}
240240
\centering
241241
\caption{GPU-Native Actor Ecosystem Comparison}
242242
\label{tab:ecosystem-comparison}
243-
\begin{tabular}{@{}lllll@{}}
243+
\begin{tabularx}{\textwidth}{@{}l>{\raggedright\arraybackslash}X>{\raggedright\arraybackslash}X>{\raggedright\arraybackslash}X>{\raggedright\arraybackslash}X@{}}
244244
\toprule
245245
\textbf{Feature} & \textbf{RingKernel} & \textbf{DotCompute} & \textbf{Orleans.GpuBridge} & \textbf{RustGraph} \\
246246
\midrule
247247
Language & Rust & C\# (.NET 9) & C\# (Orleans) & Rust \\
248248
GPU Backends & CUDA, WebGPU & CUDA, OpenCL, Metal & CUDA, DotCompute & CUDA \\
249249
Primary Domain & FDTD simulation & General compute & Distributed actors & Graph analytics \\
250-
Message Latency & 0.03$\mu$s & 1.24$\mu$s & 100-500ns & 100-500ns \\
250+
Message Latency & 0.03\,$\mu$s & 1.24\,$\mu$s & 100--500\,ns & 100--500\,ns \\
251251
Actor Granularity & Thread block & Thread block & Grain (virtual) & Graph node \\
252252
Unique Feature & Rust-to-CUDA DSL & LINQ-to-GPU & Hypergraph actors & 64+ living analytics \\
253253
Test Coverage & 900+ tests & 215/234 tests & 1,231 tests & 1,400+ tests \\
254254
\bottomrule
255-
\end{tabular}
255+
\end{tabularx}
256256
\end{table*}

docs/paper/sections/04-system-design.tex

Lines changed: 72 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -350,13 +350,12 @@ \subsubsection{Domain Entity Types}
350350
\centering
351351
\caption{Unified hypergraph entity type ranges}
352352
\label{tab:entity-types}
353-
\begin{tabular}{@{}llr@{}}
353+
\begin{tabular}{@{}lp{7.5cm}r@{}}
354354
\toprule
355355
\textbf{Domain} & \textbf{Entity Types} & \textbf{Type Range} \\
356356
\midrule
357-
Accounting & Vendor, Customer, Account, JournalEntry, JournalLine, & 1--204 \\
358-
& PurchaseRequisition, PurchaseOrder, GoodsReceipt, Invoice, Payment & \\
359-
ICS & Control, Risk, Assertion, ControlObjective & 300--303 \\
357+
Accounting & Vendor, Customer, Account, JournalEntry, JournalLine, PurchaseRequisition, PurchaseOrder, GoodsReceipt, Invoice, Payment & 1--204 \\[3pt]
358+
ICS & Control, Risk, Assertion, ControlObjective & 300--303 \\[3pt]
360359
OCPM & Process, Activity, Event, ObjectType & 400--403 \\
361360
\bottomrule
362361
\end{tabular}
@@ -378,6 +377,37 @@ \subsubsection{Cross-Domain Edge Types}
378377
This unified structure enables queries that span domains, such as: ``Find all
379378
controls that cover accounts involved in activities with high fraud risk.''
380379

380+
\subsubsection{Regulatory Standards Mapping}
381+
382+
The unified hypergraph is not an arbitrary data model; its domains and cross-domain edges
383+
directly map to the requirements of international audit and compliance standards:
384+
385+
\begin{table}[h]
386+
\centering
387+
\caption{Mapping of unified hypergraph domains to professional standards}
388+
\label{tab:standards-mapping}
389+
\begin{tabular}{@{}lll@{}}
390+
\toprule
391+
\textbf{Standard} & \textbf{Requirement} & \textbf{Hypergraph Mapping} \\
392+
\midrule
393+
ISA 315 & Risk assessment of material misstatement & ICS $\rightarrow$ Accounting edges \\
394+
ISA 240 & Fraud risk identification & Fraud label bitmap (26 labels) \\
395+
ISA 500/530 & Audit evidence \& sampling & Three-way match, sample sizing \\
396+
ISA 570 & Going concern evaluation & Going concern analytics \\
397+
SOX 404 & Internal control effectiveness & Control coverage analytics \\
398+
PCAOB AS 2201 & Control testing \& deficiency classification & ICS entity types 300--303 \\
399+
\bottomrule
400+
\end{tabular}
401+
\end{table}
402+
403+
The ICS domain (types 300--303) models the control environment as defined by COSO~2013:
404+
Controls, Risks, Assertions, and Control Objectives form a graph structure where
405+
\texttt{MitigatesRisk} and \texttt{CoversAccount} edges encode the control-to-risk
406+
and control-to-account mappings required by ISA~315 for understanding the entity's
407+
internal control system. The living analytics continuously compute control coverage
408+
and deficiency classification, enabling the real-time monitoring required by SOX~404
409+
continuous auditing regimes.
410+
381411
\subsubsection{Fraud Label Bitmap}
382412

383413
Each node includes a 64-bit \texttt{label\_bitmap} field encoding 26 fraud labels
@@ -431,6 +461,44 @@ \subsubsection{Temporal Query Modes}
431461
\item \textbf{Period Comparison}: Compare Q1 vs Q2 analytics (PageRank delta, component changes)
432462
\end{itemize}
433463

464+
\subsubsection{Bi-Temporal and Business-Time Extension}
465+
466+
While HLC provides causal ordering within the GPU actor system, enterprise audit applications
467+
require a richer temporal model. RustAssureTwin extends the single-dimensional HLC timeline
468+
into a \emph{tri-temporal} model:
469+
470+
\begin{enumerate}
471+
\item \textbf{Valid Time} ($t_v$): When a fact is true in the real world (e.g., when an
472+
invoice was actually issued). This is the domain-level truth timeline.
473+
474+
\item \textbf{Transaction Time} ($t_t$): When a fact was recorded in the system. HLC
475+
timestamps map directly to this dimension---the GPU actor's \texttt{hlc\_physical} field
476+
captures the exact moment of state mutation.
477+
478+
\item \textbf{Business Time} ($t_b$): Fiscal periods, reporting deadlines, and audit
479+
cutoff dates. This is a discrete, calendar-aligned timeline (fiscal years, quarters,
480+
months) that does not correspond to any physical clock.
481+
\end{enumerate}
482+
483+
This tri-temporal model enables five query modes consumed by the application layer:
484+
485+
\begin{itemize}
486+
\item \textbf{Current}: Read the latest GPU actor state ($t_v = \text{now}, t_t = \text{now}$)
487+
\item \textbf{Point-in-Time}: State at a historical valid time ($t_v = T$)
488+
\item \textbf{As-Known-At}: State as recorded at a specific transaction time
489+
($t_t = T$), revealing what the system ``knew'' at that moment---critical for
490+
detecting retroactive journal entries
491+
\item \textbf{Period}: Aggregate state over a business period
492+
($t_b \in [\text{Q1 start}, \text{Q1 end}]$)
493+
\item \textbf{Comparison}: Delta between two points along any temporal dimension,
494+
e.g., comparing control coverage between Q3 and Q4
495+
\end{itemize}
496+
497+
The per-node history rings (16 entries) store \texttt{hlc\_timestamp} as $t_t$; the
498+
application layer annotates entries with $t_v$ and $t_b$ from domain metadata. This
499+
separation ensures the GPU actor system remains general-purpose while enabling
500+
domain-specific temporal reasoning at the application layer.
501+
434502
\subsubsection{Audit Trail Fields}
435503

436504
The \texttt{GpuNodeState} includes dedicated audit fields computed via living analytics:

docs/paper/sections/05-implementation.tex

Lines changed: 123 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -650,3 +650,126 @@ \subsubsection{Combined Test Coverage}
650650
This cross-language implementation demonstrates that the GPU-native actor paradigm
651651
is not language-specific but a universal pattern applicable wherever persistent
652652
GPU kernels and lock-free messaging are available.
653+
654+
\subsection{Application Layer: RustAssureTwin}
655+
656+
While the previous sections describe the engine and infrastructure layers, an important
657+
validation of any systems paradigm is whether it can serve domain experts who are not
658+
systems programmers. \textbf{RustAssureTwin} is a native desktop audit intelligence
659+
platform that consumes RustGraph's GPU-native actor state to provide professional
660+
auditors with a \emph{digital twin} of the organization under audit.
661+
662+
\subsubsection{Architecture}
663+
664+
RustAssureTwin is built on a four-layer architecture:
665+
666+
\begin{enumerate}
667+
\item \textbf{Presentation Layer} (Svelte~5 + Tailwind~CSS): Seven application modules
668+
(Dashboard, Explorer, Process, Controls, Audit, Reports, Admin) rendered in a
669+
Tauri~2 desktop shell with wgpu-accelerated graph visualization supporting 100K+ nodes
670+
at 60~FPS.
671+
672+
\item \textbf{Application Core} (TypeScript + Svelte stores): 21 reactive stores
673+
managing graph state, temporal context, AI assistant, workflow progress, and
674+
cross-module navigation. Stores subscribe to RustGraph state changes via
675+
WebSocket and expose O(1) query results to UI components.
676+
677+
\item \textbf{Data Access Layer} (Rust backend via Tauri IPC): HTTP and WebSocket
678+
clients to RustGraph, local SQLite cache for offline operation, and bulk
679+
import/export for disconnected audit scenarios.
680+
681+
\item \textbf{Living Graph Engine} (RustGraph): GPU-native actors maintaining
682+
64+ analytics algorithms continuously, providing the always-current state that
683+
the application layer reads in O(1) time.
684+
\end{enumerate}
685+
686+
Communication between layers uses Tauri's IPC mechanism: the Svelte frontend calls
687+
\texttt{invoke()} to the Rust backend, which in turn queries RustGraph via HTTP/WebSocket.
688+
This architecture ensures that GPU actor state reaches the UI within a single frame budget
689+
(16.67ms at 60~FPS), validated by the mixed-workload evaluation in Section~\ref{sec:evaluation}.
690+
691+
\subsubsection{AI Agent Architecture}
692+
693+
RustAssureTwin integrates a three-tier AI agent system that is a first-class consumer of
694+
GPU-native actor state:
695+
696+
\begin{itemize}
697+
\item \textbf{Tier~1 --- Observational}: Pattern detection and anomaly flagging.
698+
The AI reads O(1) living analytics (fraud triangle scores, control coverage, SoD
699+
violations) and surfaces findings to auditors. This tier requires no write access
700+
to the graph and operates within ISA~200 professional skepticism guidelines.
701+
702+
\item \textbf{Tier~2 --- Analytical}: Risk assessment and sampling recommendations.
703+
The AI aggregates GPU-computed analytics across the unified hypergraph to recommend
704+
sample sizes (per ISA~530) and identify high-risk areas. Confidence thresholds
705+
gate AI suggestions: $\geq$0.95 for factual queries, $\geq$0.80 for analytical
706+
suggestions, $\geq$0.70 for exploratory recommendations.
707+
708+
\item \textbf{Tier~3 --- Collaborative}: Workpaper drafting and finding
709+
documentation. The AI proposes structured content for audit workpapers, but all
710+
output requires explicit human approval before inclusion---enforcing the
711+
human-in-the-loop governance required by ISA~220 and PCAOB AS~1201.
712+
\end{itemize}
713+
714+
The AI agents benefit directly from the GPU actor paradigm: because analytics are maintained
715+
continuously (not computed on-demand), the AI can access fraud scores, control coverage,
716+
and process conformance metrics in 3--17~ns per query. This enables conversational-speed
717+
AI interactions where the assistant can traverse the full unified hypergraph context
718+
within a single user-perceived response latency.
719+
720+
\subsubsection{Temporal Visualization Pipeline}
721+
722+
The bi-temporal model described in Section~\ref{sec:design} is surfaced through an
723+
interactive temporal visualization pipeline:
724+
725+
\begin{enumerate}
726+
\item \textbf{Temporal bar}: A global timeline control in the application footer
727+
allows auditors to scrub to any point in valid time, transaction time, or business
728+
period. Changing the temporal context updates all modules simultaneously.
729+
730+
\item \textbf{Playback controls}: Step-by-step temporal navigation with configurable
731+
granularity (event, day, week, month) and playback speed (0.25$\times$--8$\times$).
732+
The wgpu renderer animates graph state transitions by interpolating between
733+
per-node history ring entries.
734+
735+
\item \textbf{Bi-temporal timeline}: A dual-axis visualization showing valid time
736+
(horizontal) vs transaction time (vertical), enabling auditors to identify
737+
retroactive journal entries---transactions where $t_v \ll t_t$, a key fraud
738+
indicator under ISA~240.
739+
\end{enumerate}
740+
741+
This pipeline demonstrates a concrete consumer of the per-node history rings: each
742+
ring entry's HLC timestamp drives the animation, while the application layer maps
743+
HLC values to human-readable dates and fiscal periods.
744+
745+
\subsubsection{Audit Workflow Integration}
746+
747+
RustAssureTwin implements complete audit workflows that consume GPU-native actor state:
748+
749+
\begin{itemize}
750+
\item \textbf{Audit Scoping}: A scope wizard identifies Significant Classes of
751+
Transactions (SCOTs) by querying GPU-resident entity types and their risk scores.
752+
A cost model editor estimates audit effort using role rates and SCOT complexity.
753+
754+
\item \textbf{Control Testing}: Step-by-step test execution panels guide auditors
755+
through control tests, with test results written back to the graph as audit
756+
evidence nodes linked to Control entities (types 300--303).
757+
758+
\item \textbf{Workpaper Management}: Section-based editors with embedded graph
759+
snapshots---auditors can capture the current state of any subgraph and embed it
760+
as evidence in ISA-compliant workpapers.
761+
762+
\item \textbf{Cross-Module Navigation}: A universal ``View in Explorer'' pattern
763+
enables drill-down from any entity reference (control ID, finding, risk score)
764+
to the Explorer module's graph visualization, preserving temporal and filter context
765+
via URL state synchronization.
766+
\end{itemize}
767+
768+
\subsubsection{Test Infrastructure}
769+
770+
RustAssureTwin maintains 456 unit tests, 48 integration tests, and a Playwright E2E
771+
test suite covering navigation, performance, and audit-specific workflows. The E2E tests
772+
validate that GPU actor state reaches the UI correctly: performance tests verify that
773+
all 7 application modules load within 3 seconds, canvas interactions respond within
774+
500ms, and the graph visualization survives rapid zoom/pan stress tests without memory
775+
issues.

docs/paper/sections/06-evaluation.tex

Lines changed: 35 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -323,8 +323,24 @@ \subsubsection{When to Use GPU vs CPU}
323323

324324
\subsection{Mixed Workload Performance}
325325

326-
Real applications combine computation with interactive commands. We simulate a
327-
GUI application running at 60 FPS (16.67ms frame budget):
326+
Real applications combine computation with interactive commands. We evaluate
327+
against a concrete 60~FPS target motivated by \textbf{RustAssureTwin}, a native desktop
328+
application that renders 100K+ node graphs via wgpu while simultaneously consuming
329+
GPU-native actor analytics. The rendering pipeline operates as follows:
330+
331+
\begin{enumerate}
332+
\item \textbf{GPU actors}: RustGraph maintains living analytics (PageRank, fraud
333+
triangle, control coverage) via continuous K2K message propagation.
334+
\item \textbf{WebSocket stream}: State changes propagate to the desktop application
335+
via WebSocket subscriptions ($<$1ms transport latency).
336+
\item \textbf{wgpu renderer}: WGSL shaders render graph nodes with analytics-driven
337+
visual encoding (color = risk score, size = PageRank, opacity = control coverage).
338+
\item \textbf{Interactive commands}: User actions (zoom, pan, select, filter, temporal
339+
scrub) generate H2K commands that must complete within the 16.67ms frame budget.
340+
\end{enumerate}
341+
342+
This pipeline validates the mixed-workload scenario: computation and interaction must
343+
coexist within each frame.
328344

329345
\begin{table}[h]
330346
\centering
@@ -349,28 +365,39 @@ \subsection{Mixed Workload Performance}
349365
\begin{axis}[
350366
ybar,
351367
bar width=0.8cm,
352-
xlabel={},
353368
ylabel={Time (ms)},
354369
symbolic x coords={Traditional, Persistent},
355370
xtick=data,
356371
ymin=0,
357372
ymax=40,
358-
legend style={at={(0.5,-0.15)}, anchor=north, legend columns=2},
359-
nodes near coords,
360-
every node near coord/.append style={font=\tiny},
373+
legend style={at={(0.5,-0.20)}, anchor=north, legend columns=2},
374+
nodes near coords={\pgfmathprintnumber\pgfplotspointmeta},
375+
every node near coord/.append style={font=\small},
361376
width=0.8\columnwidth,
362377
height=6cm,
363378
]
364379
\addplot[fill=blue!60] coordinates {(Traditional, 3.2) (Persistent, 5.1)};
365380
\addplot[fill=red!60] coordinates {(Traditional, 31.7) (Persistent, 0.003)};
366381
\legend{Compute, Commands}
367382
\end{axis}
383+
%% Annotation callout for the near-zero Persistent Commands bar
384+
\draw[->, thick, red!70!black] (4.8, 2.2) -- (4.55, 0.55);
385+
\node[anchor=west, font=\small\bfseries, red!70!black] at (4.8, 2.2) {0.003\,ms};
386+
\node[anchor=west, font=\scriptsize, text width=2.5cm, red!70!black] at (4.8, 1.6) {(10,567$\times$ reduction)};
368387
\end{tikzpicture}
369-
\caption{Time breakdown for mixed workload. Command overhead dominates traditional
370-
approach.}
388+
\caption{Time breakdown for mixed workload. Command overhead dominates the traditional
389+
approach (31.7\,ms); persistent actors reduce it to 0.003\,ms---invisible at this scale.}
371390
\label{fig:mixed}
372391
\end{figure}
373392

393+
\textbf{Application validation}: RustAssureTwin's E2E test suite confirms that the
394+
persistent actor approach meets real application requirements: all 7 application modules
395+
load within 3 seconds, canvas interactions (click, zoom, pan) complete within 500ms,
396+
and the temporal playback pipeline---which reads per-node history ring entries and
397+
interpolates graph state---maintains smooth animation at playback speeds up to 8$\times$.
398+
The persistent actor model's 5.1ms total frame time leaves 11.5ms of headroom for
399+
UI rendering, temporal interpolation, and AI agent queries within the 16.67ms budget.
400+
374401
\subsection{Comparison with PERKS}
375402

376403
We compare against PERKS~\cite{huangfu2022perks}, the state-of-the-art persistent

0 commit comments

Comments
 (0)