Skip to content

Latest commit

 

History

History
298 lines (221 loc) · 13.1 KB

File metadata and controls

298 lines (221 loc) · 13.1 KB

AGENTS.md — Pathrex

Project Overview

Pathrex is a Rust library and CLI tool for benchmarking queries on edge-labeled graphs constrained by regular languages and context-free languages. It uses SuiteSparse:GraphBLAS (via LAGraph) for sparse Boolean matrix operations and decomposes a graph by edge label into one Boolean adjacency matrix per label.

Repository Layout

pathrex/
├── Cargo.toml                  # Crate manifest (edition 2024)
├── build.rs                    # Links LAGraph + LAGraphX; optionally regenerates FFI bindings
├── src/
│   ├── lib.rs                  # Public modules: graph, formats, lagraph_sys; utils is pub(crate)
│   ├── main.rs                 # Binary entry point (placeholder)
│   ├── lagraph_sys.rs          # FFI module — includes generated bindings
│   ├── lagraph_sys_generated.rs# Bindgen output (checked in, regenerated in CI)
│   ├── utils.rs                # Internal helpers: CountingBuilder, CountOutput, VecSource,
│   │                           #   grb_ok! and la_ok! macros
│   ├── graph/
│   │   ├── mod.rs              # Core traits (GraphBuilder, GraphDecomposition, GraphSource,
│   │   │                       #   Backend, Graph<B>), error types, RAII wrappers, GrB init
│   │   └── inmemory.rs         # InMemory marker, InMemoryBuilder, InMemoryGraph
│   └── formats/
│       ├── mod.rs              # FormatError enum, re-exports
│       ├── csv.rs              # Csv<R> — CSV → Edge iterator (CsvConfig, ColumnSpec)
│       └── nt.rs               # NTriples<R> — N-Triples → Edge iterator (LabelExtraction)
├── tests/
│   └── inmemory_tests.rs       # Integration tests for InMemoryBuilder / InMemoryGraph
├── deps/
│   └── LAGraph/                # Git submodule (SparseLinearAlgebra/LAGraph)
└── .github/workflows/ci.yml   # CI: build GraphBLAS + LAGraph, cargo build & test

Build & Dependencies

System prerequisites

Dependency Purpose
SuiteSparse:GraphBLAS Sparse matrix engine (libgraphblas)
LAGraph Graph algorithm library on top of GraphBLAS (liblagraph)
cmake Building LAGraph from source
libclang-dev / clang Required by bindgen when regenerate-bindings feature is active

Building

# Ensure submodules are present
git submodule update --init --recursive

# Build and install SuiteSparse:GraphBLAS system-wide
git clone --depth 1 https://github.com/DrTimothyAldenDavis/GraphBLAS.git
cd GraphBLAS && make compact && sudo make install && cd ..

# Build LAGraph inside the submodule (no system-wide install required)
cd deps/LAGraph && make && cd ../..

# Build pathrex
cargo build

# Run tests
LD_LIBRARY_PATH=deps/LAGraph/build/src:deps/LAGraph/build/experimental:/usr/local/lib cargo test

How build.rs handles linking

build.rs performs two jobs:

  1. Native linking. It emits six Cargo directives:

    • cargo:rustc-link-lib=dylib=graphblas — dynamically links libgraphblas.
    • cargo:rustc-link-search=native=/usr/local/lib — adds the system GraphBLAS install path to the native library search path.
    • cargo:rustc-link-lib=dylib=lagraph — dynamically links liblagraph.
    • cargo:rustc-link-search=native=deps/LAGraph/build/src — adds the submodule's core build output to the native library search path.
    • cargo:rustc-link-lib=dylib=lagraphx — dynamically links liblagraphx (experimental algorithms).
    • cargo:rustc-link-search=native=deps/LAGraph/build/experimental — adds the experimental build output to the native library search path.

    LAGraph does not need to be installed system-wide; building the submodule in deps/LAGraph/ is sufficient for compilation and linking. SuiteSparse:GraphBLAS must be installed system-wide (sudo make install).

    At runtime the OS dynamic linker (ld.so) does not use Cargo's link search paths — it only consults LD_LIBRARY_PATH, rpath, and the system library cache. Set LD_LIBRARY_PATH=/usr/local/lib after a system-wide LAGraph install, or include the submodule build paths if not installing system-wide.

  2. Optional FFI binding regeneration (feature regenerate-bindings). When the feature is active, regenerate_bindings() runs bindgen against deps/LAGraph/include/LAGraph.h and deps/LAGraph/include/LAGraphX.h (always from the submodule — no system path search), plus GraphBLAS.h (searched in /usr/local/include/suitesparse and /usr/include/suitesparse). The generated Rust file is written to src/lagraph_sys_generated.rs. Only a curated allowlist of GraphBLAS/LAGraph types and functions is exposed (see the allowlist_* calls in build.rs).

Feature flags

Feature Effect
regenerate-bindings Runs bindgen at build time to regenerate src/lagraph_sys_generated.rs from LAGraph.h, LAGraphX.h (both from deps/LAGraph/include) and GraphBLAS.h. Without this feature the checked-in bindings are used as-is.

Pre-generated FFI bindings

The file src/lagraph_sys_generated.rs is checked into version control. CI regenerates it with --features regenerate-bindings. Do not hand-edit this file.

Architecture & Key Abstractions

Edge

Edge is the universal currency between format parsers and graph builders: { source: String, target: String, label: String }.

GraphSource trait

GraphSource<B> is implemented by any data source that knows how to feed itself into a specific [GraphBuilder]:

Csv<R> implements GraphSource<InMemoryBuilder> directly, so it can be passed to [GraphBuilder::load].

GraphBuilder trait

GraphBuilder accumulates edges and produces a GraphDecomposition:

InMemoryBuilder also exposes lower-level helpers outside the trait:

Backend trait & Graph<B> handle

Backend associates a marker type with a concrete builder/graph pair:

pub trait Backend {
    type Graph: GraphDecomposition;
    type Builder: GraphBuilder<Graph = Self::Graph>;
}

Graph<B> is a zero-sized handle parameterised by a Backend:

InMemory is the concrete backend marker type.

GraphDecomposition trait

GraphDecomposition is the read-only query interface:

InMemoryBuilder / InMemoryGraph

InMemoryBuilder is the primary GraphBuilder implementation. It collects edges in RAM, then build() calls GraphBLAS to create one GrB_Matrix per label via COO format, wraps each in an LAGraph_Graph, and returns an InMemoryGraph.

Multiple CSV sources can be chained with repeated .load() calls; all edges are merged into a single graph.

Format parsers

Two built-in parsers are available, both yielding Iterator<Item = Result<Edge, FormatError>> and pluggable into GraphBuilder::load() via their GraphSource<InMemoryBuilder> impls.

Csv<R>

Csv<R> parses delimiter-separated edge files.

Configuration is via CsvConfig:

Field Default Description
source_column Index(0) Column for the source node (by index or name)
target_column Index(1) Column for the target node
label_column Index(2) Column for the edge label
has_header true Whether the first row is a header
delimiter b',' Field delimiter byte

ColumnSpec is either Index(usize) or Name(String). Name-based lookup requires has_header: true.

NTriples<R>

NTriples<R> parses W3C N-Triples RDF files using oxttl. Each triple (subject, predicate, object) becomes an Edge where:

  • source — subject IRI or blank-node ID (_:label).
  • target — object IRI or blank-node ID; triples whose object is an RDF literal yield Err(FormatError::LiteralAsNode) (callers may filter these out).
  • label — predicate IRI, transformed by LabelExtraction:
Variant Behaviour
LocalName (default) Fragment (#name) or last path segment of the predicate IRI
FullIri Full predicate IRI string

Constructors:

FFI layer

lagraph_sys exposes raw C bindings for GraphBLAS and LAGraph. Safe Rust wrappers live in graph::mod:

Macros (src/utils.rs)

Two #[macro_export] macros handle FFI error mapping:

  • grb_ok!(expr) — evaluates a GraphBLAS call inside unsafe, maps the i32 return to Result<(), GraphError::GraphBlas(info)>.
  • la_ok!(fn::path(args…)) — evaluates a LAGraph call, automatically appending the required *mut i8 message buffer, and maps failure to GraphError::LAGraph(info, msg).

Coding Conventions

  • Rust edition 2024.
  • Error handling via thiserror derive macros; two main error enums: GraphError and FormatError.
  • FormatError converts into GraphError via #[from] FormatError on the GraphError::Format variant.
  • Unsafe FFI calls are confined to lagraph_sys, graph/mod.rs, and graph/inmemory.rs. All raw pointers are wrapped in RAII types that free resources on drop.
  • unsafe impl Send + Sync is provided for LagraphGraph and GraphblasVector because GraphBLAS handles are thread-safe after init.
  • Unit tests live in #[cfg(test)] mod tests blocks inside each module. Integration tests that need GraphBLAS live in tests/inmemory_tests.rs.

Testing

# Run all tests (LAGraph installed system-wide)
LD_LIBRARY_PATH=/usr/local/lib cargo test --verbose

# If LAGraph is NOT installed system-wide (only built in the submodule):
LD_LIBRARY_PATH=deps/LAGraph/build/src:deps/LAGraph/build/experimental:/usr/local/lib cargo test --verbose

Tests in src/graph/mod.rs use CountingBuilder / CountOutput / VecSource from src/utils.rs — these do not call into GraphBLAS and run without native libraries.

Tests in src/formats/csv.rs and src/formats/nt.rs are pure Rust and need no native dependencies.

Tests in src/graph/inmemory.rs and tests/inmemory_tests.rs call real GraphBLAS/LAGraph and require the native libraries to be present.

CI

The GitHub Actions workflow (.github/workflows/ci.yml) runs on every push and PR across stable, beta, and nightly toolchains:

  1. Checks out with submodules: recursive.
  2. Installs cmake, libclang-dev, clang.
  3. Builds and installs SuiteSparse:GraphBLAS from source (sudo make install).
  4. Builds and installs LAGraph from the submodule (sudo make install).
  5. cargo build --features regenerate-bindings — rebuilds FFI bindings.
  6. LD_LIBRARY_PATH=/usr/local/lib cargo test --verbose — runs the full test suite.