This document captures what didn't work during Causantic's development and why.
Decay edge weights based on elapsed wall-clock time:
```typescript
weight = initialWeight * Math.exp(-decayRate * elapsedMs);
```

All historical edges appeared "dead" regardless of relevance:
- Sessions days apart had zero weight
- Returning to a project showed no memory
- Active work was indistinguishable from ancient history
Wall-clock time doesn't reflect semantic distance:
- Monday's work and Tuesday's continuation are logically adjacent
- But 24 hours of elapsed time makes them appear distant
- Project switches create artificial time gaps
Hop-based distance (traversal depth / turn count difference) instead of wall-clock time. Logical steps matter, not minutes elapsed.
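A minimal sketch of the contrast, assuming illustrative constants (the decay rate and 15-hop cutoff here are not Causantic's actual values): a session resumed one day later is effectively dead under wall-clock decay but nearly full-weight under hop decay.

```typescript
// Wall-clock vs hop-based decay for a session resumed one day later.
// decayRatePerMs and maxHops are illustrative constants, not Causantic's.
const dayMs = 24 * 60 * 60 * 1000;

function wallClockWeight(initialWeight: number, elapsedMs: number, decayRatePerMs = 1e-7): number {
  return initialWeight * Math.exp(-decayRatePerMs * elapsedMs);
}

function hopWeight(initialWeight: number, hops: number, maxHops = 15): number {
  return initialWeight * Math.max(0, 1 - hops / maxHops);
}

// Tuesday's continuation of Monday's work: 1 logical hop, 24h of wall-clock time.
const byTime = wallClockWeight(1.0, dayMs); // ~0.0002 -- looks "dead"
const byHops = hopWeight(1.0, 1);           // ~0.93  -- logically adjacent
```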
Same decay function for backward and forward edges:
```typescript
weight = 1 - hops / 15; // Linear, same for both directions
```

Either backward or forward retrieval suffered:
- Short decay: Good for backward, poor for forward
- Long decay: Good for forward, poor for backward
- Medium decay: Mediocre for both
Backward and forward edges have different semantics:
- Backward: "What led to this?" - Relevance fades quickly
- Forward: "What resulted from this?" - Consequences persist longer
Direction-specific decay:
- Backward: Linear, dies at 10 hops
- Forward: Delayed linear, 5-hop hold, dies at 20 hops
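The two curves above can be sketched as follows; the cutoffs (10 and 20 hops) and the 5-hop hold come from the fix described here, while the function names are illustrative:

```typescript
// Backward: linear decay reaching zero at 10 hops -- "what led to this?" fades quickly.
function backwardDecay(hops: number): number {
  return Math.max(0, 1 - hops / 10);
}

// Forward: delayed linear -- full weight for the first 5 hops,
// then linear decay reaching zero at 20 hops. Consequences persist longer.
function forwardDecay(hops: number): number {
  if (hops <= 5) return 1;
  return Math.max(0, 1 - (hops - 5) / 15);
}
```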
Use hdbscan-ts (npm package) for clustering:
```typescript
import { HDBSCAN } from 'hdbscan-ts';

const clusters = new HDBSCAN({ minClusterSize: 4 }).fit(embeddings);
```

Clustering 6,000 points took 65+ minutes, making it impractical.
O(n² × k) complexity due to Array.includes():
```typescript
// Problematic code in hdbscan-ts
for (const point of points) {
  for (const cluster of clusters) {
    if (cluster.includes(point)) { // O(n) lookup per membership test
      // ...
    }
  }
}
```

Array.includes() is O(n), nested inside O(n × k) loops.
Native TypeScript implementation with proper data structures:
- Set.has() instead of Array.includes() for O(1) membership lookups
- Quickselect for O(n) k-th nearest neighbor selection
- Union-Find with path compression
- Parallel core distance computation
Result: 30 seconds instead of 65+ minutes for 6,000 points.
See HDBSCAN Implementation for details.
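One of the listed ingredients, Union-Find with path compression, can be sketched as follows. This is an illustrative implementation of the standard data structure, not Causantic's actual code:

```typescript
// Union-Find (disjoint set) with path compression and union by rank.
// HDBSCAN's hierarchy construction merges components repeatedly, so
// near-constant-time find/union matters at scale.
class UnionFind {
  private parent: number[];
  private rank: number[];

  constructor(n: number) {
    this.parent = Array.from({ length: n }, (_, i) => i);
    this.rank = new Array(n).fill(0);
  }

  find(x: number): number {
    // Path compression: point every visited node directly at the root.
    if (this.parent[x] !== x) {
      this.parent[x] = this.find(this.parent[x]);
    }
    return this.parent[x];
  }

  union(a: number, b: number): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra === rb) return;
    // Union by rank keeps the trees shallow.
    if (this.rank[ra] < this.rank[rb]) {
      this.parent[ra] = rb;
    } else if (this.rank[ra] > this.rank[rb]) {
      this.parent[rb] = ra;
    } else {
      this.parent[rb] = ra;
      this.rank[ra]++;
    }
  }
}
```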
Weight adjacent (sequential) edges highest:
```typescript
const edgeWeights = {
  adjacent: 1.0,
  filePath: 0.7,
  topic: 0.5,
};
```

Retrieval precision dropped significantly.
Adjacent chunks are often just "next" without semantic relationship:
- "Let me commit this" followed by "Now let's work on X"
- Adjacent in time, unrelated in topic
- High weight pollutes results with noise
The initial fix was file-path edges as primary signal (v0.2). The deeper fix in v0.3 was to eliminate all semantic edge types entirely — including adjacent edges. The causal graph now uses purely structural m×n all-pairs edges at D-T-D turn boundaries with topic-shift gating. Semantic association is handled by vector search and clustering instead.
Assign each chunk to nearest cluster centroid:
```typescript
function assignCluster(chunk: Chunk): Cluster {
  return clusters.reduce((best, cluster) =>
    distance(chunk, cluster) < distance(chunk, best) ? cluster : best,
  );
}
```

Clusters grew uncontrollably, mixing unrelated topics.
No distance threshold means everything gets assigned:
- Distant outliers join nearest cluster
- Cluster purity degrades
- Topic mixing
Threshold-based assignment:
```typescript
function assignCluster(chunk: Chunk): Cluster | null {
  const nearest = findNearest(chunk, clusters);
  return distance(chunk, nearest) < 0.09 ? nearest : null;
}
```

Chunks beyond the threshold remain unclustered (noise).
These questions were identified as open before implementation and resolved through experiments:
| Question | Answer | Evidence |
|---|---|---|
| Topic continuity detection | Lexical features (0.998 AUC), 30-min time gap threshold | Topic continuity experiment |
| Embedding model selection | jina-small (0.715 AUC, 0.384 silhouette) | Embedding benchmark |
| Decay curve type | Delayed linear for retrieval, exponential for prediction | Edge decay experiments |
| Directional asymmetry | Yes — +0.64 MRR delta for delayed linear | Forward prediction experiment |
| Thinking block handling | Remove before embedding (+0.063 AUC) | Ablation study |
| Chunk strategy | Turn-based, code-block aware | Parser implementation |
| Cold start problem | Not real — full context until compaction | Design analysis |
| Parallelism detection | Via parentToolUseID + timestamps | Session data inspection |
Sum-product traversal (inspired by Feynman path integrals) to walk the causal graph. Edge weights multiplied along paths, summed across paths to a node. Direction-specific hop decay curves controlled attenuation.
Collection benchmarks showed graph traversal contributed only ~2% of retrieval results. Augmentation ratio was 1.1× — barely above vector/keyword search alone.
Product chains converge to zero too fast. With edge weights in (0,1], a 5-hop path through 0.8-weighted edges yields 0.8⁵ = 0.33. By 10 hops: 0.11. By 15: 0.04. Vector search seeds already have cosine similarity ~0.6-0.8 — path products can't compete. The sum-product mechanism was theoretically elegant but practically dominated by direct vector/keyword hits.
Chain walking replaces graph traversal. Instead of multiplicative path products, the chain walker follows sequential edges and scores each hop independently via cosine similarity against the query. This means a relevant chunk 10 hops away scores just as highly as a relevant chunk 1 hop away — traversal depth doesn't attenuate the signal.
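A minimal sketch of this chain walker, assuming a simple linked-chunk shape and helper names that are illustrative rather than Causantic's actual API. The key property is that each hop is scored against the query on its own merits; depth does not attenuate it:

```typescript
// Chain walking: follow sequential edges from a seed and score each hop
// independently by cosine similarity -- no multiplicative path products.
interface Chunk {
  id: string;
  embedding: number[];
  next?: Chunk; // sequential 1-to-1 edge
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function walkChain(seed: Chunk, query: number[], maxHops: number, minScore: number): Chunk[] {
  const results: Chunk[] = [];
  let current = seed.next;
  for (let hop = 1; hop <= maxHops && current !== undefined; hop++) {
    // A relevant chunk 10 hops out scores the same as one 1 hop out.
    if (cosine(current.embedding, query) >= minScore) {
      results.push(current);
    }
    current = current.next;
  }
  return results;
}
```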
Create m×n all-pairs edges at each D-T-D turn boundary. Maximum entropy principle: don't impose false structure, let traversal and decay do the ranking.
Edge counts exploded. A turn boundary with 5 chunks on each side creates 25 edges. Real sessions with 10-20 chunks per turn created hundreds of edges per transition. Most edges connected semantically unrelated chunks.
The max-entropy principle is sound in theory — don't assume which edges are important. But in practice, it creates a dense graph where the signal (real causal connections) is buried under noise (spurious edges between unrelated chunks in the same turn). The traversal mechanism couldn't discriminate because all within-chain edges had weight 1.0.
Sequential 1-to-1 edges. Each chunk links to the next chunk in its session, preserving temporal order without the quadratic blowup. Cross-session edges link the last chunk of one session to the first of the next. This creates a simple linked list that chain walking can follow efficiently.
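A sketch of this edge construction, using illustrative shapes rather than the actual schema: each session contributes n−1 sequential edges instead of the quadratic m×n, plus one cross-session edge to the next session.

```typescript
// Build sequential 1-to-1 edges: chunk -> next chunk within a session,
// plus one cross-session edge from each session's last chunk to the
// next session's first chunk. Sessions are ordered lists of chunk IDs.
interface Edge { from: string; to: string; crossSession: boolean }

function buildSequentialEdges(sessions: string[][]): Edge[] {
  const edges: Edge[] = [];
  for (let s = 0; s < sessions.length; s++) {
    const chunks = sessions[s];
    for (let i = 0; i < chunks.length - 1; i++) {
      edges.push({ from: chunks[i], to: chunks[i + 1], crossSession: false });
    }
    // Link the last chunk of this session to the first of the next.
    if (s + 1 < sessions.length && chunks.length > 0 && sessions[s + 1].length > 0) {
      edges.push({ from: chunks[chunks.length - 1], to: sessions[s + 1][0], crossSession: true });
    }
  }
  return edges;
}
```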
The graph's value is structural ordering — what came before and after — not semantic ranking. Vector search and BM25 are better at "what's relevant to this query." The graph is better at "given something relevant, what's the surrounding narrative?"
This separation of concerns led to the current architecture:
- Semantic discovery: Hybrid BM25 + vector search (fast, accurate, query-driven)
- Structural context: Chain walking along sequential edges (episodic, narrative, seed-driven)
- Topic grouping: HDBSCAN clustering (browsing, organization)
Each mechanism does what it's best at. The v0.2 architecture tried to make the graph do semantic ranking via sum-product path weights — conflating structural and semantic concerns.
Use cluster-level transition matrices (bigram/trigram) from the causal graph to predict which clusters should be returned at retrieval time. The hypothesis: if session A ended in clusters X,Y and session B started in clusters Y,Z, a transition matrix could learn X→Z and Y→Z patterns useful for prediction.
A preliminary scan over the full graph showed 61x lift (45% bigram accuracy vs 0.74% random), suggesting strong signal. We designed a controlled experiment isolating signal at actual query boundaries:
- Experiment A: Cross-session prediction — at each cross-session edge, predict the next session's initial clusters from the previous session's final clusters.
- Experiment B: Retrieval feedback chain — predict which clusters will be retrieved next based on recent retrieval history.
- Baselines: Random, most-popular, recency (predict same clusters), plus global/within-chain/cross-session bigrams, project-conditioned bigram, and trigram.
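The bigram approach being tested can be sketched as a simple count matrix over cluster-to-cluster transitions; function names here are illustrative, not the experiment script's:

```typescript
// Cluster-level bigram transition matrix: count from -> to transitions,
// then predict the top-k most frequent successors of a source cluster.
function buildBigram(transitions: Array<[string, string]>): Map<string, Map<string, number>> {
  const matrix = new Map<string, Map<string, number>>();
  for (const [from, to] of transitions) {
    if (!matrix.has(from)) matrix.set(from, new Map());
    const row = matrix.get(from)!;
    row.set(to, (row.get(to) ?? 0) + 1);
  }
  return matrix;
}

function predictTopK(matrix: Map<string, Map<string, number>>, from: string, k: number): string[] {
  const row = matrix.get(from);
  if (!row) return []; // unseen source cluster: no prediction
  return [...row.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(([cluster]) => cluster);
}
```

The sparsity failure below falls directly out of this shape: with only a handful of cross-session transitions, most source clusters have an empty or near-empty row.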
The preliminary 61x lift was entirely within-session workflow signal:
| Approach | P@5 | Lift@5 |
|---|---|---|
| Random | 3.7% | 1.0x |
| Most popular | 8.4% | 2.3x |
| Recency | 22.1% | 6.0x |
| Global bigram | 31.6% | 8.5x |
| Within-chain bigram | 31.6% | 8.5x |
| Cross-session bigram | 4.2% | 1.1x |
The cross-session bigram matrix contained only 3 source clusters and 3 cells — too sparse to learn anything. The global bigram's 8.5x lift was identical to the within-chain bigram, confirming it was entirely driven by within-chain edges (74.7% of forward edges).
Two compounding problems:
- Sparsity at boundaries: Cross-session edges are only 4.2% of forward edges. The transition matrix has insufficient data to learn meaningful patterns at actual query boundaries.
- Recency is tautological: Recency (6.0x lift) is the strongest viable baseline, but it's useless in practice: if you're querying memory, you already have the recent context. Returning the same clusters is circular.
Transition matrices do not provide useful signal at query boundaries. The approach works within sessions (where sequential chunks naturally revisit the same topics) but this is not where retrieval help is needed. At actual query boundaries — where retrieval would add value — the signal is too sparse to learn from.
This confirms the architectural separation: the graph's value is structural ordering (chain walking), not predictive ranking (transition matrices). Semantic discovery remains the job of vector search and BM25.
Script: scripts/experiments/transition-boundary-experiment.ts
- Question assumptions: Wall-clock time seems natural but is wrong
- Test direction-specific behavior: Backward ≠ forward
- Profile at scale: 100 points ≠ 6,000 points
- Semantic over temporal: Meaning matters more than sequence
- Allow noise: Not everything belongs in a cluster
- Measure before theorizing: Sum-product traversal was theoretically elegant but contributed 2% of results
- Separate concerns: The graph's value is structural ordering, not semantic ranking
- Simple beats complex: 1-to-1 sequential edges outperform m×n all-pairs with sum-product traversal
- Distinguish within-session from cross-session signal: A metric that looks great on the full graph may be entirely driven by trivial within-session patterns that don't help at retrieval time