MERGE Operation Design

Overview

This document specifies the design for Cypher-like MERGE operations in sqlite-graph, enabling idempotent graph operations similar to Neo4j's MERGE clause.

Motivation

Problems Solved

Duplicate Prevention: Prevent creating duplicate nodes/edges when running import scripts multiple times
Idempotency: Allow safe retry logic in distributed systems and ETL pipelines
Upsert Pattern: Simplify "create if missing, update if exists" logic
Data Import: Handle external data sources where existence is unknown

Use Cases

// Job scraping - run daily without duplicates
db.mergeNode('Job',
  { url: 'https://example.com/job/123' },  // Match on URL
  { title: 'Senior Engineer', status: 'active' },  // Create with these
  { lastSeen: Date.now() }  // Update on match
);

// Company deduplication
db.mergeNode('Company',
  { name: 'TechCorp' },  // Match criteria
  { name: 'TechCorp', founded: 2020 },  // ON CREATE
  { lastUpdated: Date.now() }  // ON MATCH
);

// Unique relationships
db.mergeEdge(jobId, 'POSTED_BY', companyId);  // Only create once

API Design

Node Merge

interface MergeOptions<T extends NodeData = NodeData> {
  onCreate?: Partial<T>;  // Properties to set only on creation
  onMatch?: Partial<T>;   // Properties to set only on match
}

mergeNode<T extends NodeData = NodeData>(
  type: string,
  matchProperties: Partial<T>,
  baseProperties?: T,
  options?: MergeOptions<T>
): Node<T>

Parameters:

type: Node type (e.g., 'Job', 'Company')
matchProperties: Properties to match on (lookup criteria)
baseProperties: Properties for creation (merged with matchProperties)
options.onCreate: Additional properties set only on CREATE
options.onMatch: Additional properties set only on MATCH

Returns: The matched or created node

Behavior:

Search for node matching type AND all matchProperties
If found (MATCH):
- Merge onMatch properties with existing
- Update updated_at timestamp
- Return existing node
If not found (CREATE):
- Merge matchProperties, baseProperties, and onCreate
- Set created_at and updated_at timestamps
- Return new node

Edge Merge

interface EdgeMergeOptions<T extends NodeData = NodeData> {
  onCreate?: Partial<T>;
  onMatch?: Partial<T>;
}

mergeEdge<T extends NodeData = NodeData>(
  from: number,
  type: string,
  to: number,
  properties?: T,
  options?: EdgeMergeOptions<T>
): Edge<T>

Parameters:

from: Source node ID
type: Edge type
to: Target node ID
properties: Base edge properties
options.onCreate: Properties set only on creation
options.onMatch: Properties set only on match

Returns: The matched or created edge

Behavior:

Search for edge matching from, type, AND to
If found: merge onMatch properties, return existing
If not found: create with properties + onCreate, return new

Index Requirements

Critical for Performance

MERGE operations REQUIRE indexes on match properties to avoid full table scans:

// BEFORE using mergeNode with URL matching
db.createPropertyIndex('Job', 'url');  // Required!

// Then safe to merge
db.mergeNode('Job', { url: 'https://example.com/job/123' }, ...);

Index Management API

// Create JSON property index for fast lookups
createPropertyIndex(
  nodeType: string,
  property: string,
  unique?: boolean
): void

// List all custom indexes
listIndexes(): Array<{
  name: string;
  table: string;
  columns: string[];
  unique: boolean;
}>

// Drop a custom index
dropIndex(indexName: string): void

Index Names: Generated as idx_merge_${nodeType}_${property}

Warning System

// If no index exists, log warning:
console.warn(
  `⚠️  MERGE on Job.url without index. Performance will degrade on large datasets.
   Run: db.createPropertyIndex('Job', 'url') to fix.`
);

Implementation Strategy

SQL Approach: INSERT ... ON CONFLICT

SQLite 3.24+ supports INSERT ... ON CONFLICT which is atomic and efficient:

-- For nodes
INSERT INTO nodes (type, properties)
VALUES (?, ?)
ON CONFLICT(type, json_extract(properties, '$.url'))
DO UPDATE SET
  properties = json_patch(properties, ?),
  updated_at = strftime('%s', 'now')
RETURNING *;

Challenge: SQLite requires unique constraints/indexes on ON CONFLICT columns.

Alternative: SELECT + INSERT in Transaction

More compatible approach using transactions:

db.transaction(() => {
  // 1. Try to find existing
  const stmt = db.prepare(`
    SELECT * FROM nodes
    WHERE type = ?
    AND json_extract(properties, '$.url') = ?
    LIMIT 1
  `);
  const existing = stmt.get(type, matchProperties.url);

  if (existing) {
    // 2a. MATCH: Update with onMatch properties
    const merged = {
      ...deserialize(existing.properties),
      ...options?.onMatch
    };
    return updateNode(existing.id, merged);
  } else {
    // 2b. CREATE: Insert with onCreate properties
    const merged = {
      ...matchProperties,
      ...baseProperties,
      ...options?.onCreate
    };
    return createNode(type, merged);
  }
});

Advantages:

Works without schema changes
Uses existing transaction infrastructure
Clear semantics

Disadvantages:

Two queries (not atomic at SQL level)
Requires manual transaction wrapper

Schema Changes

Option 1: Virtual Columns (Recommended)

Add generated columns for commonly merged properties:

-- Migration to support merge indexes
ALTER TABLE nodes ADD COLUMN url_generated TEXT
  GENERATED ALWAYS AS (json_extract(properties, '$.url')) VIRTUAL;

CREATE UNIQUE INDEX idx_merge_job_url
  ON nodes(type, url_generated)
  WHERE type = 'Job';

Pros: Fast, efficient, enforces uniqueness Cons: Requires migration, schema complexity

Option 2: JSON Indexes (Simpler)

Use SQLite's JSON expression indexes:

CREATE UNIQUE INDEX idx_merge_job_url
  ON nodes(type, json_extract(properties, '$.url'))
  WHERE type = 'Job';

Pros: No schema changes, flexible Cons: May be slower than virtual columns

Error Handling

Merge Conflicts

class MergeConflictError extends Error {
  constructor(
    public nodeType: string,
    public matchProperties: NodeData,
    public conflictingNodes: Node[]
  ) {
    super(
      `Multiple nodes match merge criteria for ${nodeType}: ` +
      `${JSON.stringify(matchProperties)}. Found ${conflictingNodes.length} matches. ` +
      `Ensure match properties uniquely identify nodes or add uniqueness constraints.`
    );
  }
}

When thrown: Multiple nodes match the merge criteria

Resolution:

Add more specific match properties
Create unique index/constraint
Use getNode() to find and delete duplicates

Missing Index Warning

class MergePerformanceWarning extends Error {
  constructor(
    public nodeType: string,
    public property: string
  ) {
    super(
      `MERGE on ${nodeType}.${property} without index. ` +
      `This will cause full table scans. ` +
      `Create index with: db.createPropertyIndex('${nodeType}', '${property}')`
    );
  }
}

When thrown: Optional warning in development mode

Type Definitions

// Add to src/types/index.ts

export interface MergeOptions<T extends NodeData = NodeData> {
  onCreate?: Partial<T>;
  onMatch?: Partial<T>;
  warnOnMissingIndex?: boolean;  // Default: true in dev, false in prod
}

export interface EdgeMergeOptions<T extends NodeData = NodeData> {
  onCreate?: Partial<T>;
  onMatch?: Partial<T>;
}

export interface IndexInfo {
  name: string;
  table: 'nodes' | 'edges';
  columns: string[];
  unique: boolean;
  partial?: string;  // WHERE clause for partial indexes
}

export class MergeConflictError extends Error {
  constructor(
    public nodeType: string,
    public matchProperties: NodeData,
    public conflictingNodes: Node[]
  );
}

export class MergePerformanceWarning extends Error {
  constructor(
    public nodeType: string,
    public property: string
  );
}

Examples

Basic Merge

// Simple upsert
const company = db.mergeNode('Company',
  { name: 'TechCorp' },
  { name: 'TechCorp', industry: 'Software' }
);

WITH ON CREATE / ON MATCH

// Track creation vs update timestamps
const job = db.mergeNode('Job',
  { url: 'https://example.com/job/123' },  // Match on URL
  {
    url: 'https://example.com/job/123',
    title: 'Senior Engineer',
    status: 'active'
  },
  {
    onCreate: {
      discovered: new Date(),
      applicationStatus: 'not_applied'
    },
    onMatch: {
      lastSeen: new Date(),
      scrapedCount: db.getRawDb()
        .prepare('SELECT json_extract(properties, "$.scrapedCount") + 1 FROM nodes WHERE id = ?')
    }
  }
);

Relationship Merge

// Ensure unique relationship
db.mergeEdge(
  job.id,
  'POSTED_BY',
  company.id,
  { source: 'scraper' },
  {
    onCreate: { firstSeen: Date.now() },
    onMatch: { lastVerified: Date.now() }
  }
);

Bulk Import with Merge

// Safe daily job import
db.transaction(() => {
  for (const jobData of externalJobList) {
    const company = db.mergeNode('Company',
      { name: jobData.companyName },
      { name: jobData.companyName, url: jobData.companyUrl }
    );

    const job = db.mergeNode('Job',
      { url: jobData.url },
      {
        url: jobData.url,
        title: jobData.title,
        description: jobData.description
      },
      {
        onCreate: { discovered: Date.now() },
        onMatch: { lastSeen: Date.now() }
      }
    );

    db.mergeEdge(job.id, 'POSTED_BY', company.id);
  }
});

Testing Requirements

Unit Tests

Integration Tests

Bulk import with merge (idempotent)
Concurrent merge operations
Performance with/without indexes
Complex match criteria (multiple properties)
Schema validation with merge

Performance Tests

Merge with index vs without (100k nodes)
Bulk merge operations (10k+ records)
Compare to manual transaction pattern

Migration Guide

For Existing Users

// Before (manual pattern)
const existing = db.nodes('Company').where({ name }).limit(1).exec()[0];
const company = existing
  ? db.updateNode(existing.id, data)
  : db.createNode('Company', { name, ...data });

// After (merge)
const company = db.mergeNode('Company', { name }, { name, ...data });

Breaking Changes

None - MERGE is additive feature.

Deprecation Plan

Consider deprecating manual upsert pattern in favor of merge in documentation.

Future Enhancements

Multi-property Merge: Composite match criteria

db.mergeNode('Job', { url, companyId }, ...)

Pattern Merge: Full graph pattern matching

db.mergePattern([
  { type: 'Job', match: { url }, ... },
  { type: 'Company', match: { name }, ... },
  { edge: 'POSTED_BY', from: 0, to: 1 }
])

Conditional Merge: Only merge if condition met

db.mergeNode('Job', { url }, data, {
  condition: { status: 'active' }
})

Batch Merge: Optimize bulk operations

db.bulkMergeNodes('Job', jobList, { matchOn: 'url' })

References

Neo4j MERGE: https://neo4j.com/docs/cypher-manual/current/clauses/merge/
SQLite INSERT ON CONFLICT: https://www.sqlite.org/lang_UPSERT.html
SQLite JSON Functions: https://www.sqlite.org/json1.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MERGE Operation Design

Overview

Motivation

Problems Solved

Use Cases

API Design

Node Merge

Edge Merge

Index Requirements

Critical for Performance

Index Management API

Warning System

Implementation Strategy

SQL Approach: INSERT ... ON CONFLICT

Alternative: SELECT + INSERT in Transaction

Schema Changes

Option 1: Virtual Columns (Recommended)

Option 2: JSON Indexes (Simpler)

Error Handling

Merge Conflicts

Missing Index Warning

Type Definitions

Examples

Basic Merge

WITH ON CREATE / ON MATCH

Relationship Merge

Bulk Import with Merge

Testing Requirements

Unit Tests

Integration Tests

Performance Tests

Migration Guide

For Existing Users

Breaking Changes

Deprecation Plan

Future Enhancements

References

FilesExpand file tree

MERGE-DESIGN.md

Latest commit

History

MERGE-DESIGN.md

File metadata and controls

MERGE Operation Design

Overview

Motivation

Problems Solved

Use Cases

API Design

Node Merge

Edge Merge

Index Requirements

Critical for Performance

Index Management API

Warning System

Implementation Strategy

SQL Approach: INSERT ... ON CONFLICT

Alternative: SELECT + INSERT in Transaction

Schema Changes

Option 1: Virtual Columns (Recommended)

Option 2: JSON Indexes (Simpler)

Error Handling

Merge Conflicts

Missing Index Warning

Type Definitions

Examples

Basic Merge

WITH ON CREATE / ON MATCH

Relationship Merge

Bulk Import with Merge

Testing Requirements

Unit Tests

Integration Tests

Performance Tests

Migration Guide

For Existing Users

Breaking Changes

Deprecation Plan

Future Enhancements

References