
@jja725 (Contributor) commented Jan 22, 2026

Summary

Implements #16: background compaction for Lance fragments, with the following features:

  • ✅ Manual compaction via the compact() method
  • ✅ Optional background compaction with configurable intervals
  • ✅ Configurable thresholds, quiet hours, and intervals
  • ✅ Observability via a stats API, metrics, and structured logging

Changes

Rust Core:

  • Added CompactionConfig and CompactionStats types
  • Implemented compact(), should_compact(), compaction_stats() methods
  • Background compaction task with Tokio interval timer
  • Graceful shutdown via Drop implementation
  • Added tokio and tracing dependencies

Python API:

  • PyO3 bindings for all compaction methods
  • High-level API with comprehensive docstrings
  • Compaction configuration parameters in Context.create()

Tests:

  • 10 integration tests (all passing ✅)
  • Tests cover manual and background compaction, quiet hours, and data integrity

Usage Example

Manual Compaction:
```python
# Context comes from this project's Python package (import path not shown here).
ctx = Context.create("context.lance")
for i in range(100):
    ctx.add("user", f"message {i}")

metrics = ctx.compact()
print(f"Removed {metrics['fragments_removed']} fragments")
```
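To make the `fragments_removed` / `fragments_added` metrics concrete, here is a toy model, not Lance's actual compaction planner. It assumes each `ctx.add()` produces one single-row fragment and that compaction greedily packs consecutive fragments into fragments of at most `target_rows` rows. The function `toy_compact` is hypothetical.

```python
from typing import Dict, List, Union

Metrics = Dict[str, Union[int, List[int]]]


def toy_compact(fragment_rows: List[int], target_rows: int = 1_000_000) -> Metrics:
    """Greedily pack consecutive fragments into fragments of at most target_rows rows.

    Toy model only: the real Lance compaction planner works differently and
    handles details (such as deleted rows) that are not modeled here.
    """
    merged: List[int] = []
    current = 0
    for rows in fragment_rows:
        # Close the current output fragment if this input would push it past the target.
        if current and current + rows > target_rows:
            merged.append(current)
            current = 0
        current += rows
    if current:
        merged.append(current)

    if len(merged) >= len(fragment_rows):
        # Packing gained nothing, so leave the existing fragments alone.
        return {"fragments_removed": 0, "fragments_added": 0, "fragment_rows": list(fragment_rows)}
    return {
        "fragments_removed": len(fragment_rows),
        "fragments_added": len(merged),
        "fragment_rows": merged,
    }


# 100 single-row appends collapse into one fragment.
print(toy_compact([1] * 100))
# {'fragments_removed': 100, 'fragments_added': 1, 'fragment_rows': [100]}
```

An empty input, or one where every fragment is already at the target size, reports zero fragments removed and added, which matches the "empty context handled" case in the test list.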

Background Compaction:
```python
ctx = Context.create(
    "context.lance",
    enable_background_compaction=True,
    compaction_interval_secs=300,
    compaction_min_fragments=10,
    quiet_hours=[(22, 6)],  # skip compaction from 10pm to 6am
)
```
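A quiet-hours window such as `(22, 6)` wraps past midnight, so a naive `start <= hour < end` test would never match it. The sketch below shows one way to handle both same-day and overnight windows. The function name `in_quiet_hours` and its semantics (start inclusive, end exclusive, `start == end` means empty) are assumptions, not taken from the PR.

```python
from datetime import datetime
from typing import Iterable, Tuple


def in_quiet_hours(hour: int, quiet_hours: Iterable[Tuple[int, int]]) -> bool:
    """Return True if `hour` (0-23) falls inside any (start, end) quiet window.

    Assumed semantics: start is inclusive, end is exclusive, and a window with
    start > end wraps past midnight, so (22, 6) covers 22:00 through 05:59.
    A window with start == end is treated as empty.
    """
    for start, end in quiet_hours:
        if start <= end:
            # Same-day window, e.g. (9, 17).
            if start <= hour < end:
                return True
        elif hour >= start or hour < end:
            # Overnight window, e.g. (22, 6).
            return True
    return False


# Whether the real task uses local time or UTC is not stated; local time is used here.
skip_now = in_quiet_hours(datetime.now().hour, [(22, 6)])
```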

Check Status:
```python
stats = ctx.compaction_stats()
print(f"Fragments: {stats['total_fragments']}")
print(f"Last compaction: {stats['last_compaction']}")
```
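Because background compaction runs asynchronously, a caller may want to wait until it finishes before, say, closing the process. The hypothetical helper `wait_until_idle` below polls `compaction_stats()` until no compaction is running. It assumes the stats dict carries the `is_compacting` flag listed among the returned metrics in the first commit message; the timeout and poll interval are illustrative.

```python
import time
from typing import Any, Dict


def wait_until_idle(ctx: Any, timeout: float = 60.0, poll_interval: float = 0.5) -> Dict[str, Any]:
    """Poll ctx.compaction_stats() until no compaction is running.

    Returns the final stats dict, or raises TimeoutError if a compaction is
    still in progress when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        stats = ctx.compaction_stats()
        # Assumes an `is_compacting` key; a missing key is treated as idle.
        if not stats.get("is_compacting", False):
            return stats
        if time.monotonic() >= deadline:
            raise TimeoutError(f"compaction still running after {timeout:.1f}s")
        time.sleep(poll_interval)
```

With a real context this would be called as `wait_until_idle(ctx)` after enabling background compaction.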

Test Results

```
10 passed in 5.39s
✅ Manual compaction reduces fragments
✅ Data integrity preserved
✅ Concurrent writes work
✅ Compaction stats accurate
✅ Custom options work
✅ Background compaction triggers
✅ Quiet hours respected
✅ Metrics structure correct
✅ Empty context handled
✅ Multiple compactions work
```

Architecture

  • Hybrid approach: both manual and optional background compaction
  • Thread-safe: shared state is managed through Arc
  • Non-blocking: the background task runs as a separate Tokio task
  • Graceful shutdown: the Drop implementation aborts the background task
  • Lance MVCC: no explicit locking is needed; compaction relies on Lance's versioning
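The background-task design above (interval timer, threshold check, abort on shutdown) can be sketched in Python with `asyncio` standing in for Tokio. `BackgroundCompactor` and its callables are hypothetical; `compact` is a plain function for brevity, whereas a real implementation would await an async compaction or offload it so the event loop is not blocked.

```python
import asyncio
from typing import Callable, List, Optional, Tuple


class BackgroundCompactor:
    """Hypothetical asyncio analogue of the Tokio background compaction task.

    Every `interval` seconds the loop asks `should_compact()` and, if it returns
    True, calls `compact()`. `stop()` cancels the task, similar in spirit to the
    Drop implementation aborting the Tokio task.
    """

    def __init__(self, should_compact: Callable[[], bool], compact: Callable[[], None], interval: float) -> None:
        self._should_compact = should_compact
        self._compact = compact
        self._interval = interval
        self._task: Optional[asyncio.Task] = None

    async def _run(self) -> None:
        while True:
            await asyncio.sleep(self._interval)
            if self._should_compact():
                self._compact()

    def start(self) -> None:
        # Must be called while an event loop is running.
        self._task = asyncio.get_running_loop().create_task(self._run())

    async def stop(self) -> None:
        if self._task is None:
            return
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass  # expected: the loop is cancelled at its next await point
        self._task = None


async def demo() -> Tuple[List[int], int]:
    fragments = [12]  # pretend the dataset starts with 12 fragments
    runs: List[int] = []

    def should_compact() -> bool:
        return fragments[0] >= 10  # threshold, like compaction_min_fragments=10

    def compact() -> None:
        runs.append(fragments[0])
        fragments[0] = 1  # compaction collapses everything into one fragment

    bg = BackgroundCompactor(should_compact, compact, interval=0.01)
    bg.start()
    await asyncio.sleep(0.1)
    await bg.stop()
    return runs, fragments[0]


print(asyncio.run(demo()))
```

Cancellation only takes effect at an `await`, so an in-flight synchronous `compact()` always runs to completion; with Lance's MVCC, an aborted async compaction would simply not commit a new version.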

Checklist

  • Tests added and passing
  • Documentation updated
  • Code follows project conventions
  • All files properly formatted

@beinan (Collaborator) left a comment:

LGTM, but could you fix the CI?

jja725 and others added 3 commits January 27, 2026 21:57
Implements issue lance-format#16 with comprehensive compaction functionality:

**Core Features:**
- Manual compaction via `compact()` method
- Optional background compaction with configurable intervals
- Comprehensive configuration (thresholds, quiet hours, intervals)
- Advanced observability (stats API, metrics, logging)

**Implementation Details:**
- Rust: Added CompactionConfig, CompactionStats types to store.rs
- Rust: Implemented compact(), should_compact(), compaction_stats()
- Rust: Background task with Tokio interval timer and graceful shutdown
- Python: PyO3 bindings for all compaction methods
- Python: High-level API with full docstrings
- Tests: 10 comprehensive tests (all passing)

**Configuration Options:**
- enable_background_compaction: Enable auto-compaction
- compaction_interval_secs: Check interval (default: 300s)
- compaction_min_fragments: Trigger threshold (default: 5)
- compaction_target_rows: Target rows per fragment (default: 1M)
- quiet_hours: Skip compaction during specified hours
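The options above can be mirrored as a small config object to show the documented defaults and the threshold check. This is a hypothetical sketch: the default for `enable_background_compaction` is assumed to be `False` (the feature is described as optional), the threshold is assumed to be inclusive, and the validation rules are illustrative rather than taken from the PR.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CompactionConfig:
    """Hypothetical Python mirror of the options above; the Rust struct may differ."""

    enable_background_compaction: bool = False  # assumed default (the feature is optional)
    compaction_interval_secs: int = 300
    compaction_min_fragments: int = 5
    compaction_target_rows: int = 1_000_000
    quiet_hours: List[Tuple[int, int]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Illustrative validation rules, not taken from the PR.
        if self.compaction_interval_secs <= 0:
            raise ValueError("compaction_interval_secs must be positive")
        if self.compaction_min_fragments < 1:
            raise ValueError("compaction_min_fragments must be at least 1")
        if self.compaction_target_rows < 1:
            raise ValueError("compaction_target_rows must be at least 1")
        for start, end in self.quiet_hours:
            if not (0 <= start <= 23 and 0 <= end <= 23):
                raise ValueError(f"quiet hour window {(start, end)} is outside 0-23")

    def should_compact(self, fragment_count: int, is_compacting: bool = False) -> bool:
        # Assumes an inclusive threshold; quiet hours are not checked in this sketch.
        return not is_compacting and fragment_count >= self.compaction_min_fragments
```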

**Metrics Returned:**
- fragments_removed/added
- files_removed/added
- is_compacting status
- last_compaction timestamp
- total_compactions count

All tests pass. Documentation updated with usage examples.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Addresses lance-format#21: release the Global Interpreter Lock (GIL) during all
blocking operations so that other Python threads can run concurrently.

**Changes:**
- Wrapped all `runtime.block_on()` calls in `py.allow_threads()`
- Applies to: create(), add(), compact(), compaction_stats(),
  checkout(), search(), list()

**Benefits:**
- Other Python threads keep running during these operations instead of the whole interpreter freezing
- Background threads (heartbeats, UI) remain responsive
- Critical for S3-backed stores (50-500ms+ latency)
- Critical for long-running compaction operations

**Pattern:**
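```rust
// Release the GIL while the Tokio runtime blocks on the async operation,
// so other Python threads can run until it completes.
py.allow_threads(|| {
    self.runtime
        .block_on(async_operation())
        .map_err(to_py_err)
})?
```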

This lets other Python threads run concurrently while Rust performs
expensive I/O and computation.
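The effect of releasing the GIL can be observed from pure Python with a heartbeat thread, like the heartbeats mentioned above. In this sketch `time.sleep` stands in for a PyO3 method that wraps `runtime.block_on()` in `py.allow_threads()`, since `time.sleep` also releases the GIL while it waits. The helper `run_with_heartbeat` is hypothetical.

```python
import threading
import time
from typing import Callable


def run_with_heartbeat(blocking_call: Callable[[], object], tick: float = 0.05) -> int:
    """Run blocking_call() on the calling thread and count heartbeats from another thread.

    time.sleep releases the GIL, much like a PyO3 method that wraps
    runtime.block_on() in py.allow_threads(). A native call that held the GIL
    for its whole duration would keep the heartbeat thread from ticking until
    it returned.
    """
    beats = 0
    stop = threading.Event()

    def heartbeat() -> None:
        nonlocal beats
        # Event.wait returns False on timeout, so this ticks until stop is set.
        while not stop.wait(tick):
            beats += 1

    worker = threading.Thread(target=heartbeat, daemon=True)
    worker.start()
    try:
        blocking_call()
    finally:
        stop.set()
        worker.join()
    return beats


print(run_with_heartbeat(lambda: time.sleep(0.5)))
```

The printed count depends on scheduling, but it grows with the duration of the blocking call because the heartbeat thread is never starved of the GIL.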

All tests pass (19 passed, 2 skipped).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@jja725 force-pushed the feat/background-compaction branch from 321b21c to 83d56e7 on January 28, 2026 06:00
@beinan merged commit 77dad8a into lance-format:main on Jan 28, 2026
8 checks passed