Commit a6d8a25

WIP
1 parent 3ebac97 commit a6d8a25

82 files changed

Lines changed: 9933 additions & 10964 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -18,3 +18,4 @@ public
18 18
.lsp
19 19
*.log
20 20
verification/**
21+
.clj-kondo

.serena/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
/cache
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
1+
# Code Style and Conventions
2+
3+
## Core Development Rules
4+
5+
### Code Quality Requirements
6+
- **Type hints required** for all code
7+
- **Public APIs must have docstrings**
8+
- **Functions must be focused and small**
9+
- **Follow existing patterns exactly**
10+
- **Line length: 120 chars maximum**
11+
- **Avoid `Any` type** - prefer typed classes inheriting from Pydantic `BaseModel`
12+
13+
### File Organization
14+
- Every module should have a file ending with `Core` and an `internal` folder
15+
- **MAKE ALL attributes in a class private** by prepending `_` to the attribute name
16+
- **IF an attribute should be public**, keep it private with `_` and expose it through a `@property` method
17+
- **AVOID MAGIC NUMBERS** - define class-level constants instead
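
The file organization rules above can be illustrated with a minimal sketch. `SearchSettings`, `DEFAULT_TOP_K`, and `MAX_TOP_K` are hypothetical names invented for this example and do not come from the repository.

```python
class SearchSettings:
    """Hypothetical settings holder used only to illustrate the rules above."""

    # Class-level constants take the place of magic numbers.
    DEFAULT_TOP_K: int = 5
    MAX_TOP_K: int = 50

    def __init__(self, top_k: int = DEFAULT_TOP_K) -> None:
        if top_k > self.MAX_TOP_K:
            raise ValueError("top_k exceeds MAX_TOP_K")
        # The attribute itself stays private.
        self._top_k: int = top_k

    @property
    def top_k(self) -> int:
        """Read-only public view of the private `_top_k` attribute."""
        return self._top_k
```

Callers read `settings.top_k` but cannot assign to it, because the property defines no setter.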
18+
19+
### Comment Style
20+
- **NO inline comments after code** - Never write comments on the same line as code (e.g., `x = 5 # this is bad`)
21+
- Comments should be on their own line above the code they describe
22+
- Docstrings are preferred over comments for function/class documentation
23+
24+
### Class Design
25+
- **Classes with only static methods should not have instances created**
26+
- Mark such classes appropriately (e.g., with abstract base class or clear documentation)
27+
- Consider using module-level functions instead if appropriate
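
One possible way to mark a static-only class as non-instantiable is to make `__new__` raise. This is a sketch of one option, not necessarily the mechanism the project uses, and `TextNormalizer` is a hypothetical class.

```python
from typing import NoReturn


class TextNormalizer:
    """Hypothetical utility class that only groups static helpers."""

    def __new__(cls) -> NoReturn:
        raise TypeError(f"{cls.__name__} is a static-only class and cannot be instantiated")

    @staticmethod
    def collapse_whitespace(text: str) -> str:
        """Replace every run of whitespace with a single space."""
        return " ".join(text.split())
```

`TextNormalizer.collapse_whitespace("a  b")` works as before, while `TextNormalizer()` raises `TypeError`. If the class holds no shared constants, a plain module-level function may be the simpler choice.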
28+
29+
### Forbidden Practices
30+
- Using `hasattr` and `getattr` by default
31+
- Using dictionaries where static types are possible and sensible
32+
33+
### Testing Requirements
34+
- **Framework**: pytest with anyio (preferred over asyncio)
35+
- **Test file naming**: Must end with the `Test` suffix (e.g., `KnowledgeSearchCoreTest.py`)
36+
- **Test classes**: Always use classes in test files
37+
- **One test file per implementation file**
38+
39+
### Test Quality Standards - NO WEAK TESTS!
40+
Tests MUST:
41+
- **Not have any `if` statements**
42+
- **Test real values and not only shape**
43+
- **Not use `try`/`except` for testing** (prevents false positives)
44+
- **Use mostly hardcoded expected values**, not ranges like `len(expression) > magic_number`
45+
- **NO MAGIC NUMBERS** - put numbers in class constants
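
A test written to these standards might look like the sketch below. `count_words` and `TestCountWords` are hypothetical stand-ins, and the example relies only on plain `assert` statements so that it runs without any plugin.

```python
def count_words(text: str) -> int:
    """Hypothetical function under test: count whitespace-separated words."""
    return len(text.split())


class TestCountWords:
    """Exact expected values, class constants, no branching, no exception swallowing."""

    SAMPLE_TEXT: str = "hybrid search with source attribution"
    EXPECTED_WORD_COUNT: int = 5
    EMPTY_TEXT: str = ""
    EXPECTED_EMPTY_COUNT: int = 0

    def test_counts_exact_number_of_words(self) -> None:
        assert count_words(self.SAMPLE_TEXT) == self.EXPECTED_WORD_COUNT

    def test_empty_text_has_no_words(self) -> None:
        assert count_words(self.EMPTY_TEXT) == self.EXPECTED_EMPTY_COUNT
```

A weaker variant such as `assert count_words(text) > 0` would pass for almost any implementation, which is the kind of shape-only check the list above rules out.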
46+
47+
### Exception Handling
48+
- **Always use `logger.exception()` instead of `logger.error()` when catching exceptions**
49+
- Don't include the exception in the message: `logger.exception("Failed")` not `logger.exception(f"Failed: {e}")`
50+
- **Catch specific exceptions** where possible:
51+
- File ops: `except OSError:` (also covers `PermissionError`, which is a subclass)
52+
- JSON: `except json.JSONDecodeError:`
53+
- Network: `except (ConnectionError, TimeoutError):`
54+
- **Only catch `Exception` for**:
55+
- Top-level handlers that must not crash
56+
- Cleanup blocks (log at debug level)
57+
58+
### Formatting Tools
59+
- **Ruff**: Primary linter and formatter
60+
- **Black**: Code formatting (line-length=120, target py312)
61+
- **isort**: Import sorting (black profile, line_length=120)
62+
- **MyPy**: Type checking with strict settings
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
1+
# Development Workflow
2+
3+
## Package Management - CRITICAL RULES
4+
5+
### ✅ ALWAYS Use uv
6+
```bash
7+
uv add package # Add new dependency
8+
uv add --dev package # Add dev dependency
9+
uv run tool # Run any command
10+
uv run python3 script.py # Run Python scripts
11+
```
12+
13+
### ❌ FORBIDDEN Commands
14+
```bash
15+
uv pip install # NEVER use this
16+
pip install # NEVER use this
17+
uv add package@latest # NEVER use @latest syntax
18+
```
19+
20+
## Development Process
21+
22+
### 1. Initial Setup
23+
```bash
24+
# Clone and install
25+
git clone <repo>
26+
uv sync # Install dependencies
27+
```
28+
29+
### 2. Daily Development Cycle
30+
```bash
31+
# Before starting work
32+
poe format # Format code
33+
poe verify # Check types and linting
34+
35+
# After making changes
36+
poe test # Run tests
37+
poe check # Full verification
38+
```
39+
40+
### 3. Adding New Features
41+
1. **Check existing libraries first** - look at neighboring files, pyproject.toml
42+
2. **Follow existing patterns** - examine similar components
43+
3. **Use proper naming conventions** - `{Module}Core.py`, `{Module}Test.py`
44+
4. **Add comprehensive tests** - one test file per implementation file
45+
46+
### 4. Error Resolution Order
47+
1. **Format first**: `poe format`
48+
2. **Fix type errors**: Add type hints, None checks
49+
3. **Fix linting**: Remove unused imports, etc.
50+
51+
## Testing Guidelines
52+
53+
### Test Structure
54+
- Use classes for all tests
55+
- Test file names end with `Test`
56+
- One test file per implementation file
57+
- Use anyio for async tests (not asyncio)
58+
59+
### Test Quality (NO WEAK TESTS!)
60+
- No `if` statements in tests
61+
- Test actual values, not just shapes
62+
- No try/except for testing
63+
- Use hardcoded expected values
64+
- No magic numbers - use class constants
65+
66+
### Running Tests
67+
```bash
68+
poe test # All tests
69+
poe test-cov # With coverage report
70+
poe test-cov-check # Enforce 85% coverage
71+
```
72+
73+
## Git Workflow
74+
- Check `git status` before commits
75+
- Never commit unless explicitly asked
76+
- Run `poe check` before any commits
77+
- Keep changes focused and minimal
78+
79+
## Performance Expectations
80+
- Initialization: < 0.5 seconds
81+
- Search operations: < 0.5 seconds
82+
- Pickle load/save: < 0.5 seconds
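
A budget like the ones above could be checked with a small timing helper. This is a sketch under assumptions: `MAX_OPERATION_SECONDS`, `elapsed_seconds`, and `within_budget` are hypothetical names, and a single wall-clock measurement is noisy, so a real check would probably repeat the run.

```python
import time
from collections.abc import Callable

# The 0.5 second budget stated in the performance expectations.
MAX_OPERATION_SECONDS: float = 0.5


def elapsed_seconds(operation: Callable[[], object]) -> float:
    """Run `operation` once and return how long it took, in seconds."""
    started_at = time.perf_counter()
    operation()
    return time.perf_counter() - started_at


def within_budget(operation: Callable[[], object], budget_seconds: float = MAX_OPERATION_SECONDS) -> bool:
    """Return True when a single run of `operation` finishes inside the budget."""
    return elapsed_seconds(operation) < budget_seconds
```

For example, `within_budget(lambda: sum(range(1000)))` measures a trivial operation against the default budget.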
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
1+
# Catalyst Project Overview
2+
3+
## Purpose
4+
Catalyst is an enterprise-grade toolkit for building LLM-powered document processing applications. It turns complex documents into queryable knowledge systems, designed specifically for regulated industries where accuracy and source attribution are critical.
5+
6+
## Key Features
7+
- **Hybrid Intelligence**: Combines vector search + keyword extraction + relationship mapping
8+
- **Knowledge Graphs**: Extracts acronyms and builds relationships between terms
9+
- **Structure Preservation**: Tables, images, and document hierarchy stay intact
10+
- **Source Attribution**: Every answer includes exact page numbers and document sections
11+
- **Zero Dependencies**: Fully self-contained with embedded models (~32MB)
12+
- **Offline Capable**: No external API calls required
13+
- **Async Processing**: Built on ASGI for high-performance document pipelines
14+
15+
## Target Use Cases
16+
- Regulatory compliance document analysis
17+
- Legal due diligence across contracts
18+
- Financial analysis with risk metrics
19+
- Technical documentation understanding
20+
- Audit support with complete evidence chains
21+
22+
## Tech Stack
23+
- **Language**: Python 3.12+
24+
- **Package Management**: uv (NOT pip)
25+
- **Core Dependencies**:
26+
- langchain-core>=0.3.75
27+
- model2vec>=0.6.0 (embedded model)
28+
- numpy, tenacity, trio, sqlalchemy
29+
- **Optional Dependencies**:
30+
- API: fastapi, fastmcp, agno, rapidfuzz
31+
- Extraction: pdfplumber, scikit-learn, pypdf, pyoxipng
32+
- **Testing**: pytest, anyio (preferred over asyncio)
33+
- **Development**: mypy, ruff, black, isort, poethepoet
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
1+
# Project Structure
2+
3+
## Main Directories
4+
5+
```
6+
com_blockether_catalyst/
7+
├── src/com_blockether_catalyst/ # Main source code
8+
│ ├── asgi/ # ASGI web application modules
9+
│ ├── consensus/ # LLM consensus and voting systems
10+
│ ├── encoder/ # Text encoding and embeddings
11+
│ ├── knowledge/ # Core knowledge extraction & search
12+
│ ├── prompt/ # Prompt engineering and alignment
13+
│ ├── integrations/ # External system integrations
14+
│ │ └── agno/ # Agno workflow engine integration
15+
│ ├── utils/ # Shared utilities
16+
│ └── assets/ # Static assets (models, etc.)
17+
├── tests/ # Test suite (mirrors src structure)
18+
├── tools/ # Development and utility scripts
19+
├── docs/ # Documentation
20+
└── verification/ # Deployment verification scripts (Python only)
21+
```
22+
23+
## Key Modules
24+
25+
### Knowledge System (`knowledge/`)
26+
- **KnowledgeSearchCore.py**: Main search engine with hybrid vector+keyword search
27+
- **KnowledgeExtractionCore.py**: Document processing and term extraction
28+
- **PDFKnowledgeExtractor.py**: PDF-specific extraction logic
29+
- **KnowledgeVisualizationASGIModule.py**: Web UI for knowledge exploration
30+
31+
### Consensus System (`consensus/`)
32+
- **Consensus.py**: Multi-LLM consensus mechanism
33+
- **ConsensusCore.py**: Core consensus algorithms
34+
- **VotingComparison.py**: Voting strategies for LLM outputs
35+
36+
### Prompt System (`prompt/`)
37+
- **PromptAlignmentCore.py**: Prompt optimization and alignment
38+
- **PrincipleBasedAlignmentStrategy.py**: Alignment strategies
39+
40+
### Utils (`utils/`)
41+
- **TypedCalls.py**: Type-safe LLM API calls
42+
- **ConcurrentProcessor.py**: Async processing utilities
43+
- **instructor/**: Structured LLM output handling
44+
45+
## Naming Conventions
46+
- Core implementation files end with `Core` (e.g., `KnowledgeSearchCore.py`)
47+
- Test files end with `Test` (e.g., `KnowledgeSearchCoreTest.py`)
48+
- Type definition files end with `Types` (e.g., `KnowledgeExtractionTypes.py`)
49+
- ASGI modules end with `ASGIModule` for web components
50+
51+
## File Organization Pattern
52+
Each major module typically has:
53+
- `{Module}Core.py` - Main implementation
54+
- `{Module}Types.py` - Type definitions and data models
55+
- `{Module}Test.py` - Comprehensive test suite
56+
- `internal/` subfolder for implementation details (where applicable)
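
The naming convention can be expressed as a small mapping from an implementation file to its expected test file. This sketch follows the `KnowledgeSearchCore.py` to `KnowledgeSearchCoreTest.py` example given above. `expected_test_file` is a hypothetical helper, and it only derives the file name, not the mirrored location under `tests/`.

```python
from pathlib import Path

TEST_SUFFIX: str = "Test"


def expected_test_file(source_file: Path) -> Path:
    """Map `FooCore.py` to `FooCoreTest.py`, keeping the directory and extension."""
    return source_file.with_name(f"{source_file.stem}{TEST_SUFFIX}{source_file.suffix}")
```

A helper like this could back a check that every implementation file has exactly one matching test file.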
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
1+
# Essential Development Commands
2+
3+
## Package Management (ALWAYS use uv)
4+
```bash
5+
# Installation
6+
uv add package
7+
8+
# Running tools
9+
uv run tool
10+
11+
# Running examples
12+
uv run python3 examples/*
13+
14+
# Running verification scripts
15+
uv run python3 verification/*
16+
17+
# Upgrading packages
18+
uv add --dev package --upgrade-package package
19+
20+
# FORBIDDEN: uv pip install, @latest syntax
21+
```
22+
23+
## Code Quality & Testing
24+
```bash
25+
# Linting and type checking
26+
poe verify # Runs both lint and typecheck
27+
poe lint # uv run ruff check src/ tests/
28+
poe typecheck # uv run mypy src/ tests/
29+
30+
# Formatting
31+
poe format # Runs ruff, black, and isort formatters
32+
poe format-ruff # uv run ruff format src/ tests/
33+
poe format-black # uv run black src/ tests/
34+
poe format-isort # uv run isort src/ tests/
35+
36+
# Testing
37+
poe test # uv run python3 -m pytest
38+
poe test-cov # uv run python3 -m pytest --cov=src --cov-report=html
39+
poe test-cov-check # uv run python3 -m pytest --cov=src --cov-report=term-missing --cov-fail-under=85
40+
41+
# Complete workflow
42+
poe check # format + verify + test-cov-check + check-docs
43+
```
44+
45+
## Documentation
46+
```bash
47+
poe docs-serve # uv run mkdocs serve
48+
poe docs-build # uv run mkdocs build
49+
poe check-docs # Validates documentation structure
50+
```
51+
52+
## Cleaning
53+
```bash
54+
poe clean # Cleans pyc files and cache
55+
poe clean-cache # uv cache clean
56+
poe clean-pyc # Removes __pycache__ and .pyc files
57+
```
58+
59+
## System Commands (macOS Darwin)
60+
```bash
61+
# Standard Unix commands available:
62+
# ls, cd, grep, find, git, etc.
63+
```
64+
65+
## Important Notes
66+
- ALWAYS run `poe format` before `poe verify`
67+
- If pytest fails with asyncio marks, try: `PYTEST_DISABLE_PLUGIN_AUTOLOAD="" uv run --frozen pytest`
68+
- NO shell scripts (.sh) allowed in verification/ directory - use Python scripts only
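
The last rule is simple to check mechanically. The sketch below is one way to do it in Python, in keeping with the Python-only requirement. `find_shell_scripts` is a hypothetical helper, not an existing project script.

```python
from pathlib import Path

FORBIDDEN_SUFFIX: str = ".sh"


def find_shell_scripts(directory: Path) -> list[Path]:
    """Return every `.sh` file under `directory`, sorted for stable output."""
    return sorted(path for path in directory.rglob(f"*{FORBIDDEN_SUFFIX}") if path.is_file())
```

An empty result from `find_shell_scripts(Path("verification"))` means the directory complies with the rule.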
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
1+
# Task Completion Checklist
2+
3+
When completing any development task, follow this exact order:
4+
5+
## 1. Code Quality Checks (REQUIRED)
6+
```bash
7+
# Always run in this order:
8+
poe format # Format code first
9+
poe verify # Then lint and typecheck
10+
```
11+
12+
## 2. Testing (REQUIRED)
13+
```bash
14+
poe test # Run all tests
15+
# OR for coverage check:
16+
poe test-cov-check # Ensures >=85% coverage
17+
```
18+
19+
## 3. Full Verification (RECOMMENDED)
20+
```bash
21+
poe check # Runs format + verify + test-cov-check + check-docs
22+
```
23+
24+
## Error Resolution Priority
25+
1. **Formatting issues first** (line length, imports, etc.)
26+
2. **Type errors second** (add type hints, None checks, function signatures)
27+
3. **Linting issues last** (unused imports, etc.)
28+
29+
## Common Fixes
30+
- **Line length**: Break strings with parentheses, multi-line function calls, split imports
31+
- **Type errors**: Add None checks, narrow string types, match existing patterns
32+
- **Missing dependencies**: Check pyproject.toml for available libraries
33+
34+
## Pre-commit Requirements
35+
- Check git status before commits
36+
- Ensure all linting/typing passes
37+
- Keep changes minimal and focused
38+
- Document public APIs
39+
- Test thoroughly, especially edge cases and error conditions
40+
41+
## NEVER commit changes unless explicitly asked
42+
- Only commit when user explicitly requests it
43+
- Run all quality checks before committing
