This document provides guidance on choosing and configuring embedding models for Memorizer.
Embedding models convert text into dense numerical vectors (embeddings) that capture semantic meaning. These vectors enable similarity search: finding memories that are conceptually related to a query, rather than only those that share its keywords.
Memorizer uses these embeddings for:
- Semantic search: Find memories by meaning, not just exact text
- Metadata search: Search by title, tags, and type combined
- Similarity ranking: Order results by relevance
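
To make similarity ranking concrete, here is a minimal sketch that orders toy vectors by cosine similarity. It assumes cosine similarity as the metric, which this guide does not confirm for Memorizer; the names `cosine_similarity`, `rank_by_similarity`, and the toy data are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list[float], memories: dict[str, list[float]]) -> list[str]:
    """Return memory names ordered from most to least similar to the query."""
    return sorted(
        memories,
        key=lambda name: cosine_similarity(query, memories[name]),
        reverse=True,
    )


# Toy 3-dimensional vectors; real embeddings have hundreds of dimensions.
toy_memories = {
    "cat care notes": [0.9, 0.1, 0.0],
    "tax deadlines": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]
ranking = rank_by_similarity(toy_query, toy_memories)
```

Because the query vector points in nearly the same direction as the first toy memory, that memory ranks first even though no keywords are compared.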
Memorizer works with any embedding API that follows the Ollama embedding format:
POST /api/embeddings
{
"model": "model-name",
"prompt": "text to embed"
}
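
As an illustration of the request format above, the following sketch builds the same POST request with Python's standard library. The helper names `build_embedding_request` and `fetch_embedding` are hypothetical, and the `{"embedding": [...]}` response shape is an assumption based on how Ollama answers this endpoint; other compatible servers may differ.

```python
import json
import urllib.request


def build_embedding_request(
    text: str, model: str, api_url: str = "http://localhost:11434"
) -> urllib.request.Request:
    """Build (but do not send) a POST request in the Ollama embedding format."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(
    text: str,
    model: str,
    api_url: str = "http://localhost:11434",
    timeout_seconds: float = 60.0,
) -> list[float]:
    """Send the request and return the vector; requires a running server."""
    request = build_embedding_request(text, model, api_url)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        body = json.load(response)
    # Assumption: the server answers with {"embedding": [...]}, as Ollama does.
    return body["embedding"]


# Building the request needs no network access.
request = build_embedding_request("text to embed", "model-name")
```

Calling `fetch_embedding("test", "all-minilm")` against a local server should return a list whose length equals the model's output dimensions.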
Ollama provides an easy way to run embedding models locally:
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull an embedding model
ollama pull all-minilm
# Verify it's working
curl http://localhost:11434/api/embeddings -d '{
"model": "all-minilm",
"prompt": "test"
}'

Other options for serving embedding models:
- LocalAI: Self-hosted, OpenAI-compatible API
- LM Studio: Desktop app with API server mode
- vLLM: High-performance inference server
| Model | Dimensions | Size | Speed | Quality | Best For |
|---|---|---|---|---|---|
| `all-minilm` | 384 | 23MB | Fast | Good | General use, limited resources |
| `all-minilm:33m-l12-v2-fp16` | 384 | 66MB | Fast | Good | Better precision than base |
| `nomic-embed-text` | 768 | 274MB | Medium | Better | Balanced quality/speed |
| `mxbai-embed-large` | 1024 | 670MB | Slow | Best | Maximum quality |
| `bge-base-en-v1.5` | 768 | 438MB | Medium | Better | English text |
| `bge-large-en-v1.5` | 1024 | 1.3GB | Slow | Best | English text, high quality |
| `bge-m3` | 1024 | 1.2GB | Slow | Best | Multilingual |
| `snowflake-arctic-embed` | 1024 | 1.1GB | Slow | Best | Retrieval tasks |
| `qwen3-embedding:0.6b` | 1024 | 639MB | Fast | Better | Lightweight multilingual |
| `qwen3-embedding:4b` | 2560 | 2.5GB | Medium | Best | High quality multilingual |
| `qwen3-embedding:8b` | 4096 | 4.7GB | Slow | State-of-art | Maximum quality, code retrieval |
ollama pull all-minilm

- Dimensions: 384
- Context: 256 tokens
- Strengths: Very fast, low memory, good general performance
- Weaknesses: Smaller context window, lower quality on complex queries
- Use when: Running on limited hardware, need fast responses, memories are short
ollama pull nomic-embed-text

- Dimensions: 768
- Context: 8192 tokens
- Strengths: Long context, good quality, reasonable speed
- Weaknesses: Larger than minilm, requires more memory
- Use when: Memories contain longer documents, need better semantic understanding
ollama pull mxbai-embed-large

- Dimensions: 1024
- Context: 512 tokens
- Strengths: High quality embeddings, good for retrieval
- Weaknesses: Slower, requires more storage for embeddings
- Use when: Quality is priority, have adequate compute resources
ollama pull bge-large-en-v1.5

- Dimensions: 1024
- Context: 512 tokens
- Strengths: Excellent quality for English, well-benchmarked
- Weaknesses: English-only, large model size
- Use when: English content only, maximum retrieval quality needed
ollama pull bge-m3

- Dimensions: 1024
- Context: 8192 tokens
- Strengths: Multilingual (100+ languages), long context
- Weaknesses: Slower, large model
- Use when: Multilingual content, long documents
The Qwen3 Embedding series offers state-of-the-art performance: the 8B model ranked #1 on the MTEB multilingual leaderboard (score 70.58 as of June 2025).
# Lightweight option
ollama pull qwen3-embedding:0.6b
# Balanced option
ollama pull qwen3-embedding:4b
# Maximum quality
ollama pull qwen3-embedding:8b

qwen3-embedding:0.6b
- Dimensions: Up to 1024 (configurable 32-1024)
- Context: 32K tokens
- Size: 639MB
- Use when: Need multilingual support with limited resources
qwen3-embedding:4b
- Dimensions: Up to 2560 (configurable 32-2560)
- Context: 40K tokens
- Size: 2.5GB
- Use when: Balanced quality and resource usage
qwen3-embedding:8b
- Dimensions: Up to 4096 (configurable 32-4096)
- Context: 40K tokens
- Size: 4.7GB
- Use when: Maximum quality, code retrieval, complex semantic search
Key strengths of Qwen3 Embedding:
- 100+ language support including programming languages
- Exceptional code retrieval performance
- Very long context windows (32K-40K tokens)
- Configurable output dimensions
- State-of-the-art benchmark scores
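
One common way configurable output dimensions are used is to keep only the leading components of a vector and rescale it to unit length. The sketch below illustrates that idea under the assumption that the model's embeddings tolerate such truncation; whether Ollama or Memorizer exposes a dimension setting is not stated in this guide, and `truncate_embedding` is a hypothetical helper.

```python
import math


def truncate_embedding(vector: list[float], target_dimensions: int) -> list[float]:
    """Keep the first target_dimensions values and rescale to unit length."""
    if not 1 <= target_dimensions <= len(vector):
        raise ValueError("target_dimensions must be between 1 and len(vector)")
    head = vector[:target_dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


# Toy 4-dimensional vector shortened to 2 dimensions.
shortened = truncate_embedding([3.0, 4.0, 12.0, 84.0], 2)
```

Shorter vectors reduce storage at some cost in retrieval quality, so any change in output size should be treated like a switch to a model with different dimensions.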
Configure your embedding model in appsettings.json:
{
"Embeddings": {
"ApiUrl": "http://localhost:11434",
"Model": "all-minilm",
"Timeout": "00:01:00"
}
}

| Setting | Description | Default |
|---|---|---|
| `ApiUrl` | URL of the embedding API | `http://localhost:11434` |
| `Model` | Model name to use | Required |
| `Timeout` | Request timeout | `00:01:00` |
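
The timeout values in this guide are written as `hh:mm:ss`. As a small illustration of how to read them, the sketch below converts such a string into a duration. It only handles the plain three-part form shown here and is not how Memorizer itself parses the setting; `parse_timeout` is a hypothetical helper.

```python
from datetime import timedelta


def parse_timeout(value: str) -> timedelta:
    """Parse an "hh:mm:ss" string such as "00:01:00" into a timedelta."""
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected hh:mm:ss, got {value!r}")
    hours, minutes, seconds = (int(part) for part in parts)
    if hours < 0 or not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"out-of-range component in {value!r}")
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# The default timeout from the settings table is one minute.
default_timeout = parse_timeout("00:01:00")
```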
You can also configure via environment variables:
export Embeddings__ApiUrl=http://localhost:11434
export Embeddings__Model=nomic-embed-text
export Embeddings__Timeout=00:02:00

If you switch to a model with the same dimensions (e.g., all-minilm to another 384D model), no migration is needed. Just update the config and restart.
If the new model has different dimensions, you must run a migration:
- Update `Embeddings:Model` in configuration
- Restart Memorizer (you'll see a warning banner)
- Navigate to Tools > Dimension Migration
- Click Start Dimension Migration
- Wait for all embeddings to regenerate
See Embedding Migration for detailed migration instructions.
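
The rule for when a dimension migration is required can be sketched as a simple comparison of output sizes. The dimensions below are copied from the model comparison table in this guide; `MODEL_DIMENSIONS` and `needs_dimension_migration` are hypothetical names, not part of Memorizer.

```python
# Output dimensions copied from the model comparison table in this guide.
MODEL_DIMENSIONS = {
    "all-minilm": 384,
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "bge-m3": 1024,
    "qwen3-embedding:8b": 4096,
}


def needs_dimension_migration(current_model: str, new_model: str) -> bool:
    """Return True when the two models produce vectors of different sizes."""
    try:
        return MODEL_DIMENSIONS[current_model] != MODEL_DIMENSIONS[new_model]
    except KeyError as missing:
        raise ValueError(f"unknown model: {missing.args[0]}") from None


# 384 -> 768 dimensions: migration required.
switch_small_to_medium = needs_dimension_migration("all-minilm", "nomic-embed-text")
# 1024 -> 1024 dimensions: no dimension migration.
switch_same_size = needs_dimension_migration("mxbai-embed-large", "bge-m3")
```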
Migration time depends on:
- Number of memories
- Embedding model speed
- Hardware resources
Rough estimates for 1,000 memories:
| Model | Approximate Time |
|---|---|
| `all-minilm` | ~2-5 minutes |
| `nomic-embed-text` | ~5-10 minutes |
| `mxbai-embed-large` | ~10-20 minutes |
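
For a rough sense of larger collections, the estimates above can be scaled by the number of memories. This sketch assumes time grows linearly with memory count, which is a simplification; real throughput depends on hardware and memory length. `estimate_migration_minutes` is a hypothetical helper.

```python
# (low, high) minutes per 1,000 memories, taken from the table above.
MINUTES_PER_THOUSAND = {
    "all-minilm": (2, 5),
    "nomic-embed-text": (5, 10),
    "mxbai-embed-large": (10, 20),
}


def estimate_migration_minutes(model: str, memory_count: int) -> tuple[float, float]:
    """Scale the per-1,000 estimate linearly to memory_count memories."""
    if memory_count < 0:
        raise ValueError("memory_count must be non-negative")
    low, high = MINUTES_PER_THOUSAND[model]
    scale = memory_count / 1000
    return (low * scale, high * scale)


# 25,000 memories with nomic-embed-text: roughly 2 to 4 hours.
low_minutes, high_minutes = estimate_migration_minutes("nomic-embed-text", 25_000)
```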
Start
│
▼
Do you have GPU acceleration?
│
├─ No ──► all-minilm (384D) or qwen3-embedding:0.6b (1024D)
│
▼ Yes
│
Is state-of-the-art quality critical?
│
├─ Yes ──► qwen3-embedding:8b (4096D)
│
▼ No
│
Is multilingual support needed?
│
├─ Yes ──► qwen3-embedding:4b (2560D) or bge-m3 (1024D)
│
▼ No
│
Are your documents long (>500 words)?
│
├─ Yes ──► nomic-embed-text (768D)
│
▼ No
│
Is code retrieval important?
│
├─ Yes ──► qwen3-embedding:4b (2560D)
│
▼ No
│
nomic-embed-text (768D)
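
The decision tree above can also be written as a function that checks each question in the same order. This is only a restatement of the diagram; `recommend_model` and its parameter names are hypothetical.

```python
def recommend_model(
    has_gpu: bool,
    needs_state_of_the_art: bool = False,
    needs_multilingual: bool = False,
    has_long_documents: bool = False,
    needs_code_retrieval: bool = False,
) -> str:
    """Walk the decision tree top to bottom and return the first match."""
    if not has_gpu:
        return "all-minilm or qwen3-embedding:0.6b"
    if needs_state_of_the_art:
        return "qwen3-embedding:8b"
    if needs_multilingual:
        return "qwen3-embedding:4b or bge-m3"
    if has_long_documents:
        return "nomic-embed-text"
    if needs_code_retrieval:
        return "qwen3-embedding:4b"
    # Default branch at the bottom of the tree.
    return "nomic-embed-text"


cpu_only_choice = recommend_model(has_gpu=False)
code_choice = recommend_model(has_gpu=True, needs_code_retrieval=True)
```

Because the questions are checked in order, an earlier "yes" wins: a GPU user who needs both multilingual support and code retrieval gets the multilingual recommendation.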
| Use Case | Recommended Model | Why |
|---|---|---|
| Personal notes | `all-minilm` | Fast, works on any hardware |
| Code snippets | `qwen3-embedding:4b` | Excellent code retrieval, programming language support |
| Documentation | `nomic-embed-text` | Long context support, good balance |
| Research papers | `bge-large-en-v1.5` | High quality retrieval for English |
| Multilingual content | `qwen3-embedding:4b` | 100+ languages, state-of-the-art quality |
| Maximum accuracy | `qwen3-embedding:8b` | #1 on MTEB leaderboard |
| Limited resources | `qwen3-embedding:0.6b` | Good quality at small size |
Higher dimension models require more database storage:
| Dimensions | Storage per Memory | 10,000 Memories |
|---|---|---|
| 384 | ~1.5 KB | ~15 MB |
| 768 | ~3 KB | ~30 MB |
| 1024 | ~4 KB | ~40 MB |
| 2560 | ~10 KB | ~100 MB |
| 4096 | ~16 KB | ~160 MB |
Note: These are estimates for the embedding vectors only. Actual storage includes text content, metadata embeddings, and indexes.
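
The figures in the storage table are consistent with 4 bytes per vector component. The sketch below reproduces that arithmetic, assuming 32-bit floats and ignoring per-row overhead; `embedding_storage_bytes` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each vector component is a 4-byte float


def embedding_storage_bytes(dimensions: int, memory_count: int = 1) -> int:
    """Return raw vector storage in bytes, excluding indexes and metadata."""
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    if memory_count < 0:
        raise ValueError("memory_count must be non-negative")
    return dimensions * BYTES_PER_FLOAT32 * memory_count


# 384 dimensions: 1,536 bytes per memory, i.e. 1.5 KB.
per_memory_kib = embedding_storage_bytes(384) / 1024
# 10,000 such memories: a little under 15 MB.
total_mib = embedding_storage_bytes(384, 10_000) / (1024 * 1024)
```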
To speed up embedding generation:
- Use GPU acceleration if available
- Choose a smaller model (`all-minilm`)
- Increase timeout for batch operations
- Run Ollama with `OLLAMA_NUM_PARALLEL=4` for concurrent requests

To improve search quality:
- Use a larger model (`mxbai-embed-large`, `bge-large`)
- Ensure model context covers your typical memory length
- Use descriptive titles and tags (they're embedded separately)
# List available models
ollama list
# Pull the model you need
ollama pull model-name

If requests time out, increase the timeout in configuration:
{
"Embeddings": {
"Timeout": "00:05:00"
}
}

You've changed to a model with different output dimensions. See Embedding Migration.
- Check if Ollama is using GPU: `ollama ps`
- Try a smaller model
- Ensure adequate system memory
- Ollama Models Library
- MTEB Leaderboard - Embedding model benchmarks
- Sentence Transformers - Model documentation
- pgvector - Vector storage in PostgreSQL