Memory Issue

### Checks

- [x] I have searched existing issues and discussions
- [x] I can reproduce this with the latest `main` or release

### Describe the bug

Whenever I use a model, regardless of its size, each request (whether through Osaurus's built-in chat window or an external application like BoltAI) causes the model to load an additional time. I tested with Qwen3 4B 2507 (around 2 GB), and after each question Osaurus's memory usage multiplied. With larger models like Qwen3 30B, Osaurus consumed 40 GB+ of RAM after just two messages and macOS started using 6 GB+ of swap. Clicking "unload model" or "clear all" did not unload the model; the only way to unload models was to quit Osaurus.
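For anyone trying to confirm the growth, here is a minimal sketch that logs the app's resident memory after each request. It is not part of Osaurus. The process name `osaurus` is an assumption and may need adjusting, and `parse_rss_kib`, `sample_rss_kib`, `growth_gib`, and `watch` are hypothetical helper names. RSS as reported by `ps` may not match the figure Activity Monitor shows, so treat the numbers as a rough trend only.

```python
import subprocess


def parse_rss_kib(ps_output: str, name: str) -> int:
    """Sum the RSS (in KiB) of every process whose command contains `name`."""
    total = 0
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)  # "<rss> <command path>"
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        rss, command = parts
        if name.lower() in command.lower():
            total += int(rss)
    return total


def sample_rss_kib(name: str) -> int:
    """Take one sample using `ps`; the trailing `=` suppresses the column headers."""
    result = subprocess.run(
        ["ps", "-axo", "rss=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return parse_rss_kib(result.stdout, name)


def growth_gib(samples_kib: list[int]) -> list[float]:
    """Difference between consecutive samples, converted from KiB to GiB."""
    return [round((b - a) / (1024 * 1024), 2)
            for a, b in zip(samples_kib, samples_kib[1:])]


def watch(name: str = "osaurus", requests: int = 3) -> None:
    """Prompt for each request, then record memory once the reply has finished."""
    samples = [sample_rss_kib(name)]  # baseline before any request
    for i in range(requests):
        input(f"Send request {i + 1}, wait for the reply, then press Enter... ")
        samples.append(sample_rss_kib(name))
    print("RSS per sample (KiB):", samples)
    print("Growth per request (GiB):", growth_gib(samples))
```

Calling `watch()` from a Python prompt while sending messages should show whether each request adds roughly one model's worth of memory, as described above.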

### Steps to reproduce

_No response_

### Osaurus version / commit

v0.5.54

### macOS version

26.2

### Apple Silicon chip

M4 Pro, 48 GB

### Xcode version

26.2

### Logs

```shell

```

### Screenshots

_No response_


Memory Issue #255
