Commit 5aab006
fix: set OLLAMA_KEEP_ALIVE=-1 to prevent model eviction from RAM
By default, Ollama unloads a model after 5 minutes of inactivity. On
CPU-only servers, where a single inference takes ~9 minutes, the model
gets evicted between requests, so each request pays for a cold start.
Setting OLLAMA_KEEP_ALIVE=-1 keeps models loaded indefinitely.
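
The diff for this commit is not shown here, so the following is only a minimal sketch of the two usual ways to apply the setting, not the code this commit changed. It assumes Ollama is listening on its default port 11434; the helper names `ollama_server_env` and `build_generate_request` are hypothetical. The first helper prepares an environment for launching the server with `OLLAMA_KEEP_ALIVE=-1`. The second builds a request to `/api/generate` that passes the per-request `keep_alive` field.

```python
import json
import os
import urllib.request

# Assumption: Ollama is reachable on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ollama_server_env(base_env=None):
    """Return a copy of the environment with OLLAMA_KEEP_ALIVE=-1 set.

    Hypothetical helper: pass the result as `env=` when starting
    `ollama serve` so loaded models are never unloaded for inactivity.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OLLAMA_KEEP_ALIVE"] = "-1"
    return env


def build_generate_request(model, prompt, keep_alive=-1):
    """Build (but do not send) a POST request for /api/generate.

    A negative keep_alive asks Ollama to keep the model loaded
    indefinitely after this request completes.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To start the server with the setting, something like `subprocess.Popen(["ollama", "serve"], env=ollama_server_env())` should work. The server-wide environment variable covers every client, while the per-request field only affects callers that remember to send it.
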
Also clears the status_message after warm-up completes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent ffbb7a8 commit 5aab006
3 files changed
Lines changed: 8 additions & 1 deletion
File 1 of 3: 1 addition, 1 deletion
@@ -1 +1 @@
File 2 of 3: 1 addition
@@ -403,6 +403,7 @@
File 3 of 3: 6 additions
@@ -100,6 +100,12 @@