[DRAFT] Replacing Web UI with Ollama Backend #116
RSDNTWK wants to merge 4 commits into Rubiksman78:main
Conversation
Interesting. I'll forward this to the main maintainer/creator. Perhaps a stupid question: couldn't you just use try/finally to unload the model automatically?
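For illustration, a minimal sketch of what that could look like, assuming the submod can shell out to the Ollama CLI on shutdown (`ollama stop` is the same command suggested later in this thread):

```python
import subprocess

MODEL = "llama3.2"  # the model hardcoded by this PR

def run_chat_loop():
    """Placeholder for the submod's actual chat loop."""
    pass

try:
    run_chat_loop()
finally:
    # Always ask Ollama to unload the model, even if the loop crashed.
    # check=False so a failed unload doesn't mask the original error.
    subprocess.run(["ollama", "stop", MODEL], check=False)
```

Another possibility worth checking is Ollama's keep_alive request parameter, which controls how long a model stays loaded in memory after a request.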
This would be a nice addition, but since it is limited to one model for now, it would be preferable to still give users the option of choosing the Web UI if they have the computing capability, rather than replacing it entirely.
We're experimenting with using Ollama's CLI commands to unload the models, so it could work. It's just buggy for now.
Fair enough. We are looking into adding an option to load other models too. Ollama has native Nvidia/AMD GPU support, with CPU as a fallback if needed. Ollama would be a better option overall, since with the right optimisations it has a lot less overhead on lower-end systems. Ollama can also load models directly from Hugging Face.
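For instance, pulling a GGUF model straight from Hugging Face only needs the repository path; the path below is a placeholder, not a specific recommendation:

ollama pull hf.co/{username}/{repository}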
New improvements have been made: the settings now include the ability to load custom Ollama models and save the choice to the config.json file.
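For reference, a minimal sketch of how such a setting might be persisted, assuming a config.json next to the submod; the ollama_model key name is hypothetical, not necessarily what this PR uses:

```python
import json
from pathlib import Path

CONFIG_PATH = Path("config.json")  # assumed location; the submod's may differ

def save_model_choice(model_name: str) -> None:
    """Persist the user's chosen Ollama model, keeping other settings intact."""
    config = {}
    if CONFIG_PATH.exists():
        config = json.loads(CONFIG_PATH.read_text())
    config["ollama_model"] = model_name  # hypothetical key name
    CONFIG_PATH.write_text(json.dumps(config, indent=4))

def load_model_choice(default: str = "llama3.2") -> str:
    """Read the saved model name, falling back to the PR's default."""
    if CONFIG_PATH.exists():
        return json.loads(CONFIG_PATH.read_text()).get("ollama_model", default)
    return default
```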
Currently planned ideas being looked into: add the user's name from MAS to the user prompt, or alternatively add a setting to choose the character name in the settings.
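A rough sketch of the first idea, assuming the player name can be read out of MAS and passed in; the function and the default character name are illustrative only:

```python
def build_system_prompt(player_name: str, character_name: str = "Monika") -> str:
    """Compose a system prompt that tells the model who it is and who it is talking to."""
    return (
        f"You are {character_name}, chatting with {player_name}. "
        f"Address them by name when it feels natural."
    )

# Example: the submod would fetch the real player name from MAS at runtime.
print(build_system_prompt("Alex"))
```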
This PR is a proof of concept that replaces the text-generation-webui, SillyTavern and Playwright components with an Ollama backend.
This code is fully functional with the submod; however, it has the following limitations (a minimal sketch of the backend call follows the list):
The model is hardcoded to "llama3.2" because of its small size.
Unable to unload the model on exit.
Minor warnings about the text context size, which are not really an issue.
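For context, this is roughly what a chat call against the Ollama backend looks like using the official ollama Python client (pip install ollama); the PR may well call Ollama's REST API directly instead, and the num_ctx value shown is just an example related to the context-size warnings above:

```python
import ollama

# One-shot chat request to a locally running Ollama server (default port 11434).
response = ollama.chat(
    model="llama3.2",  # the model hardcoded by this PR
    messages=[{"role": "user", "content": "Hello, how are you today?"}],
    # Raising num_ctx can quiet context-size warnings; 4096 is an arbitrary example.
    options={"num_ctx": 4096},
)
print(response["message"]["content"])
```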
To test this code, download and install Ollama from here: https://github.com/ollama/ollama/releases/latest
To download llama 3.2, open a command prompt window and type:
ollama pull llama3.2
Once it is downloaded, just execute run.bat as normal. Ollama will load the model once you type your first message into MAS.
After exiting MAS, to unload the model, open a command prompt window and type:
ollama stop llama3.2
We believe this will reduce the amount of overhead when running the submod.