---
title: Configuring this pattern
weight: 20
aliases: /rag-llm-cpu/configure/
---

# **Configuring this pattern**

This guide covers common customizations, such as changing the default large language model (LLM), adding new models, and configuring retrieval-augmented generation (RAG) data sources. It assumes that you have already completed the [Getting started](/rag-llm-cpu/getting-started/) guide.

## **Configuration overview**

ArgoCD manages this pattern by using GitOps. All application configurations are defined in the `values-prod.yaml` file. To customize a component, complete the following steps:

1. **Enable an override:** In the `values-prod.yaml` file, locate the application that you want to change, such as `llm-inference-service`, and add an `extraValueFiles:` entry that points to a new override file, such as `$patternref/overrides/llm-inference-service.yaml`.
2. **Create the override file:** Create the new `.yaml` file in the `/overrides` directory.
3. **Add settings:** Add the specific values that you want to change to the new file.
4. **Commit and synchronize:** Commit your changes and allow ArgoCD to synchronize the application.

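The steps above can be sketched as shell commands. The override file contents here are illustrative; a scratch repository stands in for your clone of the pattern so that the sketch is self-contained:

```shell
# Simulate the override workflow in a scratch repository; in practice you
# run the same commands from the root of your pattern clone.
cd "$(mktemp -d)"
git init -q .
git config user.email "you@example.com"
git config user.name "Your Name"

# Steps 2-3: create the override file with the values you want to change.
mkdir -p overrides
cat > overrides/llm-inference-service.yaml <<'EOF'
inferenceService:
  resources:
    requests:
      cpu: "8"
EOF

# Step 4: commit; in a real clone you would push and let ArgoCD synchronize.
git add overrides/llm-inference-service.yaml
git commit -qm "Add llm-inference-service override"
```
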
## **Task: Changing the default LLM**

By default, the pattern deploys the `mistral-7b-instruct-v0.2.Q5_0.gguf` model. You can change this to a different model, such as a different quantization, or adjust the resource usage. To change the default LLM, create an override file for the existing `llm-inference-service` application.

1. **Enable the override:**
In the `values-prod.yaml` file, update the `llm-inference-service` application to use an override file:
   ```yaml
   clusterGroup:
     # ...
     applications:
       # ...
       llm-inference-service:
         name: llm-inference-service
         namespace: rag-llm-cpu
         chart: llm-inference-service
         chartVersion: 0.3.*
         extraValueFiles: # <-- ADD THIS BLOCK
           - $patternref/overrides/llm-inference-service.yaml
   ```

2. **Create the override file:**
Create a new file named `overrides/llm-inference-service.yaml`. The following example switches to a different model file (Q8_0) and increases the CPU and memory requests:
   ```yaml
   inferenceService:
     resources: # <-- Increased allocated resources
       requests:
         cpu: "8"
         memory: 12Gi
       limits:
         cpu: "12"
         memory: 24Gi

   servingRuntime:
     args:
       - --model
       - /models/mistral-7b-instruct-v0.2.Q8_0.gguf # <-- Changed model file

   model:
     repository: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
     files:
       - mistral-7b-instruct-v0.2.Q8_0.gguf # <-- Changed file to download
   ```

## **Task: Adding a second LLM**

You can deploy an additional LLM and add it to the demonstration user interface (UI). The following example deploys the HuggingFace TGI runtime instead of `llama.cpp`. This process requires two steps: deploying the new LLM and configuring the frontend UI.

### **Step 1: Deploying the new LLM service**

1. **Define the new application:**
In the `values-prod.yaml` file, add a new application named `another-llm-inference-service` to the applications list.

   ```yaml
   clusterGroup:
     # ...
     applications:
       # ...
       another-llm-inference-service: # <-- ADD THIS NEW APPLICATION
         name: another-llm-inference-service
         namespace: rag-llm-cpu
         chart: llm-inference-service
         chartVersion: 0.3.*
         extraValueFiles:
           - $patternref/overrides/another-llm-inference-service.yaml
   ```

2. **Create the override file:**
Create a new file named `overrides/another-llm-inference-service.yaml`. This file defines the new model and disables the creation of resources, such as secrets, that the first LLM already created.
   ```yaml
   dsc:
     initialize: false
   externalSecret:
     create: false

   # Define the new InferenceService
   inferenceService:
     name: hf-inference-service # <-- New service name
     minReplicas: 1
     maxReplicas: 1
     resources:
       requests:
         cpu: "8"
         memory: 32Gi
       limits:
         cpu: "12"
         memory: 32Gi

   # Define the new runtime (HuggingFace TGI)
   servingRuntime:
     name: hf-runtime
     port: 8080
     image: docker.io/kserve/huggingfaceserver:latest
     modelFormat: huggingface
     args:
       - --model_dir
       - /models
       - --model_name
       - /models/Mistral-7B-Instruct-v0.3
       - --http_port
       - "8080"

   # Define the new model to download
   model:
     repository: mistralai/Mistral-7B-Instruct-v0.3
     files:
       - generation_config.json
       - config.json
       - model.safetensors.index.json
       - model-00001-of-00003.safetensors
       - model-00002-of-00003.safetensors
       - model-00003-of-00003.safetensors
       - tokenizer.model
       - tokenizer.json
       - tokenizer_config.json
   ```

   > **IMPORTANT:** A known issue in the model-downloading container requires that you explicitly list all files that you want to download from the HuggingFace repository. Ensure that you list every file required for the model to run.

### **Step 2: Adding the new LLM to the demonstration UI**

Configure the frontend to recognize the new LLM.

1. **Edit the frontend overrides**:
Open the `overrides/rag-llm-frontend-values.yaml` file.
2. **Update LLM_URLS:**
Add the URL of the new service to the `LLM_URLS` environment variable. The URL format is `http://<service-name>-predictor/v1` for the default `llama.cpp` runtime, or `http://<service-name>-predictor/openai/v1` for the HuggingFace runtime.
In the `overrides/rag-llm-frontend-values.yaml` file:

   ```yaml
   env:
     # ...
     - name: LLM_URLS
       value: '["http://cpu-inference-service-predictor/v1","http://hf-inference-service-predictor/openai/v1"]'
   ```

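Because the `LLM_URLS` value is a JSON array embedded in a YAML string, a missing quote or comma is an easy mistake. A quick, hypothetical sanity check before committing:

```shell
# Validate that the LLM_URLS value parses as JSON (fails loudly if it does not).
LLM_URLS='["http://cpu-inference-service-predictor/v1","http://hf-inference-service-predictor/openai/v1"]'
echo "$LLM_URLS" | python3 -m json.tool
```
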
## **Task: Customizing RAG data sources**

By default, the pattern ingests data from the Validated Patterns documentation. You can change this to point to public Git repositories or web pages.

1. **Edit the vector database overrides:**
Open the `overrides/vector-db-values.yaml` file.
2. **Update sources:**
Modify the `repoSources` and `webSources` keys. You can add any publicly available Git repository or public web URL. The job also processes PDF files from `webSources`.
In the `overrides/vector-db-values.yaml` file:

   ```yaml
   providers:
     qdrant:
       enabled: true
     mssql:
       enabled: true

   vectorEmbedJob:
     repoSources:
       - repo: https://github.com/your-org/your-docs.git # <-- Your repo
         globs:
           - "**/*.md"
     webSources:
       - https://your-company.com/product-manual.pdf # <-- Your PDF
     chunking:
       size: 4096
   ```

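To build intuition for `chunking.size`, the following illustrative experiment splits a 10,000-character document into 4096-byte pieces (the real embedding job may chunk on token or sentence boundaries and may overlap chunks):

```shell
cd "$(mktemp -d)"
# Create a 10,000-character sample document.
printf 'A%.0s' $(seq 10000) > doc.txt
# Split it at the chunk size used above: three chunks of 4096 + 4096 + 1808 bytes.
split -b 4096 doc.txt chunk_
ls chunk_* | wc -l
```
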
## **Task: Adding a new RAG database provider**

By default, the pattern enables `qdrant` and `mssql`. You can also enable `redis`, `pgvector`, or `elastic`. This process requires three steps: adding secrets, enabling the database, and configuring the UI.

### **Step 1: Updating the secrets file**

1. If the new database requires credentials, add them to the main secrets file:

   ```sh
   vim ~/values-secret-rag-llm-cpu.yaml
   ```
2. Add the necessary credentials. For example:

   ```yaml
   secrets:
     # ...
     - name: pgvector
       fields:
         - name: user
           value: user # <-- Update the user
         - name: password
           value: password # <-- Update the password
         - name: db
           value: db # <-- Update the db
   ```

> **NOTE:** For information about the expected values, see the [`values-secret.yaml.template`](https://github.com/validatedpatterns-sandbox/rag-llm-cpu/blob/main/values-secret.yaml.template) file.

### **Step 2: Enabling the provider in the vector database chart**

Edit the `overrides/vector-db-values.yaml` file and set `enabled: true` for the providers that you want to add.

In the `overrides/vector-db-values.yaml` file:

```yaml
providers:
  qdrant:
    enabled: true
  mssql:
    enabled: true
  pgvector: # <-- ADD THIS
    enabled: true
  elastic: # <-- OR THIS
    enabled: true
```

### **Step 3: Adding the provider to the demonstration UI**

Edit the `overrides/rag-llm-frontend-values.yaml` file to configure the UI:

1. Add the secrets for the new provider to the `dbProvidersSecret.vault` list.
2. Add the connection details for the new provider to the `dbProvidersSecret.providers` list.

The following example shows the configuration for non-default RAG database providers:

In the `overrides/rag-llm-frontend-values.yaml` file:

```yaml
dbProvidersSecret:
  vault:
    - key: mssql
      field: sapassword
    - key: pgvector # <-- Add this block
      field: user
    - key: pgvector
      field: password
    - key: pgvector
      field: db
    - key: elastic # <-- Add this block
      field: user
    - key: elastic
      field: password
  providers:
    - type: qdrant # <-- Example for Qdrant
      collection: docs
      url: http://qdrant-service:6333
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: mssql # <-- Example for MSSQL
      table: docs
      connection_string: >-
        Driver={ODBC Driver 18 for SQL Server};
        Server=mssql-service,1433;
        Database=embeddings;
        UID=sa;
        PWD={{ .mssql_sapassword }};
        TrustServerCertificate=yes;
        Encrypt=no;
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: redis # <-- Example for Redis
      index: docs
      url: redis://redis-service:6379
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: elastic # <-- Example for Elastic
      index: docs
      url: http://elastic-service:9200
      user: "{{ .elastic_user }}"
      password: "{{ .elastic_password }}"
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: pgvector # <-- Example for PGVector
      collection: docs
      url: >-
        postgresql+psycopg://{{ .pgvector_user }}:{{ .pgvector_password }}@pgvector-service:5432/{{ .pgvector_db }}
      embedding_model: sentence-transformers/all-mpnet-base-v2
```
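
The `{{ .pgvector_user }}`-style placeholders are filled in from the vault entries listed above when the secret is rendered. With made-up values (the real ones come from your `~/values-secret-rag-llm-cpu.yaml`), the pgvector URL expands to the following shape:

```shell
# Illustrative values only, to show the final URL shape after substitution.
user=rag; password=s3cret; db=embeddings
echo "postgresql+psycopg://${user}:${password}@pgvector-service:5432/${db}"
# prints postgresql+psycopg://rag:s3cret@pgvector-service:5432/embeddings
```
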