
Commit 3c55760

add: notebook workflow tests to execute code (#76)
## Changes

- Add a CI workflow that runs every notebook end-to-end in a Modal GPU function.
- Standard model-loading workflows run on small GPUs; modified versions of the training scripts run for `0.01` epochs on larger GPUs to verify that training executes.
- A README.md in the `util` folder explains how the commands can be used.
- Skip commands can exclude entire notebooks, or specific cells within a notebook, when required.
- The `util` folder also contains `modal_runner.py`, which holds the Modal function that is actually deployed.
1 parent 5babdfb commit 3c55760
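The skip commands mentioned above appear in the notebook diffs below as plain comment markers in a cell's source (`# !modal_skip`, `# !modal_skip_rest`). The actual handling lives in `util/run_notebook_test.py` and `util/modal_runner.py`, which are not shown in this diff; the following is only a minimal sketch of the semantics, with `executable_cells` as a hypothetical helper name:

```python
# Illustrative sketch of the skip-directive semantics; the real
# implementation in util/run_notebook_test.py may differ.
import nbformat

def executable_cells(path: str) -> list[str]:
    nb = nbformat.read(path, as_version=4)
    sources = []
    for cell in nb.cells:
        if cell.cell_type != "code":
            continue
        # Check the longer marker first: "!modal_skip" is a prefix of it.
        if "!modal_skip_rest" in cell.source:
            break      # skip this cell and every cell after it
        if "!modal_skip" in cell.source:
            continue   # skip just this cell
        sources.append(cell.source)
    return sources
```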

14 files changed: 2358 additions & 1391 deletions
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+name: Run notebooks
+
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - 'notebooks/**'
+  pull_request:
+    branches:
+      - main
+    paths:
+      - 'notebooks/**'
+  workflow_dispatch:
+
+jobs:
+  discover-notebooks:
+    runs-on: ubuntu-latest
+    outputs:
+      notebooks: ${{ steps.list.outputs.notebooks }}
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: List notebooks
+        id: list
+        run: |
+          notebooks=$(ls notebooks/*.ipynb | xargs -I{} basename {} | jq -R -s -c 'split("\n") | map(select(. != ""))')
+          echo "notebooks=$notebooks" >> "$GITHUB_OUTPUT"
+          echo "Found notebooks: $notebooks"
+
+  run-notebooks:
+    needs: discover-notebooks
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        notebook: ${{ fromJSON(needs.discover-notebooks.outputs.notebooks) }}
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+
+      - name: Install modal
+        run: uv pip install --system modal
+
+      - name: Set up Modal token
+        run: modal token set --token-id ${{ secrets.MODAL_TOKEN_ID }} --token-secret ${{ secrets.MODAL_TOKEN_SECRET }}
+
+      - name: Run notebook on Modal
+        run: python util/run_notebook_test.py --notebook "notebooks/${{ matrix.notebook }}" --skip-packages flash-attn
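The two-job split computes the notebook list once, then fans each notebook out as an independent matrix job; with `fail-fast: false`, one failing notebook does not cancel the others. For illustration, the jq pipeline in the `List notebooks` step is equivalent to this Python (a sketch, not part of the workflow):

```python
# Python equivalent of the jq pipeline in the "List notebooks" step:
# emit the notebook basenames as a compact JSON array for fromJSON().
import json
from pathlib import Path

notebooks = [p.name for p in Path("notebooks").glob("*.ipynb")]
# e.g. notebooks=["LFM2_Inference_with_Ollama.ipynb", ...]
print("notebooks=" + json.dumps(notebooks, separators=(",", ":")))
```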

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -160,3 +160,8 @@ Thumbs.db
 *~
 .vscode/
 .onnx-tests/
+
+env
+env/*
+
+__pycache__/

notebooks/LFM2_Inference_with_Ollama.ipynb

Lines changed: 10 additions & 2 deletions
@@ -3,7 +3,13 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": "# 💧 LFM2 Inference with Ollama\n\nThis notebook demonstrates how to use the [Ollama](https://ollama.com) API to run [LFM2](https://huggingface.co/collections/LiquidAI/lfm2-67d775f3b4b6fe79fbb21bda) and [LFM2.5](https://huggingface.co/collections/LiquidAI/lfm25-6839e3e26b2a9fdbde95b341) models.\n\n> ⚠️ **Note:** Ollama is intended to run locally on your machine. This notebook shows the Python and curl API usage to get Ollama running in Colab. Install Ollama from [ollama.com/download](https://ollama.com/download) and follow the [Liquid Docs](https://docs.liquid.ai/docs/inference/ollama) to get started. Also, right now LFM VL models are currently not working with ollama, we have an [open PR](https://github.com/ollama/ollama/pull/14069) to resolve this quickly."
+   "source": [
+    "# 💧 LFM2 Inference with Ollama\n",
+    "\n",
+    "This notebook demonstrates how to use the [Ollama](https://ollama.com) API to run [LFM2](https://huggingface.co/collections/LiquidAI/lfm2-67d775f3b4b6fe79fbb21bda) and [LFM2.5](https://huggingface.co/collections/LiquidAI/lfm25-6839e3e26b2a9fdbde95b341) models.\n",
+    "\n",
+    "> ⚠️ **Note:** Ollama is intended to run locally on your machine. This notebook shows the Python and curl API usage to get Ollama running in Colab. Install Ollama from [ollama.com/download](https://ollama.com/download) and follow the [Liquid Docs](https://docs.liquid.ai/docs/inference/ollama) to get started. Also, LFM VL models currently do not work with Ollama; we have an [open PR](https://github.com/ollama/ollama/pull/14069) to resolve this."
+   ]
   },
   {
    "cell_type": "markdown",
@@ -19,6 +25,7 @@
    "outputs": [],
    "source": [
     "# Colab specific settings\n",
+    "# !modal_skip\n",
     "!sudo apt install zstd\n",
     "!sudo apt update\n",
     "!sudo apt install -y pciutils"
@@ -170,6 +177,7 @@
    "outputs": [],
    "source": [
     "# Chat API\n",
+    "# !modal_skip_rest\n",
     "%%bash\n",
     "curl -s http://localhost:11434/api/chat -d '{\n",
     "  \"model\": \"hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF\",\n",
@@ -219,4 +227,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 0
-}
+}

notebooks/LFM2_Inference_with_Transformers.ipynb

Lines changed: 71 additions & 4 deletions
@@ -26,7 +26,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "!uv pip install \"transformers>=5.0.0\" \"torch==2.9.0\" accelerate"
+    "!uv pip install \"transformers>=5.0.0\" \"torch==2.9.0\" accelerate torchvision"
    ]
   },
   {
@@ -86,7 +86,30 @@
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": "from transformers import GenerationConfig\n\ngeneration_config = GenerationConfig(\n    do_sample=True,\n    temperature=0.1,\n    top_k=50,\n    repetition_penalty=1.05,\n    max_new_tokens=512,\n)\n\nprompt = \"Explain quantum computing in simple terms.\"\ninputs = tokenizer.apply_chat_template(\n    [{\"role\": \"user\", \"content\": prompt}],\n    add_generation_prompt=True,\n    return_tensors=\"pt\",\n    return_dict=True,\n).to(model.device)\n\noutput = model.generate(**inputs, generation_config=generation_config)\ninput_length = inputs[\"input_ids\"].shape[1]\nresponse = tokenizer.decode(output[0][input_length:], skip_special_tokens=True)\nprint(response)"
+   "source": [
+    "from transformers import GenerationConfig\n",
+    "\n",
+    "generation_config = GenerationConfig(\n",
+    "    do_sample=True,\n",
+    "    temperature=0.1,\n",
+    "    top_k=50,\n",
+    "    repetition_penalty=1.05,\n",
+    "    max_new_tokens=512,\n",
+    ")\n",
+    "\n",
+    "prompt = \"Explain quantum computing in simple terms.\"\n",
+    "inputs = tokenizer.apply_chat_template(\n",
+    "    [{\"role\": \"user\", \"content\": prompt}],\n",
+    "    add_generation_prompt=True,\n",
+    "    return_tensors=\"pt\",\n",
+    "    return_dict=True,\n",
+    ").to(model.device)\n",
+    "\n",
+    "output = model.generate(**inputs, generation_config=generation_config)\n",
+    "input_length = inputs[\"input_ids\"].shape[1]\n",
+    "response = tokenizer.decode(output[0][input_length:], skip_special_tokens=True)\n",
+    "print(response)"
+   ]
   },
   {
    "cell_type": "markdown",
@@ -131,7 +154,51 @@
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": "from transformers import AutoProcessor, AutoModelForImageTextToText\nfrom transformers.image_utils import load_image\n\n# Load vision model and processor\nmodel_id = \"LiquidAI/LFM2.5-VL-1.6B\"\nvision_model = AutoModelForImageTextToText.from_pretrained(\n    model_id,\n    device_map=\"auto\",\n    dtype=\"bfloat16\"\n)\n\n# IMPORTANT: tie lm_head to input embeddings (transformers v5 bug)\nvision_model.lm_head.weight = vision_model.get_input_embeddings().weight\n\nprocessor = AutoProcessor.from_pretrained(model_id)\n\n# Load image\nurl = \"https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg\"\nimage = load_image(url)\n\n# Create conversation\nconversation = [\n    {\n        \"role\": \"user\",\n        \"content\": [\n            {\"type\": \"image\", \"image\": image},\n            {\"type\": \"text\", \"text\": \"What is in this image?\"},\n        ],\n    },\n]\n\n# Generate response\ninputs = processor.apply_chat_template(\n    conversation,\n    add_generation_prompt=True,\n    return_tensors=\"pt\",\n    return_dict=True,\n    tokenize=True,\n).to(vision_model.device)\n\noutputs = vision_model.generate(**inputs, do_sample=True, temperature=0.1, min_p=0.15, repetition_penalty=1.05, max_new_tokens=64)\nresponse = processor.batch_decode(outputs, skip_special_tokens=True)[0]\nprint(response)"
+   "source": [
+    "from transformers import AutoProcessor, AutoModelForImageTextToText\n",
+    "from transformers.image_utils import load_image\n",
+    "\n",
+    "# Load vision model and processor\n",
+    "model_id = \"LiquidAI/LFM2.5-VL-1.6B\"\n",
+    "vision_model = AutoModelForImageTextToText.from_pretrained(\n",
+    "    model_id,\n",
+    "    device_map=\"auto\",\n",
+    "    dtype=\"bfloat16\"\n",
+    ")\n",
+    "\n",
+    "# IMPORTANT: tie lm_head to input embeddings (transformers v5 bug)\n",
+    "vision_model.lm_head.weight = vision_model.get_input_embeddings().weight\n",
+    "\n",
+    "processor = AutoProcessor.from_pretrained(model_id)\n",
+    "\n",
+    "# Load image\n",
+    "url = \"https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg\"\n",
+    "image = load_image(url)\n",
+    "\n",
+    "# Create conversation\n",
+    "conversation = [\n",
+    "    {\n",
+    "        \"role\": \"user\",\n",
+    "        \"content\": [\n",
+    "            {\"type\": \"image\", \"image\": image},\n",
+    "            {\"type\": \"text\", \"text\": \"What is in this image?\"},\n",
+    "        ],\n",
+    "    },\n",
+    "]\n",
+    "\n",
+    "# Generate response\n",
+    "inputs = processor.apply_chat_template(\n",
+    "    conversation,\n",
+    "    add_generation_prompt=True,\n",
+    "    return_tensors=\"pt\",\n",
+    "    return_dict=True,\n",
+    "    tokenize=True,\n",
+    ").to(vision_model.device)\n",
+    "\n",
+    "outputs = vision_model.generate(**inputs, do_sample=True, temperature=0.1, min_p=0.15, repetition_penalty=1.05, max_new_tokens=64)\n",
+    "response = processor.batch_decode(outputs, skip_special_tokens=True)[0]\n",
+    "print(response)"
+   ]
   },
   {
    "cell_type": "markdown",
@@ -161,4 +228,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 0
-}
+}
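The `lm_head` assignment in the vision cell above works around a transformers v5 weight-tying issue by pointing the output head at the same tensor as the input embeddings. A quick sanity check of the tie, as a sketch assuming the `vision_model` from that cell:

```python
# Illustrative check: after the assignment, both modules should
# reference the same underlying tensor storage.
head = vision_model.lm_head.weight
emb = vision_model.get_input_embeddings().weight
assert head.data_ptr() == emb.data_ptr(), "lm_head is not tied to the embeddings"
```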

notebooks/LFM2_Inference_with_llama_cpp.ipynb

Lines changed: 51 additions & 5 deletions
@@ -44,7 +44,14 @@
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": "!llama-b7633/llama-cli \\\n    -hf LiquidAI/LFM2.5-1.2B-Instruct-GGUF:Q4_K_M \\\n    -p \"What is C. elegans?\" \\\n    -n 256 \\\n    --temp 0.1 --top-k 50 --top-p 0.1 --repeat-penalty 1.05"
+   "source": [
+    "# !modal_skip\n",
+    "!llama-b7633/llama-cli \\\n",
+    "    -hf LiquidAI/LFM2.5-1.2B-Instruct-GGUF:Q4_K_M \\\n",
+    "    -p \"What is C. elegans?\" \\\n",
+    "    -n 256 \\\n",
+    "    --temp 0.1 --top-k 50 --top-p 0.1 --repeat-penalty 1.05"
+   ]
   },
   {
    "cell_type": "markdown",
@@ -99,15 +106,34 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "!uv pip install -qqq openai"
+    "!uv pip install -qqq openai requests"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": "from openai import OpenAI\n\nclient = OpenAI(\n    base_url=\"http://localhost:8000/v1\",\n    api_key=\"not-needed\"\n)\n\nresponse = client.chat.completions.create(\n    model=\"lfm2.5-1.2b-instruct\",\n    messages=[\n        {\"role\": \"user\", \"content\": \"What is machine learning?\"}\n    ],\n    temperature=0.1,\n    top_p=0.1,\n    max_tokens=512,\n    extra_body={\"top_k\": 50, \"repetition_penalty\": 1.05},\n)\nprint(response.choices[0].message.content)"
+   "source": [
+    "from openai import OpenAI\n",
+    "\n",
+    "client = OpenAI(\n",
+    "    base_url=\"http://localhost:8000/v1\",\n",
+    "    api_key=\"not-needed\"\n",
+    ")\n",
+    "\n",
+    "response = client.chat.completions.create(\n",
+    "    model=\"lfm2.5-1.2b-instruct\",\n",
+    "    messages=[\n",
+    "        {\"role\": \"user\", \"content\": \"What is machine learning?\"}\n",
+    "    ],\n",
+    "    temperature=0.1,\n",
+    "    top_p=0.1,\n",
+    "    max_tokens=512,\n",
+    "    extra_body={\"top_k\": 50, \"repetition_penalty\": 1.05},\n",
+    ")\n",
+    "print(response.choices[0].message.content)"
+   ]
   },
   {
    "cell_type": "code",
@@ -148,7 +174,7 @@
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": "!llama-b7633/llama-cli \\\n    -hf LiquidAI/LFM2.5-VL-1.6B-GGUF:Q4_0 \\\n    --image test_image.jpg \\\n    --image-max-tokens 64 \\\n    -p \"What's in this image?\" \\\n    -n 128 \\\n    --temp 0.1 --min-p 0.15 --repeat-penalty 1.05"
+   "source": "# !modal_skip\n!llama-b7633/llama-cli \\\n    -hf LiquidAI/LFM2.5-VL-1.6B-GGUF:Q4_0 \\\n    --image test_image.jpg \\\n    --image-max-tokens 64 \\\n    -p \"What's in this image?\" \\\n    -n 128 \\\n    --temp 0.1 --min-p 0.15 --repeat-penalty 1.05"
   },
   {
    "cell_type": "markdown",
@@ -202,7 +228,27 @@
    "execution_count": null,
    "metadata": {},
    "outputs": [],
-   "source": "client = OpenAI(\n    base_url=\"http://localhost:8000/v1\",\n    api_key=\"not-needed\"\n)\n\nresponse = client.chat.completions.create(\n    model=\"lfm2.5-vl-1.6b\",\n    messages=[{\n        \"role\": \"user\",\n        \"content\": [\n            {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n            {\"type\": \"text\", \"text\": \"What's in this image?\"}\n        ]\n    }],\n    temperature=0.1,\n    max_tokens=512,\n    extra_body={\"min_p\": 0.15, \"repetition_penalty\": 1.05},\n)\nprint(response.choices[0].message.content)"
+   "source": [
+    "client = OpenAI(\n",
+    "    base_url=\"http://localhost:8000/v1\",\n",
+    "    api_key=\"not-needed\"\n",
+    ")\n",
+    "\n",
+    "response = client.chat.completions.create(\n",
+    "    model=\"lfm2.5-vl-1.6b\",\n",
+    "    messages=[{\n",
+    "        \"role\": \"user\",\n",
+    "        \"content\": [\n",
+    "            {\"type\": \"image_url\", \"image_url\": {\"url\": image_url}},\n",
+    "            {\"type\": \"text\", \"text\": \"What's in this image?\"}\n",
+    "        ]\n",
+    "    }],\n",
+    "    temperature=0.1,\n",
+    "    max_tokens=512,\n",
+    "    extra_body={\"min_p\": 0.15, \"repetition_penalty\": 1.05},\n",
+    ")\n",
+    "print(response.choices[0].message.content)"
+   ]
   },
   {
    "cell_type": "code",