feat: MetalRT VLM backend + bolder screen overlay #26

Open
AmanSwar wants to merge 3 commits into main from vlm_metalrt_integration

@AmanSwar
Collaborator

Summary

  • MetalRT VLM backend: VLM commands (vlm, camera, screen) now use MetalRT's native vision pipeline when running on the MetalRT engine. They fall back to llama.cpp gracefully if no MetalRT VLM model is found in the HF cache.
  • Screen capture overlay: Bolder border (8px), larger corner handles (28px), wider edge grab zones (20px), added edge midpoint handles, double-layer glow, and heavier label font.
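
The fallback behavior described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual vlm_init_locked(): the names VlmBackend, VlmInitResult, and select_vlm_backend are hypothetical, and the two initializers are passed in as callbacks so the selection logic can be exercised without either backend installed.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical names throughout; the PR's real vlm_init_locked() is not shown.
enum class VlmBackend { None, MetalRT, LlamaCpp };

struct VlmInitResult {
    VlmBackend backend;
    std::string note;  // human-readable reason, useful for logging
};

// Try MetalRT only when the active engine is MetalRT. If that is skipped or
// fails (for example, no MetalRT VLM model in the cache), try llama.cpp.
// Each initializer returns true on success.
inline VlmInitResult select_vlm_backend(
        bool engine_is_metalrt,
        const std::function<bool()>& init_metalrt,
        const std::function<bool()>& init_llama) {
    assert(init_llama && "a llama.cpp initializer is required as the fallback");

    if (engine_is_metalrt && init_metalrt && init_metalrt()) {
        return {VlmBackend::MetalRT, "using MetalRT VLM"};
    }
    if (init_llama()) {
        return {VlmBackend::LlamaCpp,
                engine_is_metalrt
                    ? "MetalRT VLM unavailable, fell back to llama.cpp"
                    : "using llama.cpp VLM"};
    }
    return {VlmBackend::None, "no VLM backend could be initialized"};
}
```

Under this assumption, a caller would store the returned backend once (the PR mentions a vlm_use_metalrt flag for this purpose) and report the note in a backend-agnostic error message when the result is VlmBackend::None.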

Changes

  • New MetalRTVlmEngine class (metalrt_vlm_engine.h/.cpp) wrapping MetalRT vision C API via dlsym
  • vlm_init_locked() tries MetalRT first, falls back to llama.cpp (no longer hard-rejects MetalRT backend)
  • All VLM functions (rcli_vlm_analyze, rcli_vlm_analyze_stream, rcli_vlm_get_stats, rcli_vlm_exit, handle_screen_intent) branch on vlm_use_metalrt flag
  • Updated error messages to be backend-agnostic
  • rcli_overlay.m visual improvements for easier drag/resize
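
For readers unfamiliar with wrapping a C API via dlsym, the general pattern looks something like the sketch below. The MetalRT symbol names and signatures are not given in this PR, so the example binds the libc function strlen instead; DynApi and resolve_symbol are hypothetical names, and only dlopen, dlsym, dlerror, and dlclose are real POSIX calls.

```cpp
#include <cassert>
#include <cstddef>
#include <dlfcn.h>

// Resolve `name` from `handle` and cast it to the function-pointer type Fn.
// Returns nullptr when the symbol is missing, so the caller can disable the
// feature instead of failing at load time.
template <typename Fn>
Fn resolve_symbol(void* handle, const char* name) {
    assert(name != nullptr);
    dlerror();  // clear any stale error state before the lookup
    void* sym = dlsym(handle, name);
    return reinterpret_cast<Fn>(sym);
}

// Hypothetical wrapper. A real engine class would hold one pointer per
// C API entry point; this demo holds a single libc function.
struct DynApi {
    using StrlenFn = std::size_t (*)(const char*);

    void* handle = nullptr;
    StrlenFn strlen_fn = nullptr;

    // path == nullptr asks for a handle to the already-loaded program images.
    bool load(const char* path) {
        handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
        if (!handle) return false;
        strlen_fn = resolve_symbol<StrlenFn>(handle, "strlen");
        return strlen_fn != nullptr;
    }

    bool available() const { return strlen_fn != nullptr; }

    ~DynApi() {
        if (handle) dlclose(handle);
    }
};
```

The point of resolving symbols at runtime is that a missing library or missing vision symbol becomes a false return from load() rather than a link error, which is what makes a graceful fallback to another backend possible. On older glibc versions this may need to be linked with -ldl.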

Test plan

  • Run rcli vlm <image> "describe this" on MetalRT engine — should use MetalRT VLM
  • Run rcli vlm <image> on llama.cpp engine — should use llama.cpp VLM as before
  • Run rcli vlm on MetalRT without VLM model in HF cache — should fall back to llama.cpp
  • Run rcli screen — verify overlay is bolder with larger handles
  • Verify overlay drag and resize from corners and edges
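
When checking drag and resize by hand, it may help to have a model of which zone a given cursor position should land in. The sketch below is a geometry-only illustration in C++, not the code in rcli_overlay.m. It assumes the 28px corner handles are squares centered on each corner and the 20px edge grab zones are bands centered on each edge; the PR does not say how the sizes are measured, and Hit, Rect, and hit_test are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

enum class Hit {
    None, Inside, Left, Right, Top, Bottom,
    TopLeft, TopRight, BottomLeft, BottomRight
};

struct Rect { double x, y, w, h; };  // origin at top-left, y grows downward

constexpr double kCornerHandle = 28.0;  // sizes taken from the PR description
constexpr double kEdgeGrab = 20.0;

// Classify the point (px, py) against the selection rectangle r.
// Corners are tested first so they win where a corner and an edge overlap.
inline Hit hit_test(const Rect& r, double px, double py) {
    assert(r.w >= 0 && r.h >= 0);
    const double corner_tol = kCornerHandle / 2.0;
    const double edge_tol = kEdgeGrab / 2.0;
    const double left = r.x, right = r.x + r.w;
    const double top = r.y, bottom = r.y + r.h;

    auto within = [](double value, double target, double tol) {
        return std::fabs(value - target) <= tol;
    };

    const bool cl = within(px, left, corner_tol);
    const bool cr = within(px, right, corner_tol);
    const bool ct = within(py, top, corner_tol);
    const bool cb = within(py, bottom, corner_tol);
    if (cl && ct) return Hit::TopLeft;
    if (cr && ct) return Hit::TopRight;
    if (cl && cb) return Hit::BottomLeft;
    if (cr && cb) return Hit::BottomRight;

    const bool in_x = px >= left && px <= right;
    const bool in_y = py >= top && py <= bottom;
    if (in_y && within(px, left, edge_tol)) return Hit::Left;
    if (in_y && within(px, right, edge_tol)) return Hit::Right;
    if (in_x && within(py, top, edge_tol)) return Hit::Top;
    if (in_x && within(py, bottom, edge_tol)) return Hit::Bottom;

    if (in_x && in_y) return Hit::Inside;
    return Hit::None;
}
```

With a 200x100 rectangle at (100, 100), a click a few pixels outside the top edge should still resize from the top, while a click well inside should move the whole selection.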

AmanSwar and others added 3 commits March 16, 2026 01:08
When running on the MetalRT engine, VLM commands (vlm, camera, screen) now
use MetalRT's native vision pipeline instead of requiring llama.cpp.
Falls back to llama.cpp gracefully if no MetalRT VLM model is available.

Thicker border (8px), larger corner handles (28px), wider edge grab
zones (20px), added edge midpoint handles, double-layer outer glow,
and heavier label font for better visibility and usability.

2 participants