Voice command upgrade: answer_question() with camera + memory context #6

@chatde

Parent PRD

#1

What to build

Upgrade VoiceHandler to route voice queries through Gemma4Brain.answer_question() instead of the old Llama 3.1 text model. Each voice query then receives the full memory context plus an optional camera frame, so Vector can give richer, more personalized answers.
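A minimal sketch of the routing described above, under stated assumptions: the real signatures of VoiceHandler and Gemma4Brain.answer_question() are not shown in this issue, so the `image` keyword, the `get_frame` callable, and the `handle_query` method name are hypothetical. It also assumes the brain pulls memory context internally.

```python
from typing import Any, Callable, Optional, Protocol


class Brain(Protocol):
    """Hypothetical interface standing in for Gemma4Brain."""

    def answer_question(self, question: str, image: Optional[Any] = None) -> str: ...


class VoiceHandler:
    """Routes a transcribed voice query to the brain, attaching a camera frame if one is available."""

    def __init__(self, brain: Brain, get_frame: Callable[[], Optional[Any]]):
        self.brain = brain
        # Assumption: get_frame returns a PIL image, or None when there is no frame.
        self.get_frame = get_frame

    def handle_query(self, question: str) -> str:
        try:
            frame = self.get_frame()
        except Exception:
            # Camera unavailable: fall back to a text-only query.
            frame = None
        return self.brain.answer_question(question, image=frame)
```

Passing the frame source as a callable keeps the handler testable without a robot attached, and a camera failure degrades to a text-only answer instead of raising.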

Acceptance criteria

  • Asking Vector 'What do you see?' calls Gemma4Brain.describe_view(pil_image) rather than LLaVA
  • Asking 'What have you seen today?' produces an answer that references recent MemoryBank observations
  • Asking 'What is that?' with a pointing gesture analyzes the current camera frame
  • The response is spoken via say_text() with TTS chunking
  • Voice queries work when the GEMMA4=1 environment variable is set
  • If the camera is unavailable, answer_question() still works in text-only mode
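The criteria above could be exercised with something like the following sketch. Everything here is an assumption made for illustration: the helper names (`gemma4_enabled`, `chunk_for_tts`, `respond`), the 200-character chunk limit, the exact-match check for 'What do you see?', and the `image` keyword on answer_question(). The `say_text` argument stands in for whatever speech call the project uses.

```python
import os
import re
from typing import Any, Callable, List, Mapping, Optional


def gemma4_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when the GEMMA4=1 environment variable is set."""
    return env.get("GEMMA4") == "1"


def chunk_for_tts(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def respond(
    question: str,
    brain: Any,  # hypothetical stand-in for Gemma4Brain
    frame: Optional[Any],  # PIL image, or None if the camera is unavailable
    say_text: Callable[[str], None],
) -> str:
    """Pick the brain method for the query, then speak the answer chunk by chunk."""
    normalized = question.strip().lower().rstrip("?!. ")
    if frame is not None and normalized == "what do you see":
        answer = brain.describe_view(frame)
    else:
        # Text-only when frame is None; memory context is assumed to be
        # handled inside answer_question().
        answer = brain.answer_question(question, image=frame)
    for chunk in chunk_for_tts(answer):
        say_text(chunk)
    return answer
```

A real implementation would likely need fuzzier intent matching than an exact string comparison, and the chunk size should follow whatever limit the TTS path actually imposes.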

Blocked by

User stories addressed

  • Voice commands get intelligent, memory-aware answers from Gemma 4

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request
