
feat(core-ai): add safe local runtime preloader #421

Merged: ucguy4u merged 1 commit into main from codex/issue-391-local-preloader on Jun 28, 2026


ucguy4u (Contributor) commented on Jun 28, 2026

Summary

This PR changes the local runtime orchestration layer in core_ai and routes the app's chat startup through it.
Goal: centralize safe local model warmup so chat startup can reduce first-use latency without background evictions or duplicated runtime-specific startup logic.

Core outcome:

  • adds a residency policy and manager for safe local runtime loading decisions
  • adds a global preloader with abort, generation gating, and no-eviction background warmup
  • wires chat startup through the shared preloader instead of separate Gemini Nano and LiteRT warmups

Changes

Code

  • Added:
    • packages/core_ai/lib/src/residency/model_residency_manager.dart
    • packages/core_ai/lib/src/preload/model_preloader.dart
    • app/lib/core/services/local_runtime_preloader_service.dart
    • focused residency, preloader, service, and widget tests
  • Updated:
    • packages/core_ai/lib/core_ai.dart exports for the new shared runtime orchestration primitives
    • app/lib/features/agent_chat/presentation/screens/chat_screen.dart to trigger the shared preload asynchronously and abort it when the screen is disposed or a generation starts
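The abort behavior described for the chat screen can be illustrated with a small Dart sketch. It assumes that "generation gating" means no warmup runs while a chat response is being generated, and that starting a generation aborts an in-flight preload. `PreloaderSketch`, `PreloadOutcome`, `beginGeneration`, and `endGeneration` are hypothetical names, not the API in `model_preloader.dart`.

```dart
/// Hypothetical per-runtime result reasons (the PR logs reasons but does
/// not show the real type).
enum PreloadOutcome { loaded, skippedAborted, skippedGenerating, failed }

/// Hypothetical warmup callback for one local runtime.
typedef WarmupFn = Future<void> Function();

/// Sketch (not the PR's class) of a preloader with abort and generation gating.
class PreloaderSketch {
  PreloaderSketch(this._runtimes);

  // Runtime id -> warmup callback, warmed one after another in map order.
  final Map<String, WarmupFn> _runtimes;

  // Bumped by abort(); a run that sees a newer epoch stops warming.
  int _epoch = 0;

  // True while a chat response is being generated; blocks new warmups.
  bool _generating = false;

  /// Cancels the in-flight run. A warmup already being awaited still
  /// finishes, but the remaining runtimes are skipped.
  void abort() => _epoch++;

  /// Generation has started: abort any preload and gate new ones.
  void beginGeneration() {
    _generating = true;
    abort();
  }

  /// Generation finished: background preload may run again.
  void endGeneration() => _generating = false;

  /// Warms each runtime in order and reports a per-runtime outcome.
  Future<Map<String, PreloadOutcome>> preloadAll() async {
    final startEpoch = _epoch;
    final results = <String, PreloadOutcome>{};
    for (final entry in _runtimes.entries) {
      if (_epoch != startEpoch) {
        results[entry.key] = PreloadOutcome.skippedAborted;
      } else if (_generating) {
        results[entry.key] = PreloadOutcome.skippedGenerating;
      } else {
        try {
          await entry.value();
          results[entry.key] = PreloadOutcome.loaded;
        } catch (_) {
          results[entry.key] = PreloadOutcome.failed;
        }
      }
    }
    return results;
  }
}
```

In this sketch a screen would call `abort()` from `dispose` and `beginGeneration()` before sending a prompt. The returned map is the kind of per-runtime reason list that completion logging could report.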

Logic

  • background preload now checks canLoadWithoutEviction before warming a runtime
  • active/user-triggered runtime work can use makeRoomFor, ensureResident, and runExclusive through the shared manager
  • image-capable packages are explicitly skipped during background preload
  • STT/TTS hooks exist as no-op adapters so future local runtimes can plug into the same flow
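The background/active split above can be sketched as a memory-budget check in plain Dart. This assumes a fixed byte budget, a conservative per-runtime estimate, and oldest-first eviction on the active path. `RuntimeSpec`, `ResidencySketch`, and `tryBackgroundLoad` are hypothetical; `canLoadWithoutEviction` and `ensureResident` only borrow the names from the PR, not their real signatures.

```dart
/// Hypothetical runtime descriptor; the fields are illustrative.
class RuntimeSpec {
  const RuntimeSpec(this.id, this.estimatedBytes, {this.imageCapable = false});

  final String id;
  final int estimatedBytes; // conservative memory estimate
  final bool imageCapable;
}

/// Sketch (hypothetical) of a residency policy with a fixed memory budget.
class ResidencySketch {
  ResidencySketch(this.budgetBytes);

  final int budgetBytes;

  // Resident runtimes in load order (Dart's default Map keeps insertion order).
  final Map<String, int> _resident = {};

  int get usedBytes => _resident.values.fold(0, (sum, bytes) => sum + bytes);

  bool isResident(String id) => _resident.containsKey(id);

  /// True if [spec] is already loaded or fits in the free budget as-is.
  bool canLoadWithoutEviction(RuntimeSpec spec) =>
      isResident(spec.id) || usedBytes + spec.estimatedBytes <= budgetBytes;

  /// Background path: skips image-capable runtimes and never evicts.
  bool tryBackgroundLoad(RuntimeSpec spec) {
    if (spec.imageCapable || !canLoadWithoutEviction(spec)) return false;
    _resident[spec.id] = spec.estimatedBytes;
    return true;
  }

  /// Active path: evicts the oldest residents until [spec] fits.
  void ensureResident(RuntimeSpec spec) {
    if (isResident(spec.id)) return;
    if (spec.estimatedBytes > budgetBytes) {
      throw StateError('${spec.id} exceeds the whole budget');
    }
    while (usedBytes + spec.estimatedBytes > budgetBytes) {
      _resident.remove(_resident.keys.first);
    }
    _resident[spec.id] = spec.estimatedBytes;
  }
}
```

Because the background path refuses anything that does not fit as-is, a high estimate makes it skip warmups rather than evict, while user-triggered work is the only path allowed to free memory.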

API Changes (if any)

  • None externally.

Database Changes (if any)

  • None.

Observability / Logging

  • Added preload completion logging in chat startup with per-runtime result reasons.

Performance Impact

  • Latency: expected improvement for repeat chat initialization and first local response after preload
  • Throughput: no meaningful change expected
  • Memory/CPU: background warmup now uses a no-eviction gate and shared serialization to reduce unsafe concurrent load pressure
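The shared serialization can be pictured as an async lock that chains each task onto the previous one. This is a minimal sketch of that pattern, assuming runtime work is expressed as `Future`-returning callbacks; `ExclusiveRunner` is a hypothetical class, and the PR's `runExclusive` may be implemented differently.

```dart
import 'dart:async';

/// Hypothetical async lock: tasks run one at a time in call order, and a
/// failed task does not block the ones queued behind it.
class ExclusiveRunner {
  // Completes when the most recently queued task has finished.
  Future<void> _tail = Future<void>.value();

  Future<T> runExclusive<T>(Future<T> Function() task) {
    final previous = _tail;
    final done = Completer<void>();
    _tail = done.future;
    // Start only after the previous task settles; always release the slot,
    // whether this task succeeds or throws.
    return previous.then<T>((_) => task()).whenComplete(() => done.complete());
  }
}
```

Chaining on a completer that is always completed normally keeps one failing load from poisoning the queue, while the caller still receives that task's error.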

Risks

  • runtime memory estimates are conservative and may skip some optional warmups on constrained devices
  • STT/TTS remain extension hooks only until concrete local assistant runtimes are added
  • Rollback plan:
    • revert this PR to restore the prior direct Gemini Nano and LiteRT warmup behavior
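One way to read "extension hooks" is a shared interface whose STT/TTS slots are filled by no-op adapters until real runtimes exist, so the warmup loop does not change when they arrive. `LocalRuntimeHook`, `NoOpSpeechHook`, and `warmHooks` are hypothetical names for this sketch.

```dart
/// Hypothetical hook that every local runtime (LLM, STT, TTS) implements.
abstract class LocalRuntimeHook {
  String get id;

  /// Warms the runtime; returns false when there was nothing to load.
  Future<bool> warmup();
}

/// Hypothetical no-op stand-in for a speech runtime that does not exist yet.
class NoOpSpeechHook implements LocalRuntimeHook {
  const NoOpSpeechHook(this.id);

  @override
  final String id;

  @override
  Future<bool> warmup() async => false;
}

/// Warms every hook in order and returns the ids that actually loaded.
Future<List<String>> warmHooks(List<LocalRuntimeHook> hooks) async {
  final loaded = <String>[];
  for (final hook in hooks) {
    if (await hook.warmup()) loaded.add(hook.id);
  }
  return loaded;
}
```

Replacing a `NoOpSpeechHook` with a concrete runtime later would only change the list passed in, not `warmHooks` itself.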

Testing

  • Unit tests:
    • cd packages/core_ai && flutter analyze lib/core_ai.dart lib/src/residency/model_residency_manager.dart lib/src/preload/model_preloader.dart test/residency/model_residency_manager_test.dart test/preload/model_preloader_test.dart
    • cd packages/core_ai && flutter test test/residency/model_residency_manager_test.dart test/preload/model_preloader_test.dart
    • cd app && flutter analyze lib/core/services/local_runtime_preloader_service.dart lib/features/agent_chat/presentation/screens/chat_screen.dart test/core/services/local_runtime_preloader_service_test.dart test/features/agent_chat/presentation/screens/chat_screen_preloader_test.dart
    • cd app && flutter test --no-pub test/core/services/local_runtime_preloader_service_test.dart test/features/agent_chat/presentation/screens/chat_screen_preloader_test.dart
  • Integration tests:
    • none
  • Manual testing:

Deployment Notes

  • Config changes:
    • none
  • Order of deployment:
    • normal app deployment

Related Commits

  • feat(core-ai): add safe local runtime preloader

Notes

- add residency policy and manager primitives for safe local runtime loading
- add a global preloader with abort, generation gating, and no-eviction background warmup
- wire chat startup through the shared preloader instead of ad hoc Gemini/LiteRT warmups
- add focused policy, preloader, service, and widget coverage

This centralizes local runtime warmup so first-use latency can improve without surprise evictions or duplicated startup logic.

github-actions commented:

Plugin Module Size Gate

Policy: modules over 3 MB must be delivered as plugins; plugin modules over 5 MB must document cache management.

| Module | Size | Type | Status |
|---|---|---|---|
| packages/core_ai | 0.34 MB | bundled | OK |
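The gate policy quoted above reduces to two threshold checks. A sketch of how it could be evaluated, assuming sizes in MB and that the 3 MB limit applies to bundled modules while the 5 MB rule applies to plugins; `checkModuleSize` and `SizeGateStatus` are hypothetical and not the CI job's actual code.

```dart
/// Hypothetical outcome of the size gate for one module.
enum SizeGateStatus { ok, mustBePlugin, needsCacheDocs }

/// Hypothetical encoding of the policy text; "over" is a strict comparison.
SizeGateStatus checkModuleSize({
  required double sizeMb,
  required bool isPlugin,
  bool documentsCacheManagement = false,
}) {
  // Bundled modules over 3 MB must be moved to plugins.
  if (!isPlugin && sizeMb > 3) return SizeGateStatus.mustBePlugin;
  // Plugins over 5 MB must document cache management.
  if (isPlugin && sizeMb > 5 && !documentsCacheManagement) {
    return SizeGateStatus.needsCacheDocs;
  }
  return SizeGateStatus.ok;
}
```

Under these assumptions the `packages/core_ai` row (0.34 MB, bundled) evaluates to `SizeGateStatus.ok`, matching the reported status.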

ucguy4u merged commit fa833f0 into main on Jun 28, 2026
7 of 12 checks passed
github-actions commented:
🚀 PR Quick Check Summary

| Check | Status | Description |
|---|---|---|
| PR Validation | ❌ failure | Title format, docs, bundled model guardrail |
| Code Quality | ❌ failure | Analyze, formatting |
| Core Tests | ✅ success | Core package unit tests |

💡 Note: Full app tests, coverage reports, and security scans run on merge to main.




Linked issue: [AGENT] Add safe global local LLM preloader and residency manager
