This guide covers the advanced AI features implemented in the Mobile AI Orchestrator, including reservoir computing, neural routing, and spiking neural networks.
Contents:
- Reservoir Computing for Context Preservation
- Neural Network Router (MLP)
- Spiking Neural Networks for Wake Detection
- Integration Patterns
- Configuration and Tuning
- Performance Considerations
The Echo State Network (ESN) provides temporal context compression, solving the "Echomesh" problem by preserving conversation patterns across sessions with minimal memory overhead.
Input (384-dim) → Reservoir (1000 neurons; fixed random weights, liquid state dynamics) → Output (100-dim compressed state)
Key Parameters:
- input_size: 384 (matches embedding dimension)
- reservoir_size: 1000 (liquid state neurons)
- output_size: 100 (compressed representation)
- leak_rate: 0.7 (how quickly neurons forget)
- spectral_radius: 0.95 (stability of reservoir dynamics)
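These parameters map onto the standard leaky ESN state update, sketched below for intuition (a textbook formulation; the crate's internals may differ in detail):

```rust
// Leaky Echo State Network update (standard formulation, for intuition only):
//
//   x[t] = (1 - leak_rate) * x[t-1] + leak_rate * tanh(W_in·u[t] + W·x[t-1])
//
// W is the fixed random reservoir matrix, rescaled so its largest eigenvalue
// magnitude equals spectral_radius; W_in projects the input. Only the readout
// (output) weights are ever trained.
fn esn_update(state: &mut [f32], pre_activation: &[f32], leak_rate: f32) {
    // pre_activation[i] = (W_in·u[t] + W·x[t-1])[i], computed by the caller
    for (x, &a) in state.iter_mut().zip(pre_activation) {
        *x = (1.0 - leak_rate) * *x + leak_rate * a.tanh();
    }
}
```

A higher leak_rate weights the new activation more heavily (faster forgetting); keeping spectral_radius below 1.0 keeps the recurrent dynamics stable (the echo state property). Basic usage: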
use mobile_ai_orchestrator::EchoStateNetwork;
// Create ESN
let mut esn = EchoStateNetwork::new(
384, // input dimension
1000, // reservoir size
100, // output dimension
0.7, // leak rate
0.95 // spectral radius
);
// Process text input
let text = "User asked about Rust lifetimes";
let encoding = reservoir::encode_text(text, 384); // Bag-of-words encoding
let state = esn.update(&encoding); // Returns 1000-dim liquid state
// Get compressed output for storage
let compressed = esn.output(); // Returns 100-dim vector
The context manager automatically integrates with the reservoir:
use mobile_ai_orchestrator::Orchestrator;
let mut orch = Orchestrator::new();
// Reservoir is automatically updated on each conversation turn
orch.process(query1)?; // ESN state updated
orch.process(query2)?; // ESN evolves based on temporal dynamics
orch.process(query3)?; // Previous context influences current state
// Get context snapshot with reservoir state
let snapshot = orch.context_snapshot(10);
// Contains: recent history + 1000-dim reservoir state
For production use, you should train the output weights:
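Only the readout layer is fitted; the reservoir weights stay fixed. The standard closed-form solution is ridge regression (whether esn.train uses exactly this formulation internally is an assumption):

W_out = Y · Xᵀ · (X·Xᵀ + λI)⁻¹

where the columns of X are the collected reservoir states, Y holds the corresponding target outputs, and λ is the regularization strength (the 0.01 passed below).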
// Collect training data
let inputs: Vec<Vec<f32>> = vec![/* encoded conversation turns */];
let targets: Vec<Vec<f32>> = vec![/* desired outputs */];
// Train using ridge regression
esn.train(&inputs, &targets, 0.01)?; // lambda=0.01 for regularization
// Save trained model
let serialized = serde_json::to_string(&esn)?;
std::fs::write("reservoir_weights.json", serialized)?;
// Load trained model
let loaded: EchoStateNetwork = serde_json::from_str(&serialized)?;
// End of session - save reservoir state
let context_snapshot = orch.context_snapshot(10);
let state_json = serde_json::to_string(&context_snapshot)?;
std::fs::write("session_state.json", state_json)?;
// Start of new session - restore state
let state_json = std::fs::read_to_string("session_state.json")?;
let snapshot: ContextSnapshot = serde_json::from_str(&state_json)?;
// Restore reservoir state in context manager
// (Currently requires manual reconstruction - future improvement)
let mut esn = EchoStateNetwork::new(384, 1000, 100, 0.7, 0.95);
// Set state from snapshot.reservoir_state if available
Good Use Cases:
- Long conversation histories (>50 turns)
- Cross-session continuity
- Pattern recognition in temporal sequences
- Memory-constrained environments
Not Ideal For:
- Short conversations (<10 turns)
- Stateless query answering
- When full conversation history is needed for exact retrieval
Performance Characteristics:
- Memory: 1000 floats = 4KB (vs. full history: 10-100KB per turn)
- Update Latency: ~100-200μs per turn
- Compression Ratio: 10x (1000-dim reservoir state → 100-dim output)
- Accuracy: Depends on training; ~80-90% pattern recall typical
The Multi-Layer Perceptron provides learned routing decisions, replacing heuristic rules with trained patterns based on user feedback.
Query Features (384) → Hidden Layer 1 (100, ReLU) → Hidden Layer 2 (50, ReLU) → Output (3, Softmax)
↓
[P(Local), P(Remote), P(Hybrid)]
use mobile_ai_orchestrator::mlp::MLP;
// Create MLP (mut so the training loop later in this section can update it)
let mut mlp = MLP::new(
384, // input size (query embedding)
vec![100, 50], // hidden layer sizes
3 // output size (Local, Remote, Hybrid)
);
// Forward pass for routing decision
let query_embedding = encode_query_features(&query);
let logits = mlp.forward(&query_embedding);
let probabilities = MLP::softmax(&logits);
let decision = MLP::argmax(&probabilities);
match decision {
0 => println!("Route to Local (P={:.2})", probabilities[0]),
1 => println!("Route to Remote (P={:.2})", probabilities[1]),
2 => println!("Route to Hybrid (P={:.2})", probabilities[2]),
_ => unreachable!(),
}
Future Integration Plan (not yet implemented):
// In router.rs, replace heuristic routing with MLP
pub struct Router {
mlp: Option<MLP>,
config: RouterConfig,
}
impl Router {
pub fn route(&self, query: &Query) -> (RoutingDecision, f32) {
if let Some(ref mlp) = self.mlp {
// Use MLP for routing
let features = self.extract_features(query);
let logits = mlp.forward(&features);
let probs = MLP::softmax(&logits);
let decision_idx = MLP::argmax(&probs);
let decision = match decision_idx {
0 => RoutingDecision::Local,
1 => RoutingDecision::Remote,
2 => RoutingDecision::Hybrid,
_ => RoutingDecision::Blocked,
};
(decision, probs[decision_idx])
} else {
// Fallback to heuristics
self.heuristic_route(query)
}
}
}
Training from user feedback:
// Collect user feedback data
let training_data = vec![
(query1_features, 0), // User confirmed: should be Local
(query2_features, 1), // User confirmed: should be Remote
(query3_features, 2), // User confirmed: should be Hybrid
];
// Convert to one-hot targets
let targets: Vec<Vec<f32>> = training_data
.iter()
.map(|(_, label)| {
let mut target = vec![0.0; 3];
target[*label] = 1.0;
target
})
.collect();
// Training loop
let learning_rate = 0.01;
let epochs = 100;
for epoch in 0..epochs {
let mut total_loss = 0.0;
for ((input, _label), target) in training_data.iter().zip(&targets) {
let (loss, gradients) = mlp.backward(input, target);
mlp.update(&gradients, learning_rate);
total_loss += loss;
}
if epoch % 10 == 0 {
println!("Epoch {}: Loss = {:.4}", epoch, total_loss / training_data.len() as f32);
}
}
// Save trained model
let model_json = serde_json::to_string(&mlp)?;
std::fs::write("mlp_router.json", model_json)?;fn extract_features(query: &Query) -> Vec<f32> {
let mut features = vec![0.0; 384];
// Feature 0-9: Length indicators
features[0] = (query.text.len() as f32 / 1000.0).min(1.0);
// Feature 10-19: Complexity indicators
features[10] = if query.text.contains('?') { 1.0 } else { 0.0 };
features[11] = query.text.split_whitespace().count() as f32 / 100.0;
// Feature 20-379: Text embedding (placeholder)
// In production, use sentence-transformers or similar
let embedding = encode_text(&query.text, 360);
features[20..380].copy_from_slice(&embedding);
// Feature 380-383: Metadata
features[380] = query.priority as f32 / 10.0;
features[381] = if query.project_context.is_some() { 1.0 } else { 0.0 };
features
}
Advantages:
- Learns user preferences over time
- Adapts to specific use patterns
- Explainable via probability distribution
- Can capture complex decision boundaries
Requirements:
- Training data (100+ labeled examples minimum)
- User feedback mechanism
- Offline training infrastructure
- Model versioning and A/B testing
Current Status: Implemented but not integrated. Requires:
- Feature extraction pipeline
- User feedback collection
- Training data pipeline
- Model deployment system
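As a concrete starting point for the feedback-collection piece, one possible record format and logger are sketched below. All names here are hypothetical; none of these types or helpers exist in the crate yet:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical record for the user-feedback pipeline described above.
#[derive(Serialize, Deserialize)]
struct RoutingFeedback {
    query_features: Vec<f32>, // 384-dim vector from extract_features
    predicted: u8,            // 0 = Local, 1 = Remote, 2 = Hybrid
    corrected: u8,            // label confirmed or corrected by the user
    timestamp: u64,           // Unix seconds, enables time-based train/test splits
}

// Append one JSON line per correction; the file becomes the training set.
fn log_feedback(record: &RoutingFeedback, path: &str) -> std::io::Result<()> {
    use std::io::Write;
    let mut file = std::fs::OpenOptions::new()
        .create(true)
        .append(true)
        .open(path)?;
    let line = serde_json::to_string(record)
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?;
    writeln!(file, "{}", line)
}
```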
Spiking Neural Networks provide event-driven, ultra-low-power processing for always-on features like wake word detection and context switching triggers.
Input Spikes (10) → Hidden Layer (20 LIF neurons; ~20% sparse connectivity, event-driven) → Output Layer (3 neurons)
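For intuition, each hidden neuron follows Leaky Integrate-and-Fire dynamics roughly like the sketch below (a textbook formulation; parameter names such as tau_m and v_reset are illustrative, not the crate's API):

```rust
// Minimal Leaky Integrate-and-Fire neuron (textbook formulation).
struct LifNeuron {
    v: f32,           // membrane potential
    tau_m: f32,       // membrane time constant (ms)
    v_rest: f32,      // resting potential
    v_reset: f32,     // potential after a spike
    v_threshold: f32, // firing threshold
}

impl LifNeuron {
    /// Integrate input current over one time step `dt` (ms); return true on a spike.
    fn step(&mut self, input_current: f32, dt: f32) -> bool {
        // Leak toward rest while integrating the weighted input spikes.
        self.v += (dt / self.tau_m) * (-(self.v - self.v_rest) + input_current);
        if self.v >= self.v_threshold {
            self.v = self.v_reset; // fire, then reset
            return true;
        }
        false
    }
}
```

Because neurons only do work when inputs spike and connectivity is sparse, most time steps touch only a small fraction of the network. The crate wraps these dynamics behind SpikingNetwork: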
use mobile_ai_orchestrator::SpikingNetwork;
// Create SNN
let mut snn = SpikingNetwork::new(
10, // input size
20, // hidden size
3 // output size
);
// Process spike inputs
let input_spikes = vec![
true, // Input 0 spiking
false, // Input 1 silent
true, // Input 2 spiking
// ... 7 more inputs
];
// Step simulation (dt = 1ms)
let output_spikes = snn.step(&input_spikes, 1.0);
// Count spikes over window for decision
let mut spike_counts = vec![0; 3];
for _ in 0..100 { // 100ms window
let spikes = snn.step(&get_current_input(), 1.0);
for (i, &spike) in spikes.iter().enumerate() {
if spike { spike_counts[i] += 1; }
}
}
// Make decision based on spike counts
let decision = spike_counts
.iter()
.enumerate()
.max_by_key(|(_, &count)| count)
.map(|(idx, _)| idx)
.unwrap();
// Convert audio features to spikes
fn audio_to_spikes(audio_features: &[f32], threshold: f32) -> Vec<bool> {
audio_features.iter().map(|&x| x > threshold).collect()
}
// Detect wake word
let mut detector = SpikingNetwork::new(40, 100, 2); // 2 outputs: wake/no-wake
let mut wake_count = 0;
let mut no_wake_count = 0;
// Process 200ms of audio (200 time steps at 1ms)
for t in 0..200 {
let audio_frame = get_audio_frame(t);
let input_spikes = audio_to_spikes(&audio_frame, 0.5);
let output = detector.step(&input_spikes, 1.0);
if output[0] { wake_count += 1; } // Wake word neuron
if output[1] { no_wake_count += 1; } // Background neuron
}
if wake_count > 50 && wake_count > no_wake_count * 2 {
println!("Wake word detected!");
activate_full_system();
}
Traditional Continuous Inference:
- Always running neural network
- Power: ~100-500mW
- Battery impact: High
Event-Driven SNN:
- Only computes on spike events
- Sparse connectivity (20% vs 100%)
- Power: ~0.1-5mW (100x-1000x reduction)
- Battery impact: Minimal
Example Calculation:
Traditional DNN:
- Power: 200mW
- 24h battery: 200mW * 24h = 4.8Wh
Event-Driven SNN:
- Power: 2mW (average with 10% spike rate)
- 24h battery: 2mW * 24h = 0.048Wh
Savings: 99% reduction
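The same arithmetic in code (a sketch; the power figures are the estimates above, not measurements):

```rust
// Energy over a duty window, from the estimated power draws above.
fn energy_wh(power_mw: f32, hours: f32) -> f32 {
    power_mw / 1000.0 * hours
}

fn main() {
    let dnn = energy_wh(200.0, 24.0); // 4.8 Wh
    let snn = energy_wh(2.0, 24.0);   // 0.048 Wh
    println!("SNN saves {:.1}% energy", (1.0 - snn / dnn) * 100.0); // 99.0%
}
```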
Applications:
- Wake Word Detection: Always-on listening with minimal power
- Context Switching: Detect when user switches apps/tasks
- Gesture Recognition: Process accelerometer/gyroscope data
- Proactive Assistance: Trigger AI based on usage patterns
Current: CPU implementation (workable for prototyping)
Future Options:
- DSP: Many mobile SoCs have DSPs that can efficiently run SNNs
- NPU: Some NPUs support sparse/event-driven operations
- Neuromorphic Hardware: Intel Loihi, BrainChip Akida (specialized)
Example: Qualcomm Hexagon DSP:
// Compile SNN for Hexagon DSP
// (Requires Hexagon SDK and cross-compilation)
#[cfg(target_arch = "hexagon")]
fn run_on_dsp(snn: &SpikingNetwork, input: &[bool]) -> Vec<bool> {
    // Use Hexagon vector extensions for parallel neuron updates
    // Optimize for low-power operation, then return output spikes
    todo!("Hexagon backend not yet implemented")
}
Current Status: Random weights (initialization only)
Training Methods (to implement):
- Spike-Timing-Dependent Plasticity (STDP):
// Biological learning rule
// Strengthen synapses when pre-neuron spikes before post-neuron
fn stdp_update(weight: &mut f32, pre_spike_time: u32, post_spike_time: u32) {
    // Per-synapse update; delta_t > 0 means the pre-neuron fired first
    let delta_t = post_spike_time as i32 - pre_spike_time as i32;
    let learning_rate = 0.01;
    let tau = 20.0; // STDP time constant (ms)
    if delta_t > 0 {
        // Pre before post: strengthen (causal pairing)
        *weight += learning_rate * (-delta_t as f32 / tau).exp();
    } else {
        // Post before pre: weaken (anti-causal pairing)
        *weight -= learning_rate * (delta_t as f32 / tau).exp();
    }
}
- Backpropagation Through Time (BPTT):
// Convert spikes to differentiable surrogates
fn surrogate_gradient(voltage: f32, threshold: f32) -> f32 {
let beta = 10.0;
1.0 / (1.0 + (beta * (voltage - threshold)).abs()).powi(2)
}
Full-stack integration (wake detection + orchestrator):
use mobile_ai_orchestrator::{Orchestrator, EchoStateNetwork, Query, SpikingNetwork};
struct FullAIStack {
orchestrator: Orchestrator,
wake_detector: SpikingNetwork,
is_active: bool,
}
impl FullAIStack {
fn new() -> Self {
Self {
orchestrator: Orchestrator::new(),
wake_detector: SpikingNetwork::new(40, 100, 2),
is_active: false,
}
}
fn process_audio_frame(&mut self, audio: &[f32]) {
// Step 1: Wake detection (always running, low power)
let spikes = audio_to_spikes(audio, 0.5);
let wake_output = self.wake_detector.step(&spikes, 1.0);
if wake_output[0] { // Wake word detected
self.is_active = true;
self.activate_main_system();
}
}
fn process_query(&mut self, query_text: String) -> Result<String, String> {
if !self.is_active {
return Err("System not active".to_string());
}
// Step 2: Main orchestrator (powered on by wake)
let query = Query::new(&query_text);
let response = self.orchestrator.process(query)?;
// Step 3: Check for end of interaction
if is_goodbye(&response.text) {
self.is_active = false;
self.enter_low_power_mode();
}
Ok(response.text)
}
}
// Start with Phase 1 only
let mut orch = Orchestrator::new();
// Add reservoir computing when ready (Phase 2)
// (Currently automatic in context manager)
// Add MLP routing when trained (Phase 3)
// let mlp = load_trained_mlp("router.json")?;
// orch.set_router_mlp(mlp);
// Add SNN wake detection when deployed (Phase 4)
// let snn = SpikingNetwork::new(40, 100, 2);
// orch.set_wake_detector(snn);
// Android JNI example (pseudo-code)
#[no_mangle]
pub extern "C" fn Java_com_example_MobileAI_processQuery(
env: JNIEnv,
_class: JClass,
query_string: JString,
) -> jstring {
let query: String = env.get_string(query_string).unwrap().into();
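// Assumption: ORCHESTRATOR is a process-wide Mutex<Orchestrator> (e.g., a
// once_cell static); its definition is not shown in this guide.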
let mut orch = ORCHESTRATOR.lock().unwrap();
let result = match orch.process(Query::new(&query)) {
Ok(response) => response.text,
Err(e) => format!("Error: {}", e),
};
env.new_string(result).unwrap().into_inner()
}
Reservoir (ESN) Tuning:
// Default configuration (balanced)
let esn = EchoStateNetwork::new(384, 1000, 100, 0.7, 0.95);
// Fast forgetting (for short-term patterns)
let esn_fast = EchoStateNetwork::new(384, 1000, 100, 0.9, 0.8);
// Higher leak rate = faster forgetting
// Lower spectral radius = less history influence
// Slow forgetting (for long-term patterns)
let esn_slow = EchoStateNetwork::new(384, 1000, 100, 0.3, 0.99);
// Lower leak rate = slower forgetting
// Higher spectral radius = more history influence
// Memory-constrained (smaller reservoir)
let esn_small = EchoStateNetwork::new(384, 500, 50, 0.7, 0.95);
// 500 neurons = 2KB vs 4KB
// 50 output dims = smaller storage
MLP Sizing:
// Small model (fast inference, lower accuracy)
let mlp_small = MLP::new(384, vec![50], 3);
// ~19K parameters, ~20-30μs inference
// Medium model (balanced)
let mlp_medium = MLP::new(384, vec![100, 50], 3);
// ~44K parameters, ~50-70μs inference
// Large model (high accuracy, slower)
let mlp_large = MLP::new(384, vec![200, 100, 50], 3);
// ~102K parameters, ~100-150μs inference
SNN Sizing:
// Wake detector (sensitive, low false negative)
let wake_snn = SpikingNetwork::new(40, 200, 2);
// More hidden neurons = better detection
// Trade-off: slightly higher power
// Gesture recognition (balanced)
let gesture_snn = SpikingNetwork::new(20, 100, 5);
// 5 output classes (swipe left/right/up/down/tap)
// Ultra-low-power (minimize computation)
let minimal_snn = SpikingNetwork::new(10, 50, 2);
// Smallest viable network
// Use for binary decisions only
# Run all benchmarks
cargo bench
# Specific benchmarks
cargo bench orchestrator
cargo bench reservoir
cargo bench mlp
# With specific features
cargo bench --features network
Typical results (order of magnitude; hardware-dependent):
| Operation | Latency | Throughput |
|---|---|---|
| Simple query | 5-10μs | 100k-200k QPS |
| ESN update | 100-200μs | 5k-10k updates/s |
| MLP forward (medium) | 50-100μs | 10k-20k inferences/s |
| SNN step (100 neurons) | 10-50μs | 20k-100k steps/s |
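To reproduce rows of this table, a minimal criterion harness might look like the sketch below (assumes criterion as a dev-dependency and a registered bench target, neither of which is confirmed by this guide):

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use mobile_ai_orchestrator::EchoStateNetwork;

fn bench_esn_update(c: &mut Criterion) {
    let mut esn = EchoStateNetwork::new(384, 1000, 100, 0.7, 0.95);
    let input = vec![0.1f32; 384];
    c.bench_function("esn_update", |b| b.iter(|| esn.update(&input)));
}

criterion_group!(benches, bench_esn_update);
criterion_main!(benches);
```

Optimization Tips: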
- Batch Processing: Process multiple queries together
fn process_batch(orch: &mut Orchestrator, queries: Vec<Query>) -> Vec<Response> {
    // Amortize per-call overhead across the batch.
    // Sketch: failed queries are dropped; real code would surface errors.
    queries.into_iter().filter_map(|q| orch.process(q).ok()).collect()
}
- SIMD: Use platform-specific vector instructions
#[cfg(target_arch = "aarch64")]
use std::arch::aarch64::*; // NEON intrinsics for ARM
- Lazy Evaluation: Only compute what's needed
// Don't compute reservoir output if not persisting
if should_save_state {
let output = esn.output();
save_to_disk(&output)?;
}
- Memory Pool: Reuse allocations
struct QueryPool {
embeddings: Vec<Vec<f32>>,
index: usize,
}
impl QueryPool {
fn get_embedding_buffer(&mut self) -> &mut Vec<f32> {
    self.index = (self.index + 1) % self.embeddings.len();
    let buf = &mut self.embeddings[self.index];
    buf.clear(); // drop stale contents but keep the allocation
    buf
}
}
- Quantization: Use int8 instead of f32
// Future: Quantized MLP
struct QuantizedMLP {
weights: Vec<Vec<i8>>,
scale_factors: Vec<f32>,
}
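// Sketch (hypothetical helper, not part of the crate): symmetric per-layer
// int8 quantization for one weight matrix row.
fn quantize_layer(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let q = weights.iter().map(|&w| (w / scale).round() as i8).collect();
    (q, scale) // dequantize with: w ≈ (q as f32) * scale
}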
// 4x memory reduction, 2-4x speedup on mobile
- Model Pruning: Remove unnecessary connections
// Remove weights below threshold
fn prune_mlp(mlp: &mut MLP, threshold: f32) {
for layer in &mut mlp.weights {
for weights in layer {
for w in weights {
if w.abs() < threshold {
*w = 0.0;
}
}
}
}
}
- Power Profiling: Use platform battery APIs
// Android example (pseudo-code)
fn estimate_power_consumption(operation: &str) {
let start_power = get_battery_power();
perform_operation();
let end_power = get_battery_power();
println!("{} consumed {}mW", operation, start_power - end_power);
}
Next Steps:
- Replace Text Encoding: Integrate sentence-transformers for better embeddings
- Train MLP: Collect user feedback data and train routing model
- Deploy SNN: Profile on DSP/NPU hardware
- Add SQLite: Persist conversation state and reservoir weights
- Benchmark on Device: Profile on actual mobile hardware (Snapdragon, Apple Silicon)
References:
- Reservoir Computing Paper: Jaeger, "The 'echo state' approach to analysing and training recurrent neural networks" (2001)
- SNN Tutorial: Neftci et al., "Surrogate Gradient Learning in Spiking Neural Networks" (2019)
- Mobile ML: Howard et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications" (2017)
For basic usage, see the examples/ directory.
For performance baselines, run cargo bench.
For API documentation, run cargo doc --open.