## Summary
Add an optional caching layer to avoid repeated identical API calls, reducing latency and API costs for cacheable requests.
## Use Cases

- **Identical prompts** - Same question asked multiple times
- **System prompt reuse** - Caching responses with identical system prompts
- **Development/testing** - Avoid hitting the API during debugging
- **Rate limiting** - Reduce API calls when approaching limits
## Proposed API

```csharp
// Enable caching via options
services.AddCompactifAI(options =>
{
    options.ApiKey = "...";
    options.EnableCaching = true;
    options.CacheDuration = TimeSpan.FromMinutes(5);
});

// Or per-request control
var response = await client.ChatAsync(
    "What is 2+2?",
    cacheOptions: new CacheOptions
    {
        Enabled = true,
        Duration = TimeSpan.FromHours(1)
    });
```
## Implementation Options

### Option 1: IMemoryCache Integration
```csharp
public class CachingCompactifAIClient : ICompactifAIClient
{
    private readonly IMemoryCache _cache;
    private readonly ICompactifAIClient _inner;
    private readonly TimeSpan _cacheDuration;

    public CachingCompactifAIClient(ICompactifAIClient inner, IMemoryCache cache, TimeSpan cacheDuration)
    {
        _inner = inner;
        _cache = cache;
        _cacheDuration = cacheDuration;
    }

    public async Task<ChatCompletionResponse> CreateChatCompletionAsync(ChatCompletionRequest request)
    {
        var cacheKey = GenerateCacheKey(request);

        // Serve a previously cached response for an identical request
        if (_cache.TryGetValue(cacheKey, out ChatCompletionResponse? cached) && cached is not null)
            return cached;

        var response = await _inner.CreateChatCompletionAsync(request);
        _cache.Set(cacheKey, response, _cacheDuration);
        return response;
    }
}
```
### Option 2: Decorator Pattern

Allow users to wrap the client with their own caching strategy.
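As a sketch of the wiring (assuming the `CachingCompactifAIClient` from Option 1; the `Decorate` call is Scrutor's extension method - without Scrutor, the same effect needs a manual re-registration of `ICompactifAIClient`):

```csharp
// Register the base client, then wrap it in the caching decorator.
// CachingCompactifAIClient and the 5-minute duration are illustrative.
services.AddMemoryCache();
services.AddCompactifAI(options => options.ApiKey = "...");
services.Decorate<ICompactifAIClient>((inner, sp) =>
    new CachingCompactifAIClient(
        inner,
        sp.GetRequiredService<IMemoryCache>(),
        TimeSpan.FromMinutes(5)));
```

This keeps caching out of the core client entirely, so users who never opt in pay no overhead.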
## Cache Key Generation

```csharp
private static string GenerateCacheKey(ChatCompletionRequest request)
{
    // Hash based on: model + messages + temperature + other deterministic params
    // Exclude: stream, user, etc.
}
```
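One possible shape for that stub, serializing the deterministic request fields and hashing them with SHA-256 (the property names on `ChatCompletionRequest` are assumptions, not the actual type):

```csharp
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Sketch only: property names (Model, Messages, Temperature, MaxTokens)
// are assumed; stream/user fields are deliberately left out of the payload.
private static string GenerateCacheKey(ChatCompletionRequest request)
{
    var payload = JsonSerializer.Serialize(new
    {
        request.Model,
        request.Messages,
        request.Temperature,
        request.MaxTokens
    });
    var hash = SHA256.HashData(Encoding.UTF8.GetBytes(payload));
    return "compactifai:" + Convert.ToHexString(hash);
}
```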
## Expected Benefits

- **Reduced API costs** - Avoid paying for duplicate requests
- **Lower latency** - Near-instant responses for cached results
- **Offline development** - Work with cached responses during development
## Priority

🟢 P2 - Medium Impact
## Considerations

- Cache invalidation strategy
- Memory limits for cache size
- Consider distributed cache support (`IDistributedCache`)
- Non-deterministic responses (temperature > 0) may not be suitable for caching
- Should be opt-in to avoid unexpected behavior
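For the distributed-cache consideration, a minimal sketch using `IDistributedCache` (the string extension methods come from `Microsoft.Extensions.Caching.Distributed`; assumes `ChatCompletionResponse` round-trips through `System.Text.Json`):

```csharp
using System.Text.Json;
using Microsoft.Extensions.Caching.Distributed;

// Sketch only: store responses as JSON so multiple app instances share hits.
public static async Task<ChatCompletionResponse> GetOrAddAsync(
    IDistributedCache cache,
    string cacheKey,
    Func<Task<ChatCompletionResponse>> fetch,
    TimeSpan duration)
{
    var cachedJson = await cache.GetStringAsync(cacheKey);
    if (cachedJson is not null)
        return JsonSerializer.Deserialize<ChatCompletionResponse>(cachedJson)!;

    var response = await fetch();
    await cache.SetStringAsync(
        cacheKey,
        JsonSerializer.Serialize(response),
        new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = duration });
    return response;
}
```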