A TypeScript implementation of the 1min.ai API relay service, designed to run on Cloudflare Workers with distributed rate limiting and accurate token counting.
- Complete API Relay: Full compatibility with 1min.ai chat completions, responses, image generation, and audio transcription/translation endpoints
- OpenAI Responses API: Structured outputs with JSON objects, JSON schema, and reasoning effort control
- Distributed Rate Limiting: Uses Cloudflare KV for consistent rate limiting across multiple worker instances
- Accurate Token Counting: Integrated with gpt-tokenizer for precise token calculation across all models
- Dynamic Model List: Model data fetched live from the 1min.ai API with two-tier caching (in-memory + KV), always up to date
- Streaming Support: Real-time streaming responses for chat completions
- TypeScript: Full type safety and modern development experience
- Vision Support: Supports image input for vision models
- Audio Transcription & Translation: OpenAI Whisper-compatible speech-to-text and audio translation endpoints
Model data is fetched dynamically from the 1min.ai API and cached with a two-tier strategy (in-memory 5 min, KV 1 hr). The GET /v1/models endpoint always returns the latest available models. Use it to see the full list:
curl https://your-worker.your-subdomain.workers.dev/v1/models

Capabilities such as vision, code interpreter, and web search are derived automatically from the API response, with no hardcoded model lists.
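The two-tier lookup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the worker's actual code: `KVLike`, `TwoTierCache`, and the `load` callback are hypothetical names, and the KV interface is reduced to `get` and `put` with `expirationTtl`. The TTLs mirror the stated values (5 minutes in memory, 1 hour in KV).

```typescript
// Minimal subset of a KV store used by this sketch (assumed shape).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

const MEMORY_TTL_MS = 5 * 60 * 1000; // in-memory tier: 5 minutes
const KV_TTL_SECONDS = 60 * 60; // KV tier: 1 hour

class TwoTierCache<T> {
  private memory = new Map<string, { value: T; expiresAt: number }>();
  private kv: KVLike;
  private now: () => number;

  constructor(kv: KVLike, now: () => number = () => Date.now()) {
    this.kv = kv;
    this.now = now;
  }

  // Looks in memory first, then KV, and only then calls the loader.
  async get(key: string, load: () => Promise<T>): Promise<T> {
    const hit = this.memory.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;

    const stored = await this.kv.get(key);
    if (stored !== null) {
      const value = JSON.parse(stored) as T;
      this.remember(key, value);
      return value;
    }

    const fresh = await load();
    await this.kv.put(key, JSON.stringify(fresh), { expirationTtl: KV_TTL_SECONDS });
    this.remember(key, fresh);
    return fresh;
  }

  private remember(key: string, value: T): void {
    this.memory.set(key, { value, expiresAt: this.now() + MEMORY_TTL_MS });
  }
}
```

With this layout, most requests on a warm worker instance are answered from memory, a cold instance pays one KV read, and the upstream models API is called only when both tiers have expired.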
POST /v1/chat/completions
POST /v1/responses
curl -X POST http://localhost:8787/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What do you see in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQ..."
}
}
]
}
]
}'

The Responses API supports structured outputs and reasoning control. It accepts two input formats:
Simple Input Format:
curl -X POST http://localhost:8787/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gpt-4.1",
"input": "Tell me a three sentence bedtime story about a unicorn.",
"reasoning_effort": "medium"
}'

Messages Format (for conversations):
curl -X POST http://localhost:8787/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gpt-4.1",
"messages": [
{
"role": "user",
"content": "Analyze the pros and cons of remote work"
}
],
"reasoning_effort": "high"
}'

JSON Object Response:
curl -X POST http://localhost:8787/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gpt-4.1",
"input": "Analyze the benefits of exercise",
"response_format": {
"type": "json_object"
},
"reasoning_effort": "high"
}'

JSON Schema Response:
curl -X POST http://localhost:8787/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gpt-4.1",
"input": "Create a user profile for John Doe, age 30, software engineer",
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "user_profile",
"description": "A user profile object",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "number"},
"profession": {"type": "string"},
"skills": {"type": "array", "items": {"type": "string"}},
"experience_years": {"type": "number"}
},
"required": ["name", "age", "profession"]
}
}
}
}'

Responses API Features:
- Structured Outputs: JSON objects and JSON schema validation
- Reasoning Effort: Control reasoning depth (low, medium, high)
- Vision Support: Same image input capabilities as Chat Completions
- Streaming Support: Full OpenAI-compatible SSE streaming with response.completed terminal event
- Enhanced Prompting: Automatically optimizes prompts for structured responses
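On the client side, the streamed response can be consumed by splitting the SSE text into events and stopping at the terminal `response.completed` event. The sketch below assumes each event carries a JSON `data:` payload with a `type` field, and falls back to the `event:` line otherwise. The exact event shapes emitted by the relay are not documented here, so treat `parseResponseStream` and `StreamEvent` as hypothetical helpers.

```typescript
interface StreamEvent {
  type: string;
  data: unknown;
}

// Reads the event type from a JSON payload, else uses the SSE "event:" name.
function eventType(data: unknown, fallback: string): string {
  if (typeof data === "object" && data !== null && "type" in data) {
    const t = (data as { type: unknown }).type;
    if (typeof t === "string") return t;
  }
  return fallback;
}

// Parses buffered SSE text into events and stops after the terminal event.
function parseResponseStream(raw: string): { events: StreamEvent[]; completed: boolean } {
  const events: StreamEvent[] = [];
  // SSE events are separated by a blank line.
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let eventName = "";
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) eventName = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trimStart());
    }
    if (dataLines.length === 0) continue;

    const payload = dataLines.join("\n");
    let data: unknown = payload;
    try {
      data = JSON.parse(payload);
    } catch {
      // Not JSON: keep the raw text as the event data.
    }

    const type = eventType(data, eventName);
    events.push({ type, data });
    if (type === "response.completed") return { events, completed: true };
  }
  return { events, completed: false };
}
```

A `completed` value of `false` after the connection closes suggests the stream was cut off before the terminal event arrived.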
POST /v1/images/generations
POST /v1/audio/transcriptions
Transcribe audio to text using Whisper or Google Speech models. Accepts multipart/form-data.
curl -X POST http://localhost:8787/v1/audio/transcriptions \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@audio.mp3" \
-F "model=whisper-1"

Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | Audio file (mp3, mp4, m4a, wav, webm, ogg, flac). Max 25 MB. |
| model | string | Yes | Model ID (e.g., whisper-1, latest_long, latest_short) |
| language | string | No | Language hint (ISO-639-1 for Whisper, BCP-47 for Google Speech) |
| prompt | string | No | Prompt to guide transcription style |
| response_format | string | No | json (default), text, verbose_json, srt, vtt |
| temperature | number | No | Sampling temperature between 0 and 1 |
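The parameters above map onto a multipart request that can also be built from TypeScript with the standard `fetch`, `FormData`, and `Blob` globals (available in Node.js 18+ and in browsers). The following is a minimal sketch: `buildTranscriptionForm` and `transcribe` are hypothetical helper names, and the 25 MB limit is interpreted as 25 × 1024 × 1024 bytes, which is an assumption.

```typescript
// Assumption: "25 MB" is read as 25 MiB for this sketch.
const MAX_AUDIO_BYTES = 25 * 1024 * 1024;

type TranscriptionFormat = "json" | "text" | "verbose_json" | "srt" | "vtt";

interface TranscriptionOptions {
  model: string;
  language?: string;
  prompt?: string;
  response_format?: TranscriptionFormat;
  temperature?: number;
}

// Validates the inputs and assembles the multipart form fields.
function buildTranscriptionForm(audio: Blob, filename: string, opts: TranscriptionOptions): FormData {
  if (audio.size > MAX_AUDIO_BYTES) {
    throw new RangeError("Audio file exceeds the 25 MB limit");
  }
  if (opts.temperature !== undefined && (opts.temperature < 0 || opts.temperature > 1)) {
    throw new RangeError("temperature must be between 0 and 1");
  }

  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", opts.model);
  if (opts.language !== undefined) form.append("language", opts.language);
  if (opts.prompt !== undefined) form.append("prompt", opts.prompt);
  if (opts.response_format !== undefined) form.append("response_format", opts.response_format);
  if (opts.temperature !== undefined) form.append("temperature", String(opts.temperature));
  return form;
}

// Sends the form. Content-Type is left unset so that fetch can add the
// multipart boundary derived from the FormData body.
async function transcribe(baseUrl: string, apiKey: string, form: FormData): Promise<Response> {
  return fetch(`${baseUrl}/v1/audio/transcriptions`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
}
```

Validating size and temperature before sending avoids a round trip for requests that would be rejected anyway.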
OpenAI SDK:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8787/v1", api_key="YOUR_API_KEY")

# Open the file in a context manager so the handle is closed after the upload
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)

POST /v1/audio/translations
Translate audio to English text. Same parameters as transcription (except language).
curl -X POST http://localhost:8787/v1/audio/translations \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@foreign-audio.mp3" \
-F "model=whisper-1"

GET /v1/models
GET /
Returns information about all available endpoints:
- Chat Completions: /v1/chat/completions
- Responses: /v1/responses
- Image Generation: /v1/images/generations
- Audio Transcription: /v1/audio/transcriptions
- Audio Translation: /v1/audio/translations
- Models: /v1/models
The worker implements distributed rate limiting with the following limits:
- Requests per minute: 180 per IP address
- Tokens per minute: 100,000 per IP address
Rate limits are enforced using Cloudflare KV storage, ensuring consistency across all worker instances.
- Node.js 18+
- Wrangler CLI
- Cloudflare account with Workers and KV enabled
- Clone the repository:
git clone https://github.com/7a6163/1min-relay-worker.git
cd 1min-relay-worker
- Install dependencies:
npm install
- Configure environment variables in wrangler.jsonc:
- Create KV namespaces:
wrangler kv:namespace create "RATE_LIMIT_STORE"
wrangler kv:namespace create "MODEL_CACHE"
Recent Wrangler releases spell this command wrangler kv namespace create (a space instead of the colon); use that form if the colon form is rejected.
- After running the commands above, you'll receive a KV namespace ID for each. Copy the IDs and update wrangler.jsonc:
"kv_namespaces": [
{
"binding": "RATE_LIMIT_STORE",
"id": "your-rate-limit-kv-id-here"
},
{
"binding": "MODEL_CACHE",
"id": "your-model-cache-kv-id-here"
}
]

Start the development server:
npm run dev

The fastest way to deploy is to use the Cloudflare Deploy button at the top of this README. To deploy manually:
- Make sure you've completed all the setup steps above, including creating and configuring both KV namespaces.
- Build the project:
npm run build
- Deploy to Cloudflare Workers:
npm run deploy
- After successful deployment, you'll receive a URL for your worker (typically https://1min-relay.your-subdomain.workers.dev).
To use a custom domain with your worker:
- Log in to the Cloudflare dashboard.
- Navigate to the Workers & Pages section.
- Select your deployed worker.
- Click on the "Triggers" tab.
- Under "Custom Domains", click "Add Custom Domain" and follow the instructions.
If you encounter issues during deployment:
- Authentication errors: Run wrangler login to authenticate with your Cloudflare account.
- KV binding errors: Ensure your KV namespaces are correctly configured in wrangler.jsonc.
- Build errors: Make sure all dependencies are installed with npm install.
- Rate limit errors: If you're hitting Cloudflare's deployment rate limits, wait a few minutes before trying again.
- Environment variable issues: Verify all required environment variables are set in wrangler.jsonc.
The following environment variables are configured in wrangler.jsonc:
- ONE_MIN_CHAT_API_URL: 1min.ai unified chat endpoint (/api/chat-with-ai)
- ONE_MIN_API_URL: 1min.ai features endpoint for non-chat features like image generation (/api/features)
- ONE_MIN_ASSET_URL: 1min.ai asset upload endpoint
- ONE_MIN_MODELS_API_URL: 1min.ai models API endpoint (for dynamic model list)

KV namespace bindings:
- RATE_LIMIT_STORE: Used for distributed rate limiting storage
- MODEL_CACHE: Used for caching model data fetched from the 1min.ai API (1 hour TTL)
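For illustration, the variables and bindings listed above could be described by a typed `Env` interface, together with a small startup check that reports anything missing. This is a sketch only: `KVNamespaceLike` stands in for the real KV type (which would normally come from Cloudflare's type definitions), and `missingBindings` is a hypothetical helper, not part of the worker.

```typescript
// Hypothetical stand-in for the Cloudflare KV namespace type.
interface KVNamespaceLike {
  get(key: string): Promise<string | null>;
}

// Shape of the environment described above.
interface Env {
  ONE_MIN_CHAT_API_URL: string;
  ONE_MIN_API_URL: string;
  ONE_MIN_ASSET_URL: string;
  ONE_MIN_MODELS_API_URL: string;
  RATE_LIMIT_STORE: KVNamespaceLike;
  MODEL_CACHE: KVNamespaceLike;
}

const REQUIRED_KEYS: (keyof Env)[] = [
  "ONE_MIN_CHAT_API_URL",
  "ONE_MIN_API_URL",
  "ONE_MIN_ASSET_URL",
  "ONE_MIN_MODELS_API_URL",
  "RATE_LIMIT_STORE",
  "MODEL_CACHE",
];

// Returns the names of variables or bindings that are unset or empty.
function missingBindings(env: Partial<Env>): (keyof Env)[] {
  return REQUIRED_KEYS.filter((key) => env[key] === undefined || env[key] === "");
}
```

Calling such a check at the start of a request handler turns a misconfigured wrangler.jsonc into one clear error message instead of a failure deep inside a request.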
curl -X POST https://your-worker.your-subdomain.workers.dev/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": false
}'

curl -X POST https://your-worker.your-subdomain.workers.dev/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gpt-4.1",
"input": "Analyze the benefits of renewable energy",
"response_format": {
"type": "json_object"
},
"reasoning_effort": "high"
}'

curl -X POST https://your-worker.your-subdomain.workers.dev/v1/images/generations \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "dall-e-3",
"prompt": "A beautiful sunset over mountains",
"n": 1,
"size": "1024x1024"
}'

curl -X POST https://your-worker.your-subdomain.workers.dev/v1/audio/transcriptions \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@recording.mp3" \
-F "model=whisper-1" \
-F "response_format=text"

curl -X POST https://your-worker.your-subdomain.workers.dev/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Tell me a story"}],
"stream": true
}'

The worker is built with:
- TypeScript: For type safety and better development experience
- Cloudflare Workers: Serverless edge computing platform
- Cloudflare KV: Distributed key-value storage for rate limiting and model data caching
- gpt-tokenizer: Accurate token counting for all supported models
The distributed rate limiting system:
- Uses IP address + endpoint as the key
- Tracks both request count and token count per minute
- Stores data in Cloudflare KV with TTL
- Returns proper HTTP 429 responses with rate limit headers
- Ensures consistency across all worker instances globally
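The points above can be illustrated with a fixed-window counter. The sketch below is one possible reading, not the worker's actual algorithm: the key format, the `WindowState` shape, and the function names are hypothetical, and the decision logic is kept as a pure function so that the KV read and write (with a TTL of about one window) stay in the caller. The limits are the documented 180 requests and 100,000 tokens per minute.

```typescript
const LIMITS = { requestsPerMinute: 180, tokensPerMinute: 100_000 };
const WINDOW_MS = 60_000;

interface WindowState {
  windowStart: number;
  requests: number;
  tokens: number;
}

interface Decision {
  allowed: boolean;
  state: WindowState; // value to persist for the key
  retryAfterSeconds: number;
}

// Key combines IP address and endpoint (format is an assumption).
function rateLimitKey(ip: string, endpoint: string): string {
  return `rate:${ip}:${endpoint}`;
}

// Decides whether a request costing `tokens` fits in the current one-minute window.
function checkRateLimit(prev: WindowState | null, tokens: number, now: number): Decision {
  const windowStart = Math.floor(now / WINDOW_MS) * WINDOW_MS;
  // Start from zero when there is no state or the stored window has passed.
  const current: WindowState =
    prev !== null && prev.windowStart === windowStart
      ? prev
      : { windowStart, requests: 0, tokens: 0 };
  const next: WindowState = {
    windowStart,
    requests: current.requests + 1,
    tokens: current.tokens + tokens,
  };
  const allowed =
    next.requests <= LIMITS.requestsPerMinute && next.tokens <= LIMITS.tokensPerMinute;
  const retryAfterSeconds = allowed ? 0 : Math.ceil((windowStart + WINDOW_MS - now) / 1000);
  // Rejected requests do not consume quota.
  return { allowed, state: allowed ? next : current, retryAfterSeconds };
}

// Builds the HTTP 429 response for a rejected request.
function tooManyRequests(decision: Decision): Response {
  const body = { error: { message: "Rate limit exceeded", type: "rate_limit_error" } };
  return new Response(JSON.stringify(body), {
    status: 429,
    headers: {
      "Content-Type": "application/json",
      "Retry-After": String(decision.retryAfterSeconds),
    },
  });
}
```

In a worker, the caller would read the state stored under `rateLimitKey(ip, endpoint)`, call `checkRateLimit`, write `decision.state` back, and return `tooManyRequests(decision)` when `allowed` is false. Concurrent updates to the same key are not handled in this sketch.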
Token counting is implemented using the gpt-tokenizer library. Because gpt-tokenizer implements OpenAI's tokenizers, counts for other model families such as Claude and Mistral are approximations rather than exact values. A character-based fallback is used if tokenization fails.
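The tokenizer-with-fallback pattern can be sketched as below. The encoder is passed in as a parameter to keep the example self-contained. The ratio of four characters per token is a rough heuristic chosen for this sketch, not a value taken from the worker, and `countTokens` is a hypothetical name.

```typescript
// An encoder maps text to a list of token IDs.
type Encoder = (text: string) => number[];

// Rough heuristic for the fallback path (assumption for this sketch).
const CHARS_PER_TOKEN = 4;

// Counts tokens with the encoder, falling back to a character-based estimate on failure.
function countTokens(text: string, encode: Encoder): number {
  try {
    return encode(text).length;
  } catch {
    return Math.ceil(text.length / CHARS_PER_TOKEN);
  }
}
```

In the worker's setting, the encoder argument would presumably be the `encode` function exported by gpt-tokenizer.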
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details
For issues and questions:
- Create an issue in the repository
- Check the Cloudflare Workers documentation
- Review the 1min.ai API documentation