Banana Pro AI is a high-performance image generation platform designed for creative professionals. It integrates Gemini and OpenAI standard API capabilities, supporting high-resolution (up to 4K) text-to-image and image-to-image generation, available in both Desktop and Web formats.
Note: The application ships with a built-in English interface; you can switch languages in the settings.
🆕 v2.8.0 Updates:
- 🤖 Dedicated OpenAI Image Generation: New `openai-image` provider type supporting the standard `/v1/images/generations` API (gpt-image-2 model).
- 🎨 Image Card Refactor: Smart thumbnail/full-size switching, improved drag-and-drop, better loading experience.
💡 Recommended: For the best generation experience and cost-effectiveness, we recommend using Yunwu API.

| Resolution | Yunwu API Price | Google Official Price (Ref) |
|---|---|---|
| 1K (1024x1024) | ¥0.08 / Image | ≈ ¥0.94 / Image |
| 2K (2048x2048) | ¥0.08 / Image | ≈ ¥0.94 / Image |
| 4K (4096x4096) | ¥0.14 / Image | ≈ ¥1.68 / Image |

- 🚀 Extreme Performance: Built with Tauri 2.0 architecture and a high-concurrency Sidecar backend written in Go, ensuring extremely low resource usage.
- 🖼️ 4K Ultra-HD Creation: Deeply optimized Gemini 3.0 model, supporting 4K UHD generation across multiple aspect ratios.
- 🔌 Standard API Compatibility: Supports three provider types: `gemini` (/v1beta), `openai` (/v1/chat/completions multimodal), and `openai-image` (/v1/images/generations), each with a configurable Base URL and Model ID.
- ⚡ Custom Protocol (asset://): Registered native resource protocol for desktop, bypassing the HTTP stack to increase local image loading speed by 300%.
- 💾 Smart History Management: Built-in local database and persistent caching, supporting task recovery and instant opening of large history records.
- 📸 Precise Image-to-Image: Supports multiple reference images with fine-grained style and composition control.
- 📦 Automated Delivery: Integrated GitHub Actions for automated packaging and releasing on macOS (Intel/M1) and Windows.
- 🧩 Template Market: Prioritizes pulling remote template JSON on startup with automatic fallback to built-in templates.
- Precise Semantic Understanding: Deep integration with Google Gemini 3.0, capturing fine details, styles, and moods from prompts.
- AI Prompt Optimization: Built-in optimization engine via Gemini / OpenAI standard interfaces.
- Edit History: Supports unlimited undo/redo for quick switching between creative ideas.
- Batch Processing: Queue up to 100 images per batch; generation runs in a background queue.
- Real-time Tracking: Clear progress bars and status displays with placeholder cards for each image.
- Multi-Ref Support: Add up to 10 reference images to help the AI understand desired composition or style.
- Reverse Prompt Extraction: Click the "Extract Prompt" button on a reference image to have the AI analyze it and generate a detailed prompt. Output is supported in 20+ languages.
- Flexible Uploads:
  - Click/Drag: Select from local folders or drag-and-drop.
  - Clipboard Support: Paste images directly from the web or chat tools.
  - Smart Preprocessing: Automatic compression for oversized images and MD5-based duplicate filtering.
- Aspect Ratios: Preset ratios including 1:1, 16:9, 9:16, 4:3, 2:3.
- Quality Settings: Customizable resolution from 1K to 4K.
- Smart Sizing: Automatically aligns image dimensions to 8-pixel boundaries for optimal model performance.
- Interface Switching: Toggle between `Gemini (/v1beta)`, `OpenAI (/v1)` multimodal, and `OpenAI Image (/v1/images/generations)` modes in settings.
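
Two behaviors listed above, MD5-based duplicate filtering (Smart Preprocessing) and 8-pixel alignment (Smart Sizing), can be sketched in Go as follows. This is a minimal illustration, not the project's actual code: the function names `alignTo8` and `dedupeByMD5` are hypothetical, and rounding to the nearest multiple of 8 is an assumption, since the text does not say whether the app rounds up, down, or to nearest.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// alignTo8 rounds a dimension to the nearest multiple of 8, never below 8.
// Assumption: the real app may round up or down instead of to nearest.
func alignTo8(n int) int {
	if n < 8 {
		return 8
	}
	return (n + 4) / 8 * 8
}

// dedupeByMD5 keeps the first occurrence of each distinct image payload,
// using the MD5 digest of the raw bytes as the identity key.
func dedupeByMD5(images [][]byte) [][]byte {
	seen := make(map[string]struct{}, len(images))
	unique := make([][]byte, 0, len(images))
	for _, img := range images {
		sum := md5.Sum(img)
		key := hex.EncodeToString(sum[:])
		if _, dup := seen[key]; dup {
			continue // identical bytes were already uploaded
		}
		seen[key] = struct{}{}
		unique = append(unique, img)
	}
	return unique
}

func main() {
	fmt.Println(alignTo8(1023), alignTo8(1080))

	refs := [][]byte{[]byte("cat"), []byte("dog"), []byte("cat")}
	fmt.Println(len(dedupeByMD5(refs)))
}
```

MD5 is used here only as a cheap content fingerprint for exact duplicates; it does not detect visually similar images that differ in bytes.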
- Immersive Preview: Full-screen view with free zooming and dragging.
- High-Density UI: Optimized for productivity with adaptive sidebars and compact info displays.
- Quick Actions:
  - Fast Copy: One-click "Copy Image" button in preview for easy pasting into other apps.
  - Batch Management: Multi-select images for batch saving or deletion.
- Smart Persistence: Remembers sidebar state, window position, and last-used model configurations.
- Auto Persistence: Real-time saving to local database to prevent data loss.
- Smart Search: Quickly find historical tasks via keywords.
- Stable Connection: Automatically switches between WebSocket and HTTP polling for uninterrupted generation in complex networks.
- Huge Resource: 900+ high-quality templates across various styles and industries.
- Pull-down Access: Interactive "rope" pull-down to open the market.
- Multi-dim Filtering: Filter by Search, Channel, Material, Industry, or Aspect Ratio.
- PPT Category: Dedicated section for 16:9 templates suitable for presentation materials.
- One-click Reuse: Directly apply templates (replaces current Prompt and reference images).
- Manual Sync: Refresh button to pull latest templates manually.
- Source & Tips: Includes usage `tips` and clickable `source` links.
- Ref Requirements: Displays `minRefs` and `note` for required reference images.
- Remote Sync: Prioritizes GitHub Raw JSON with local caching.
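
The remote-first loading with fallback to built-in templates can be sketched in Go as below. This is an illustration under stated assumptions rather than the project's implementation: `loadTemplates`, `offlineFetch`, `remoteFetch`, and `builtinTemplates` are hypothetical names, the download is abstracted as a function value, and the local caching step is omitted.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// builtinTemplates stands in for the template JSON bundled with the app.
var builtinTemplates = []byte(`{"meta":{"version":"builtin"},"items":[]}`)

// offlineFetch simulates a failed download (hypothetical demo helper).
func offlineFetch() ([]byte, error) {
	return nil, errors.New("network unreachable")
}

// remoteFetch simulates a successful download (hypothetical demo helper).
func remoteFetch() ([]byte, error) {
	return []byte(`{"meta":{"version":"remote"},"items":[]}`), nil
}

// loadTemplates prefers the remote payload and falls back to the built-in
// bytes when the fetch fails or returns something that is not valid JSON.
func loadTemplates(fetch func() ([]byte, error), builtin []byte) ([]byte, string) {
	if fetch != nil {
		remote, err := fetch()
		if err == nil && json.Valid(remote) {
			return remote, "remote"
		}
	}
	return builtin, "builtin"
}

func main() {
	_, src := loadTemplates(offlineFetch, builtinTemplates)
	fmt.Println("offline startup uses:", src)

	_, src = loadTemplates(remoteFetch, builtinTemplates)
	fmt.Println("online startup uses:", src)
}
```

Validating the remote payload before accepting it means a truncated or corrupted download cannot replace a working built-in set.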
Template data is maintained in:
backend/internal/templates/assets/templates.json
{
"meta": {
"version": "2024.12.01",
"updated_at": "2024-12-01T12:00:00Z",
"channels": ["Community", "Social", "Xiaohongshu"],
"materials": ["Poster", "PPT", "Cover"],
"industries": ["Education", "Life Services"],
"ratios": ["1:1", "3:4", "16:9"]
},
"items": []
}

{
"id": "tpl-001",
"title": "Cat Meme Template",
"channels": ["Community", "Entertainment"],
"materials": ["Poster"],
"industries": ["Life Services"],
"ratio": "1:1",
"preview": "https://.../thumb.jpg",
"image": "https://.../full.jpg",
"prompt": "Optional: Template prompt...",
"prompt_params": "Optional: Prompt usage instructions (reserved)",
"tips": "Optional: Usage tips/tricks",
"source": {
"name": "@Contributor",
"label": "GitHub",
"icon": "github",
"url": "https://example.com/templates/tpl-001"
},
"requirements": { "minRefs": 2, "note": "Requires one cat photo as reference" },
"tags": ["cat", "meme", "funny"]
}

- `requirements.note`: Prompt text when reference images are needed.
- `requirements.minRefs`: Minimum number of reference images required.
- `tips`: Usage tips/notes (displayed in preview).
- `prompt_params`: Prompt usage instructions (reserved field, not rendered).
- `tags`: For searching and aggregation.
- `materials`: Can include the `PPT` tag (suggested for 16:9) for presentation filtering.
- `meta.version` / `meta.updated_at`: For versioning and cache comparison.

Supported `source.icon` values: `github`, `xhs`, `wechat`, `shop`, `video`, `print`, `gov`, `meme`, `finance`, `food`, `local`.
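
As a rough sketch of how a template entry with the shape shown above could be consumed, the Go snippet below decodes one item and checks the `requirements.minRefs` rule. The struct and function names (`Template`, `Requirements`, `parseTemplate`, `missingRefs`) are hypothetical and only a subset of the fields is mapped; the backend's real types may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Requirements mirrors the "requirements" object of a template item.
type Requirements struct {
	MinRefs int    `json:"minRefs"`
	Note    string `json:"note"`
}

// Template maps a subset of the template item fields (hypothetical type).
type Template struct {
	ID           string       `json:"id"`
	Title        string       `json:"title"`
	Ratio        string       `json:"ratio"`
	Materials    []string     `json:"materials"`
	Tags         []string     `json:"tags"`
	Requirements Requirements `json:"requirements"`
}

// parseTemplate decodes a single template item from JSON.
func parseTemplate(raw []byte) (Template, error) {
	var t Template
	err := json.Unmarshal(raw, &t)
	return t, err
}

// missingRefs reports how many more reference images are still needed
// before the template can be applied; 0 means the requirement is met.
func missingRefs(t Template, provided int) int {
	if need := t.Requirements.MinRefs - provided; need > 0 {
		return need
	}
	return 0
}

func main() {
	raw := []byte(`{
		"id": "tpl-001",
		"title": "Cat Meme Template",
		"ratio": "1:1",
		"materials": ["Poster"],
		"tags": ["cat", "meme", "funny"],
		"requirements": {"minRefs": 2, "note": "Requires one cat photo as reference"}
	}`)

	tpl, err := parseTemplate(raw)
	if err != nil {
		fmt.Println("invalid template:", err)
		return
	}
	fmt.Printf("%s needs %d more reference image(s)\n", tpl.ID, missingRefs(tpl, 1))
}
```

Fields that are absent from the JSON simply keep their zero values, so optional entries such as `requirements` do not need special handling.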
graph TD
subgraph "Frontend Layer (React + Zustand)"
UI[User Interface]
State[Zustand State Management]
AssetProtocol[asset:// Protocol]
end
subgraph "Desktop Container (Tauri 2.0 / Rust)"
TauriBridge[Rust Bridge]
IPC[IPC Optimization]
FS[Local File Access]
end
subgraph "Backend Layer (Go Sidecar)"
GoServer[Gin API Server]
WorkerPool[Worker Pool]
GeminiSDK[Google GenAI SDK]
OpenAIProvider[OpenAI Provider]
OpenAIImageProvider[OpenAI Image Provider]
SQLite[(SQLite Storage)]
end
UI <--> State
State <--> IPC
IPC <--> TauriBridge
TauriBridge <--> GoServer
GoServer <--> WorkerPool
WorkerPool <--> GeminiSDK
WorkerPool <--> OpenAIProvider
WorkerPool <--> OpenAIImageProvider
WorkerPool <--> SQLite
GeminiSDK <--> |Imagen 3.0| Cloud[Google AI Cloud]
OpenAIProvider <--> |/v1/chat/completions| OpenAI[OpenAI Compatible API]
OpenAIImageProvider <--> |/v1/images/generations| OpenAIImg[OpenAI Image API]
GoServer -.-> |Save Images| FS
FS -.-> |Map Resource| AssetProtocol
AssetProtocol -.-> |Fast Display| UI
The project uses a "three-layer architecture" to balance performance and scalability:
- Frontend (React + Zustand): Handles responsive UI and state management.
- Desktop Container (Tauri): Acts as a Rust bridge for window control and local resource access.
- Inference Engine (Go Sidecar): Communicates with AI providers (Gemini, OpenAI, OpenAI-Image) and manages task pools.
- IPC Load Optimization: Only file paths are passed between frontend and backend; large binary data is read directly via the `asset://` protocol.
- Lifecycle Management: Automatically cleans up Go sidecar processes when Tauri exits.
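
The task-pool idea behind the Go sidecar can be illustrated with a bounded worker pool. The sketch below is an assumption about the general pattern, not the project's `WorkerPool` code: `runBatch` and `fakeGenerate` are hypothetical names, and the generator just returns a label instead of calling an AI provider.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeGenerate stands in for a provider call (hypothetical demo helper).
func fakeGenerate(job int) string {
	return fmt.Sprintf("image-%d", job)
}

// runBatch processes jobs 0..n-1 with at most `workers` goroutines and
// returns the results in job order.
func runBatch(n, workers int, generate func(job int) string) []string {
	if workers < 1 {
		workers = 1
	}
	results := make([]string, n)
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobs {
				// Each job writes only its own slot, so no lock is needed.
				results[job] = generate(job)
			}
		}()
	}

	for i := 0; i < n; i++ {
		jobs <- i
	}
	close(jobs) // lets the workers exit once the queue drains
	wg.Wait()
	return results
}

func main() {
	out := runBatch(10, 3, fakeGenerate)
	fmt.Println(len(out), out[0], out[9])
}
```

Capping the number of workers keeps a large batch from opening too many provider connections at once, while the unbuffered channel acts as the queue.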
├── backend/ # Go Backend (Sidecar)
│ ├── cmd/server/ # Entry point
│ └── internal/ # Core logic (Gemini, Worker, DB)
├── desktop/ # Tauri Desktop Project (React + Rust)
│ ├── src/ # Frontend logic
│ └── src-tauri/ # Rust & System permissions
├── frontend/ # Independent Web Frontend (Reference)
└── assets/ # Presentation resources

- Go: 1.21+
- Node.js: 18+
- Rust: 1.75+ (Required for Tauri)

If you encounter a "Damaged" error on macOS due to Gatekeeper, run:

    sudo xattr -r -d com.apple.quarantine "/Applications/Banana Pro AI.app"

Run the backend:

    cd backend
    # Configure config.yaml with your API Key
    go run cmd/server/main.go

Or use Makefile:

    make build # Compile backend
    make run # Run backend

Run the desktop app in development mode:

    cd desktop
    npm install
    npm run tauri dev

Run the web frontend:

    cd frontend
    npm install
    npm run dev

Push a version tag to trigger CI:

    git tag v2.8.0
    git push origin v2.8.0

Integrated Tauri Updater for one-click updates.
- Generate keys: `npm run tauri signer generate -- -w ~/.tauri/banana-updater.key`
- Add public key to `tauri.conf.json`.
- Configure GitHub Secrets for CI.
| Item | Description |
|---|---|
| AI Provider | `gemini` (/v1beta), `openai` (/v1/chat/completions), or `openai-image` (/v1/images/generations). Each uses its own Base URL and model. |
| API Base / Key | Standard OpenAI format compatibility. |
| Image Model | Primary model for image generation (e.g., gemini-2.0-flash-exp, gpt-4o, gpt-image-2). |
| Vision Model | Model for reverse prompt extraction. Inherits Image Model's Base URL and API Key by default. |
| Chat Model | Model for prompt optimization. |
| Storage Dir | Defaults to system AppData (Win) or Application Support (Mac). |
| Templates Remote URL | Remote template JSON URL (defaults to GitHub Raw). |
| asset:// | Custom protocol for fast local image access. |
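
To make the AI Provider row concrete, the following Go sketch maps each provider type to the API path listed in the table and joins it with a configurable Base URL. The names `providerPaths` and `endpointFor` and the `api.example.com` host are illustrative assumptions; for `gemini`, `/v1beta` is only the version prefix given in the table, not a complete endpoint.

```go
package main

import (
	"fmt"
	"strings"
)

// providerPaths lists the API path for each provider type, as given in the
// configuration table. For gemini this is just the version prefix.
var providerPaths = map[string]string{
	"gemini":       "/v1beta",
	"openai":       "/v1/chat/completions",
	"openai-image": "/v1/images/generations",
}

// endpointFor joins a configured Base URL with the provider's path and
// rejects provider types that are not in the table.
func endpointFor(provider, baseURL string) (string, error) {
	path, ok := providerPaths[provider]
	if !ok {
		return "", fmt.Errorf("unknown provider type %q", provider)
	}
	return strings.TrimRight(baseURL, "/") + path, nil
}

func main() {
	// api.example.com is a placeholder Base URL, not a real service.
	for _, provider := range []string{"gemini", "openai", "openai-image"} {
		url, err := endpointFor(provider, "https://api.example.com/")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-12s -> %s\n", provider, url)
	}
}
```

Keeping the path lookup separate from the Base URL reflects the table's note that each provider uses its own Base URL and model.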
Only for Backend + Web Frontend deployment.

    # 1. Copy environment template and configure API Key
    cp .env.example .env
    nano .env # Add your GEMINI_API_KEY or OPENAI_API_KEY
    # 2. Start services (must use docker compose)
    docker compose -p banana-pro up -d
    # 3. Access the application
    # Browser: http://localhost:8090

For the complete deployment guide, configuration, and troubleshooting, see DOCKER_DEPLOY.md.

- 🐳 Multi-stage Build: Frontend (Node.js) + Backend (Go) + Runtime (Alpine + Nginx)
- 🚀 Environment Auto-Detection: Backend automatically detects Docker and listens on `0.0.0.0` (Tauri uses `127.0.0.1`)
- 💾 Data Persistence: Images and database automatically mounted to `./data/storage`
- 🔄 Health Check: Built-in health endpoint with automatic restart
- 🇨🇳 Mirror Support: Configurable China mirror sources via Build Args
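
The environment auto-detection mentioned above could look roughly like the Go sketch below. It is only a guess at the approach: checking for `/.dockerenv` is a common heuristic rather than the project's confirmed method, the function names are hypothetical, and port `8080` is a placeholder for whatever port the backend actually uses.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// inDocker reports whether the process appears to run inside a Docker
// container. Assumption: the /.dockerenv marker file is used as the signal.
func inDocker() bool {
	_, err := os.Stat("/.dockerenv")
	return err == nil
}

// listenHost picks the bind address: all interfaces inside a container so
// the published port is reachable, loopback only when run as a sidecar.
func listenHost(inContainer bool) string {
	if inContainer {
		return "0.0.0.0"
	}
	return "127.0.0.1"
}

func main() {
	// "8080" is a placeholder port for illustration.
	addr := net.JoinHostPort(listenHost(inDocker()), "8080")
	fmt.Println("backend would listen on", addr)
}
```

Binding to loopback in the desktop case keeps the sidecar API unreachable from other machines on the network.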
We welcome all forms of contribution!
- Bug Reports: Use GitHub Issues with detailed reproduction steps.
- PRs: Follow existing style and test thoroughly before submitting.
This project is licensed under the MIT License.
- Many templates reuse prompts from awesome-nanobananapro-prompts.
- JSON prompt optimization logic inspired by fofr.


