🎨 Banana Pro AI (Web & Desktop)

English | 简体中文 | 日本語 | 한국어

Banana Pro AI is a high-performance image generation platform for creative professionals. It works with Gemini and OpenAI-standard APIs, supports text-to-image and image-to-image generation at resolutions up to 4K, and is available as both a desktop app and a web app.

Note

The application includes a built-in English interface, and you can switch languages in the settings.

🆕 v2.8.0 Updates:

🤖 Dedicated OpenAI Image Generation: New openai-image provider type that supports the standard /v1/images/generations API (gpt-image-2 model).

🎨 Image Card Refactor: Smart thumbnail/full-size switching, improved drag-and-drop, better loading experience.

💡 Recommended: For the best generation experience and cost-effectiveness, we recommend using Yunwu API.

Resolution	Yunwu API Price	Google Official Price (Ref)
1K (1024x1024)	¥0.08 / Image	≈ ¥0.94 / Image
2K (2048x2048)	¥0.08 / Image	≈ ¥0.94 / Image
4K (4096x4096)	¥0.14 / Image	≈ ¥1.68 / Image

🌟 Key Features

🚀 Extreme Performance: Built with Tauri 2.0 architecture and a high-concurrency Sidecar backend written in Go, ensuring extremely low resource usage.
🖼️ 4K Ultra-HD Creation: Deeply optimized Gemini 3.0 model, supporting 4K UHD generation across multiple aspect ratios.
🔌 Standard API Compatibility: Supports three provider types: gemini (/v1beta), openai (/v1/chat/completions multimodal), and openai-image (/v1/images/generations) with configurable Base URL and Model ID.
⚡ Custom Protocol (asset://): Registered native resource protocol for desktop, bypassing the HTTP stack to increase local image loading speed by 300%.
💾 Smart History Management: Built-in local database and persistent caching, supporting task recovery and instant opening of large history records.
📸 Precise Image-to-Image: Supports multiple reference images with fine-grained style and composition control.
📦 Automated Delivery: Integrated GitHub Actions for automated packaging and releasing on macOS (Intel/M1) and Windows.
🧩 Template Market: Prioritizes pulling remote template JSON on startup with automatic fallback to built-in templates.

🚀 Functional Details

1. Smart Text-to-Image

Precise Semantic Understanding: Deep integration with Google Gemini 3.0, capturing fine details, styles, and moods from prompts.
AI Prompt Optimization: Built-in optimization engine via Gemini / OpenAI standard interfaces.
Edit History: Supports infinite undo/redo for quick switching between creative ideas.
Batch Processing: Set up to 100 images for batch generation with background queue processing.
Real-time Tracking: Clear progress bars and status displays with placeholder cards for each image.
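The background queue described above could be sketched as a bounded worker pool. This is a minimal illustration, not the project's actual implementation: `runBatch`, `generateFunc`, and `result` are hypothetical names, and the generation call is replaced by a stub.

```go
package main

import (
	"fmt"
	"sync"
)

// generateFunc stands in for one image-generation call (hypothetical).
type generateFunc func(index int) (string, error)

// result pairs a task index with its output path or error.
type result struct {
	Index int
	Path  string
	Err   error
}

// runBatch processes count tasks with at most workers goroutines and
// returns the results ordered by task index.
func runBatch(count, workers int, generate generateFunc) []result {
	if workers < 1 {
		workers = 1 // avoid blocking forever when no worker is started
	}
	jobs := make(chan int)
	results := make([]result, count)
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				path, err := generate(i)
				// Each task writes to its own slice element, so no lock is needed.
				results[i] = result{Index: i, Path: path, Err: err}
			}
		}()
	}

	for i := 0; i < count; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	out := runBatch(5, 2, func(i int) (string, error) {
		return fmt.Sprintf("image-%d.png", i), nil
	})
	for _, r := range out {
		fmt.Println(r.Index, r.Path, r.Err)
	}
}
```

Keeping one result slot per task index is what lets a UI show a placeholder card for every image while the queue drains.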

2. Powerful Image-to-Image

Multi-Ref Support: Add up to 10 reference images to help the AI understand desired composition or style.
Reverse Prompt Extraction: Click the "Extract Prompt" button on a reference image to have the AI analyze it and generate a detailed prompt. Output is supported in 20+ languages.
Flexible Uploads:
- Click/Drag: Select from local folders or drag-and-drop.
- Clipboard Support: Paste images directly from the web or chat tools.
Smart Preprocessing: Automatic compression for oversized images and MD5-based duplicate filtering.
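MD5-based duplicate filtering, as mentioned above, can be illustrated with a short sketch. The function name `dedupeByMD5` is hypothetical, and the code assumes duplicates are byte-identical uploads; how the application handles re-encoded copies is not stated here.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// dedupeByMD5 returns the inputs with byte-identical duplicates removed,
// keeping the first occurrence of each image.
func dedupeByMD5(images [][]byte) [][]byte {
	seen := make(map[string]struct{})
	unique := make([][]byte, 0, len(images))
	for _, img := range images {
		sum := md5.Sum(img)
		key := hex.EncodeToString(sum[:])
		if _, ok := seen[key]; ok {
			continue // same digest already kept
		}
		seen[key] = struct{}{}
		unique = append(unique, img)
	}
	return unique
}

func main() {
	refs := [][]byte{[]byte("cat"), []byte("dog"), []byte("cat")}
	fmt.Println(len(dedupeByMD5(refs)))
}
```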

3. Professional Parameter Control

Aspect Ratios: Preset ratios including 1:1, 16:9, 9:16, 4:3, 2:3.
Quality Settings: Customizable resolution from 1K to 4K.
Smart Sizing: Automatically aligns image dimensions to 8-pixel boundaries for optimal model performance.
Interface Switching: Toggle between Gemini(/v1beta), OpenAI(/v1) multimodal, and OpenAI Image(/v1/images/generations) modes in settings.
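As an illustration of the Smart Sizing item, the sketch below aligns a dimension to an 8-pixel boundary. It assumes rounding to the nearest multiple of 8 with a minimum of 8; the application's actual rounding direction is not documented here, and `alignTo8` is a hypothetical name.

```go
package main

import "fmt"

// alignTo8 rounds n to the nearest multiple of 8 (ties round up),
// returning at least 8. The rounding rule is an assumption.
func alignTo8(n int) int {
	aligned := (n + 4) / 8 * 8
	if aligned < 8 {
		return 8
	}
	return aligned
}

func main() {
	fmt.Println(alignTo8(1023), alignTo8(1365), alignTo8(2048))
}
```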

4. Advanced UX & Management

Immersive Preview: Full-screen view with free zooming and dragging.
High-Density UI: Optimized for productivity with adaptive sidebars and compact info displays.
Quick Actions:
- Fast Copy: One-click "Copy Image" button in preview for easy pasting into other apps.
- Batch Management: Multi-select images for batch saving or deletion.
Smart Persistence: Remembers sidebar state, window position, and last-used model configurations.

5. Task & History

Auto Persistence: Real-time saving to local database to prevent data loss.
Smart Search: Quickly find historical tasks via keywords.
Stable Connection: Automatically switches between WebSocket and HTTP polling for uninterrupted generation in complex networks.

6. Template Market

Large Library: 900+ high-quality templates across various styles and industries.
Pull-down Access: Interactive "rope" pull-down to open the market.
Multi-dimensional Filtering: Filter by Search, Channel, Material, Industry, or Aspect Ratio.
PPT Category: Dedicated section for 16:9 templates suitable for presentation materials.
One-click Reuse: Directly apply templates (replaces current Prompt and reference images).
Manual Sync: Refresh button to pull latest templates manually.
Source & Tips: Includes usage tips and clickable source links.
Ref Requirements: Displays minRefs and note for required reference images.
Remote Sync: Prioritizes GitHub Raw JSON with local caching.
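The remote-first loading with a built-in fallback could look roughly like the following. This is a sketch under assumptions: `loadCatalog` and `catalog` are hypothetical names, the remote fetch is injected as a function so the example runs without network access, and local caching is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// catalog mirrors only the top-level keys of templates.json.
type catalog struct {
	Meta struct {
		Version string `json:"version"`
	} `json:"meta"`
	Items []json.RawMessage `json:"items"`
}

// loadCatalog tries the remote fetcher first and falls back to the built-in
// JSON when the fetch fails or the payload does not parse. The second return
// value names the source that was used.
func loadCatalog(fetchRemote func() ([]byte, error), builtin []byte) (catalog, string, error) {
	var c catalog
	if data, err := fetchRemote(); err == nil {
		if err := json.Unmarshal(data, &c); err == nil {
			return c, "remote", nil
		}
	}
	c = catalog{} // discard any partially decoded remote data
	if err := json.Unmarshal(builtin, &c); err != nil {
		return catalog{}, "", fmt.Errorf("built-in templates are invalid: %w", err)
	}
	return c, "builtin", nil
}

func main() {
	builtin := []byte(`{"meta":{"version":"2024.12.01"},"items":[]}`)
	offline := func() ([]byte, error) { return nil, errors.New("network unreachable") }
	c, source, err := loadCatalog(offline, builtin)
	fmt.Println(c.Meta.Version, source, err)
}
```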

🧩 Template Contribution Guide

Template data is maintained in:

backend/internal/templates/assets/templates.json

Top-level Structure

{
  "meta": {
    "version": "2024.12.01",
    "updated_at": "2024-12-01T12:00:00Z",
    "channels": ["Community", "Social", "Xiaohongshu"],
    "materials": ["Poster", "PPT", "Cover"],
    "industries": ["Education", "Life Services"],
    "ratios": ["1:1", "3:4", "16:9"]
  },
  "items": []
}

Basic Fields (Single Template)

{
  "id": "tpl-001",
  "title": "Cat Meme Template",
  "channels": ["Community", "Entertainment"],
  "materials": ["Poster"],
  "industries": ["Life Services"],
  "ratio": "1:1",
  "preview": "https://.../thumb.jpg",
  "image": "https://.../full.jpg",
  "prompt": "Optional: Template prompt...",
  "prompt_params": "Optional: Prompt usage instructions (reserved)",
  "tips": "Optional: Usage tips/tricks",
  "source": {
    "name": "@Contributor",
    "label": "GitHub",
    "icon": "github",
    "url": "https://example.com/templates/tpl-001"
  },
  "requirements": { "minRefs": 2, "note": "Requires one cat photo as reference" },
  "tags": ["cat", "meme", "funny"]
}

Field Explanation

requirements.note: Hint text shown when reference images are required.
requirements.minRefs: Minimum number of reference images required.
tips: Usage tips/notes (displayed in preview).
prompt_params: Prompt usage instructions (reserved field, not rendered).
tags: For searching and aggregation.
materials: May include the PPT tag (suggested for 16:9 templates) so the template appears under presentation filtering.
meta.version / meta.updated_at: For versioning and cache comparison.
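To show how the requirements fields might be consumed, here is a small sketch that decodes a template item and checks the number of supplied reference images against minRefs. The struct covers only a subset of the fields, and `templateItem`, `parseTemplate`, and `checkRefs` are hypothetical names rather than the project's own types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// templateItem covers a subset of the fields of a single template.
type templateItem struct {
	ID           string   `json:"id"`
	Title        string   `json:"title"`
	Ratio        string   `json:"ratio"`
	Prompt       string   `json:"prompt"`
	Tags         []string `json:"tags"`
	Requirements struct {
		MinRefs int    `json:"minRefs"`
		Note    string `json:"note"`
	} `json:"requirements"`
}

// parseTemplate decodes one template item from JSON.
func parseTemplate(data []byte) (templateItem, error) {
	var item templateItem
	err := json.Unmarshal(data, &item)
	return item, err
}

// checkRefs returns an error when fewer reference images were supplied
// than the template requires.
func checkRefs(item templateItem, refCount int) error {
	if refCount < item.Requirements.MinRefs {
		return fmt.Errorf("template %s needs at least %d reference image(s), got %d: %s",
			item.ID, item.Requirements.MinRefs, refCount, item.Requirements.Note)
	}
	return nil
}

func main() {
	raw := []byte(`{"id":"tpl-001","title":"Cat Meme Template","ratio":"1:1",
		"requirements":{"minRefs":1,"note":"Requires one cat photo as reference"}}`)
	item, err := parseTemplate(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(checkRefs(item, 0))
	fmt.Println(checkRefs(item, 1))
}
```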

source.icon Presets

github, xhs, wechat, shop, video, print, gov, meme, finance, food, local.

🏗️ Technical Architecture

Core Flow

graph TD
    subgraph "Frontend Layer (React + Zustand)"
        UI[User Interface]
        State[Zustand State Management]
        AssetProtocol[asset:// Protocol]
    end

    subgraph "Desktop Container (Tauri 2.0 / Rust)"
        TauriBridge[Rust Bridge]
        IPC[IPC Optimization]
        FS[Local File Access]
    end

    subgraph "Backend Layer (Go Sidecar)"
        GoServer[Gin API Server]
        WorkerPool[Worker Pool]
        GeminiSDK[Google GenAI SDK]
        OpenAIProvider[OpenAI Provider]
        OpenAIImageProvider[OpenAI Image Provider]
        SQLite[(SQLite Storage)]
    end

    UI <--> State
    State <--> IPC
    IPC <--> TauriBridge
    TauriBridge <--> GoServer
    GoServer <--> WorkerPool
    WorkerPool <--> GeminiSDK
    WorkerPool <--> OpenAIProvider
    WorkerPool <--> OpenAIImageProvider
    WorkerPool <--> SQLite
    GeminiSDK <--> |Imagen 3.0| Cloud[Google AI Cloud]
    OpenAIProvider <--> |/v1/chat/completions| OpenAI[OpenAI Compatible API]
    OpenAIImageProvider <--> |/v1/images/generations| OpenAIImg[OpenAI Image API]
    GoServer -.-> |Save Images| FS
    FS -.-> |Map Resource| AssetProtocol
    AssetProtocol -.-> |Fast Display| UI

The project uses a "three-layer architecture" to balance performance and scalability:

Frontend (React + Zustand): Handles responsive UI and state management.
Desktop Container (Tauri): Acts as a Rust bridge for window control and local resource access.
Inference Engine (Go Sidecar): Communicates with AI providers (Gemini, OpenAI, OpenAI-Image) and manages task pools.

Core Optimizations

IPC Load Optimization: Only file paths are passed between frontend and backend; large binary data is read directly via the asset:// protocol.
Lifecycle Management: Automatically cleans up Go sidecar processes when Tauri exits.

📂 Project Structure

├── backend/            # Go Backend (Sidecar)
│   ├── cmd/server/     # Entry point
│   └── internal/       # Core logic (Gemini, Worker, DB)
├── desktop/            # Tauri Desktop Project (React + Rust)
│   ├── src/            # Frontend logic
│   └── src-tauri/      # Rust & System permissions
├── frontend/           # Independent Web Frontend (Reference)
└── assets/             # Presentation resources

💻 Developer Guide

1. Prerequisites

Go: 1.21+
Node.js: 18+
Rust: 1.75+ (Required for Tauri)

🍎 macOS Permission Fix

If you encounter a "Damaged" error on macOS due to Gatekeeper, run:

sudo xattr -r -d com.apple.quarantine "/Applications/Banana Pro AI.app"

2. Backend Setup

cd backend
# Configure config.yaml with your API Key
go run cmd/server/main.go

Or use Makefile:

make build    # Compile backend
make run      # Run backend

3. Desktop Setup

cd desktop
npm install
npm run tauri dev

4. Web Frontend Setup

cd frontend
npm install
npm run dev

5. Automated Build (GitHub Actions)

Push a version tag to trigger CI:

git tag v2.8.0
git push origin v2.8.0

6. Auto Updater

Integrated Tauri Updater for one-click updates.

Generate keys: npm run tauri signer generate -- -w ~/.tauri/banana-updater.key
Add public key to tauri.conf.json.
Configure GitHub Secrets for CI.

⚙️ Core Configuration

Item	Description
`AI Provider`	`gemini` (/v1beta), `openai` (/v1/chat/completions), or `openai-image` (/v1/images/generations). Each uses its own Base URL and model.
`API Base / Key`	Base URL and API key, compatible with the standard OpenAI format.
`Image Model`	Primary model for image generation (e.g., gemini-2.0-flash-exp, gpt-4o, gpt-image-2).
`Vision Model`	Model for reverse prompt extraction. Inherits Image Model's Base URL and API Key by default.
`Chat Model`	Model for prompt optimization.
`Storage Dir`	Defaults to the system `AppData` (Windows) or `Application Support` (macOS) directory.
`Templates Remote URL`	Remote template JSON URL (defaults to GitHub Raw).
`asset://`	Custom protocol for fast local image access.
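The relationship between provider type, Base URL, and API path in the table above can be sketched as a simple lookup. This assumes the configured Base URL contains no version prefix, which may not match how the application actually joins URLs; `endpointURL` and the example host are hypothetical. For gemini, /v1beta is a path prefix rather than a complete endpoint.

```go
package main

import (
	"fmt"
	"strings"
)

// endpointPaths maps each provider type to the API path named in this README.
var endpointPaths = map[string]string{
	"gemini":       "/v1beta",
	"openai":       "/v1/chat/completions",
	"openai-image": "/v1/images/generations",
}

// endpointURL joins a configured Base URL with the provider's path.
func endpointURL(provider, baseURL string) (string, error) {
	path, ok := endpointPaths[provider]
	if !ok {
		return "", fmt.Errorf("unknown provider type %q", provider)
	}
	return strings.TrimRight(baseURL, "/") + path, nil
}

func main() {
	for _, p := range []string{"gemini", "openai", "openai-image"} {
		url, err := endpointURL(p, "https://api.example.com/")
		fmt.Println(p, url, err)
	}
}
```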

🐳 Docker Deployment (Web)

For deploying the backend and web frontend only.

Quick Start

# 1. Copy environment template and configure API Key
cp .env.example .env
nano .env  # Add your GEMINI_API_KEY or OPENAI_API_KEY

# 2. Start services (must use docker compose)
docker compose -p banana-pro up -d

# 3. Access the application
# Browser: http://localhost:8090

Detailed Documentation

For complete deployment guide, configuration, and troubleshooting, see: DOCKER_DEPLOY.md

Key Features

🐳 Multi-stage Build: Frontend (Node.js) + Backend (Go) + Runtime (Alpine + Nginx)
🚀 Environment Auto-Detection: Backend automatically detects Docker and listens on 0.0.0.0 (Tauri uses 127.0.0.1)
💾 Data Persistence: Images and database automatically mounted to ./data/storage
🔄 Health Check: Built-in health endpoint with automatic restart
🇨🇳 Mirror Support: Configurable China mirror sources via Build Args
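The environment auto-detection item can be illustrated as follows. This is a sketch, not the backend's actual logic: it assumes the common heuristic of checking for /.dockerenv, and the port 8080 and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// inDocker applies the /.dockerenv heuristic (an assumption; the real
// backend may detect containers differently). The existence check is
// injected so the function can be exercised without a container.
func inDocker(exists func(string) bool) bool {
	return exists("/.dockerenv")
}

// listenAddr picks 0.0.0.0 inside Docker and 127.0.0.1 otherwise.
func listenAddr(docker bool, port int) string {
	host := "127.0.0.1"
	if docker {
		host = "0.0.0.0"
	}
	return fmt.Sprintf("%s:%d", host, port)
}

func main() {
	// 8080 is a placeholder port, not the project's documented default.
	fmt.Println(listenAddr(inDocker(fileExists), 8080))
}
```

Binding to 127.0.0.1 on the desktop keeps the sidecar reachable only from the local machine, while 0.0.0.0 is needed inside a container so the published port can reach it.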

🤝 Contribution & Feedback

We welcome all forms of contribution!

Bug Reports: Use GitHub Issues with detailed reproduction steps.
PRs: Follow existing style and test thoroughly before submitting.

📄 License

This project is licensed under the MIT License.

🙏 Special Thanks

Many templates reuse prompts from awesome-nanobananapro-prompts.
JSON prompt optimization logic inspired by fofr.
