Build a Reddit app using Devvit that allows users to upload documents (images and PDFs), perform OCR/AI analysis, and manage their documents within Reddit posts.
We started with ambitious goals:
- Use AWS Textract for professional OCR
- Store documents in S3 for scalability
- Use PostgreSQL (Neon) for metadata
- Build a production-ready document management system
AWS Textract Problems:
- ❌ Requires AWS SDK with native dependencies
- ❌ Devvit's serverless environment doesn't support native Node.js packages
- ❌ No filesystem access for temporary file storage
- ❌ Complex authentication requirements incompatible with Devvit
Verdict: AWS Textract was completely incompatible with Devvit's architecture.
- ✅ Pure JavaScript SDK (no native dependencies)
- ✅ Supports both image and PDF analysis
- ✅ Vision capabilities for document understanding
- ✅ Works with Devvit's HTTP fetch (generativelanguage.googleapis.com is globally allow-listed)
Devvit's Strange Approach:
- Devvit does not support environment variables in the traditional sense
- No
.envfile support in production - No way to inject secrets at build time
- The only option: hardcode API keys in code or store in Redis
Our Solution:
const apiKey = await redis.get('config:gemini_api_key') || 'HARDCODED_FALLBACK';This is a terrible security practice but it's the only way Devvit allows external API integration. The API key is visible in the source code and must be manually set in Redis after deployment.
- S3 for large files
- PostgreSQL for metadata
- Presigned URLs for access
- AWS SDK Issues: Same native dependency problems as Textract
- PostgreSQL Complexity: Connection pooling doesn't work in serverless
- Devvit Fetch Limitations: Custom domains require Reddit approval (1-3 days)
Why Redis Won:
- ✅ Built into Devvit (no external dependencies)
- ✅ No authentication needed
- ✅ Works immediately
- ✅ Simple key-value storage
- ❌ 500KB file size limit (acceptable for compressed images and small PDFs)
Storage Strategy:
// Store base64-encoded files directly in Redis
await redis.set(`docs:${postId}:list`, JSON.stringify(documents));Documents are stored as base64 strings in Redis with 7-day cache for AI analysis results.
Implementing a custom splash screen took way longer than expected due to Devvit's URL handling quirks.
Issue: Devvit was appending extra parameters to asset URLs, breaking image loading:
Expected: /assets/splash.png
Actual: /assets/splash.png?devvit_post_id=xyz&extra_params=...
Solution: Had to carefully handle URL construction and test extensively to get images loading correctly.
The Devvit Build Experience:
- Sometimes builds complete in 10 seconds ⚡
- Sometimes they take 2+ minutes 🐌
- Sometimes they fail for no reason 💥
- Sometimes they work perfectly after a retry 🤷
No consistency, no clear error messages, just chaos.
After getting everything else working, the Gemini API kept returning 404 errors. This turned into a multi-hour debugging nightmare.
Attempt 1: gemini-1.5-flash-latest
❌ Error: models/gemini-1.5-flash-latest is not found for API version v1beta
Attempt 2: gemini-1.5-flash
❌ Error: models/gemini-1.5-flash is not found for API version v1beta
Attempt 3: models/gemini-1.5-flash (with prefix)
❌ Error: models/gemini-1.5-flash is not found for API version v1beta
Attempt 4: Switch to v1 API
❌ Error: models/gemini-1.5-flash is not found for API version v1
Attempt 5: gemini-1.5-flash-8b
❌ Error: models/gemini-1.5-flash-8b is not found for API version v1
Checked Google AI Studio documentation and found the actual model name: gemini-2.5-flash
const model = genAI.getGenerativeModel({
model: 'gemini-2.5-flash',
generationConfig: {
temperature: 0.4,
maxOutputTokens: 1000,
},
}, {
apiVersion: 'v1',
});But wait, there's more! 🎭
The model started working but returned empty strings:
{"textLength":0,"textPreview":"","hasText":false}Root Cause: maxOutputTokens: 200 was too small!
- The model was generating responses
- They were being truncated to nothing
finishReason: "MAX_TOKENS"revealed the issue
Solution: Increased to maxOutputTokens: 1000
Model returned plain text instead of JSON:
"The provided image is an aerial view of a construction site..."
Solution: Rewrote prompt to force JSON output:
const prompt = `You must respond with ONLY valid JSON, no other text.
Analyze this image and return JSON with this exact structure:
{
"description": "brief description (max 80 chars)",
"summary": "key details and information"
}
Respond with ONLY the JSON object, nothing else.`;A fully functional document management app for Reddit that:
✅ Uploads images and PDFs (despite Reddit docs saying only images are supported) ✅ AI-powered analysis using Gemini 2.5 Flash ✅ Auto-generates descriptions from document content ✅ Extracts key information (amounts, dates, companies) ✅ Stores documents in Redis (up to 20 per post, 500KB each) ✅ 7-day caching for AI results (cost optimization) ✅ Rate limiting (100 requests/day per user) ✅ Mobile-friendly UI with Tailwind CSS ✅ Works entirely within Reddit posts
┌─────────────────────────────────────────┐
│ Reddit Post (Devvit) │
├─────────────────────────────────────────┤
│ Client (React) │
│ ├─ DocumentUploader.tsx │
│ ├─ DocumentsList.tsx │
│ └─ File compression & preview │
├─────────────────────────────────────────┤
│ Server (Express) │
│ ├─ /api/analyze (Gemini AI) │
│ ├─ /api/documents/add │
│ ├─ /api/documents/list │
│ └─ /api/documents/delete │
├─────────────────────────────────────────┤
│ Storage (Redis) │
│ ├─ docs:{postId}:list (documents) │
│ ├─ analysis:{hash} (AI cache) │
│ └─ ratelimit:{user}:{date} (limits) │
├─────────────────────────────────────────┤
│ External APIs │
│ └─ Google Gemini 2.5 Flash (v1 API) │
└─────────────────────────────────────────┘
-
PDF Support in Reddit 📄
- Reddit's documentation says only images can be uploaded
- We proved PDFs work by encoding them as base64 and using Gemini's PDF analysis
-
Efficient Caching 💾
- MD5 hash-based cache keys
- 7-day TTL for AI results
- Duplicate uploads use cached analysis (free)
-
Cost Optimization 💰
- Gemini 2.5 Flash: ~$0.0002 per analysis
- Monthly cost: ~$0.40-$0.67 for 100 requests/day
- Essentially free with caching
-
Rate Limiting 🚦
- 100 requests/day per user
- Resets at midnight UTC
- Prevents API abuse
-
Mobile-First Design 📱
- Touch-optimized file input
- Responsive layout
- Works in Reddit mobile app
The Good:
- ✅ React integration works well
- ✅ Redis is fast and reliable
- ✅ Hot reload during development
- ✅ Automatic Reddit authentication
The Bad:
- ❌ No environment variable support (security nightmare)
- ❌ No native Node.js packages (kills AWS SDK)
- ❌ Erratic build times (10s to 2+ minutes)
- ❌ Poor error messages
- ❌ URL handling quirks with assets
The Ugly:
- 😤 Must hardcode API keys in source code
- 😤 Build failures with no explanation
- 😤 Documentation gaps (PDF support not mentioned)
- 😤 No way to test external APIs locally
What Worked:
- Gemini 2.5 Flash is fast and accurate
- Vision API handles both images and PDFs
- JSON mode works with proper prompting
- Cost is negligible with caching
What Didn't:
- Model names are confusing and poorly documented
- Token limits need careful tuning
- Empty responses are hard to debug
- API version compatibility is unclear
Surprising Discoveries:
- PDFs work despite documentation saying otherwise
- Redis is more capable than expected
- Mobile support is better than anticipated
- Community will find creative uses we didn't expect
- Development Time: ~2 weeks (with many frustrating hours)
- Lines of Code: ~2,500
- API Integrations: 1 (Gemini)
- Storage Solutions Attempted: 3 (S3, PostgreSQL, Redis)
- Model Names Tried: 6+
- Build Failures: Too many to count
- Final Result: ✅ Working OCR app on Reddit
Potential improvements:
- Batch document analysis
- Export to CSV/Excel
- Document search functionality
- OCR text editing
- Multi-language support
- Document categorization
- Receipt total calculations
- Integration with accounting software
Building on Devvit is like solving a puzzle where half the pieces are missing and the box has the wrong picture. But when it works, it's pretty cool to have an AI-powered document manager running inside Reddit posts.
The biggest lesson: Don't fight the platform. We tried AWS, PostgreSQL, and complex architectures. What worked was embracing Devvit's limitations and using Redis for everything.
Also: Google's model naming is chaos. 🤷
Built with: React, TypeScript, Express, Redis, Google Gemini AI, Tailwind CSS, and a lot of patience.
Deployed on: Reddit via Devvit
Status: ✅ Production-ready (with hardcoded API keys because Devvit)