This document provides best practices and guidelines for AI agents working on this repository.
Purpose: A social media feed aggregator that scrapes Twitter, Reddit, and GitHub for content related to Factory AI. The application runs on GitHub Pages with GitHub Actions handling backend scraping.
Architecture:
- Frontend: Static HTML/CSS/JS served by GitHub Pages
- Backend: Node.js scrapers run via GitHub Actions every 10 minutes
- Data: Static JSON file (`public/data/feed.json`) updated by Actions
- Access Control: SHA-256 token gate for privacy
Files that must NEVER be committed:
- `.env` (contains API keys and tokens)
- Any file with actual credentials
- `data/*.json` (may contain sensitive information)
Before any commit:
- Run `git status` to review all files being added
- Run `git diff --cached` to review content changes
- Search for patterns: `grep -r "github_pat\|apify_api\|slack_webhook" .`
- Verify `.gitignore` includes `.env` and `data/*.json`
Current secrets in use:
- `GH_PAT`: GitHub Personal Access Token for API access
- `GH_REPO`: GitHub repository to track (format: `owner/repo`)
- `APIFY_TOKEN`: Apify API token for Twitter scraping
- `SLACK_WEBHOOK_URL`: Slack incoming webhook (optional)
- `TEAM_TWITTER_USERNAMES`: Usernames to filter out
All secrets are stored as follows:
- Local development: `.env` file (gitignored) - uses `GITHUB_TOKEN` and `GITHUB_REPO` variable names
- GitHub Actions: Repository Secrets (Settings > Secrets and variables > Actions) - uses `GH_PAT` and `GH_REPO` names
- Frontend access: SHA-256 hash in code (not the actual token)
GitHub Actions does not allow secret names that start with the GITHUB_ prefix. That's why we use:
- `GH_PAT` in GitHub Secrets (for Actions) → mapped to `GITHUB_TOKEN` environment variable
- `GH_REPO` in GitHub Secrets (for Actions) → mapped to `GITHUB_REPO` environment variable
- `GITHUB_TOKEN` and `GITHUB_REPO` in local `.env` file (for development)
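In the workflow this mapping is just an `env` assignment. A sketch of what the relevant step in `.github/workflows/scrape-feeds.yml` might look like (step layout assumed, not the actual workflow):

```
# Sketch: map GH_* repository secrets onto the GITHUB_* names the code expects.
- name: Run scrapers
  run: node src/scraper-cli.js
  env:
    GITHUB_TOKEN: ${{ secrets.GH_PAT }}
    GITHUB_REPO: ${{ secrets.GH_REPO }}
    APIFY_TOKEN: ${{ secrets.APIFY_TOKEN }}
```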
The frontend uses a SHA-256 hash to validate access tokens. To generate a new access token hash:
```
echo -n "your_chosen_password" | shasum -a 256
```

Update the `ACCESS_TOKEN_HASH` constant in `public/index.html` with the output.
```
./
├── .github/
│   └── workflows/
│       └── scrape-feeds.yml   # GitHub Action (runs every 10 min)
├── src/
│   ├── index.js               # Express server (for local dev only)
│   ├── scraper-cli.js         # CLI entry point for GitHub Actions
│   ├── storage.js             # Data persistence logic
│   ├── slack.js               # Slack integration
│   └── scrapers/
│       ├── reddit.js          # Reddit scraper
│       ├── twitter.js         # Twitter scraper (via Apify)
│       └── github.js          # GitHub GraphQL scraper
├── docs/                      # Renamed from 'public' for GitHub Pages
│   ├── index.html             # Main frontend (with access gate)
│   ├── config.json            # Default feed configuration
│   └── data/
│       └── feed.json          # Generated by Actions (gitignored)
├── data/
│   ├── feed.json              # Local development feed cache
│   └── seen.json              # Deduplication tracking
├── .env                       # Local secrets (NEVER commit)
├── .env.example               # Template for required env vars
├── .gitignore                 # Excludes .env, data/*.json, etc.
├── AGENTS.md                  # This file
└── README.md                  # User-facing documentation
```
1. Setup:

   ```
   cp .env.example .env
   # Edit .env and add your actual API keys
   npm install
   ```

2. Run locally:

   ```
   npm start
   # Server runs on http://localhost:3000
   # Access the frontend and test API endpoints
   ```

3. Test scraping:

   ```
   node src/scraper-cli.js
   # Runs scrapers once and outputs to public/data/feed.json
   ```

4. Make code changes (scrapers, frontend, etc.)

5. Test locally first:

   ```
   npm start
   node src/scraper-cli.js
   ```

6. Security check before commit:

   ```
   git status
   git diff --cached
   grep -r "github_pat\|apify_api\|slack" . --exclude-dir=node_modules
   ```

7. Commit and push:

   ```
   git add .
   git commit -m "Description of changes"
   git push origin main
   ```

8. Verify GitHub Action:
   - Go to GitHub repo > Actions tab
   - Check that workflow runs successfully
   - Verify `public/data/feed.json` is updated

9. Check deployed site:
   - Visit GitHub Pages URL
   - Enter access token
   - Verify feed loads correctly
When API keys change or new ones are added:
```
# Using gh CLI (remember: secret names can't start with GITHUB_)
gh secret set GH_PAT          # NOT GITHUB_PAT
gh secret set APIFY_TOKEN
gh secret set SLACK_WEBHOOK_URL
gh secret set TEAM_TWITTER_USERNAMES
```

Or manually:
- Go to GitHub repo Settings > Secrets and variables > Actions
- Click "New repository secret"
- Add name and value (⚠️ name cannot start with GITHUB_)
- Update `.env.example` to document the new variable

Important: Secret names cannot start with the GITHUB_ prefix. Use alternatives like GH_* instead.
Option 1: Via Frontend (User-facing)
- Click settings icon (⚙️)
- Edit JSON configuration
- Add to `twitter.searchTerms` array
- Save
Option 2: Via Code (Affects all users)
- Edit `public/config.json`
- Add to `twitter.searchTerms` array
- Commit and push
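The real schema lives in `public/config.json`; purely as an illustration of the steps above, a hypothetical shape (any key other than `twitter.searchTerms` is an assumption):

```
{
  "twitter": {
    "searchTerms": ["factory ai", "factoryai", "your new term"]
  }
}
```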
Edit src/scrapers/reddit.js:
const redditUrls = [
'https://www.reddit.com/search/?q=factoryai&type=link&sort=new',
'https://www.reddit.com/r/YourNewSubreddit/search/?q=yourterm&restrict_sr=1&sort=new'
];Edit src/scrapers/github.js:
const usernames = ['anthropics', 'vercel', 'openai', 'yournewuser'];-
1. Create `src/scrapers/newsource.js`:

   ```
   async function scrapeNewSource() {
     const items = [];
     // Scrape logic here
     return items.map(item => ({
       id: `newsource_${item.uniqueId}`,
       source: 'newsource',
       author: item.author,
       content: item.text,
       url: item.link,
       timestamp: item.date,
       metadata: { /* source-specific data */ }
     }));
   }

   module.exports = { scrapeNewSource };
   ```
2. Update `src/scraper-cli.js`:

   ```
   const { scrapeNewSource } = require('./scrapers/newsource');

   const results = await Promise.allSettled([
     scrapeReddit(),
     scrapeGitHub(),
     scrapeTwitter(),
     scrapeNewSource() // Add here
   ]);
   ```
3. Update frontend in `public/index.html`:
   - Add source icon and name to `sourceNames` and `sourceIcons`
   - Add filter pill button
   - Add column rendering logic
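`Promise.allSettled` returns one result object per scraper, so one failing source doesn't lose the others. A sketch (helper name hypothetical) of unwrapping fulfilled results while logging failures:

```javascript
// Hypothetical helper: collect items from fulfilled scrapers, log rejected ones.
function collectResults(results, names) {
  const items = [];
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      items.push(...result.value);
    } else {
      console.error(`${names[i]} scraper failed:`, result.reason);
    }
  });
  return items;
}

// Usage sketch:
// const items = collectResults(results, ['reddit', 'github', 'twitter', 'newsource']);
```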
| Service | Rate Limit | Notes |
|---|---|---|
| GitHub | 5,000 req/hour (authenticated) | Uses GraphQL; efficient |
| Reddit | ~60 req/min (unauthenticated) | Public RSS/JSON feeds |
| Twitter | Via Apify (paid) | Check Apify usage dashboard |
| Slack | ~1 req/sec per webhook | Only used for manual sends |
GitHub Actions limits:
- 2,000 minutes/month (free tier)
- Each run ~1-2 minutes
- Running every 10 min ≈ 4,320 runs/month, i.e. roughly 4,300-8,600 Action minutes (exceeds free tier)
- Recommendation: Adjust to every 15-30 minutes for free tier
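The usage math above can be sanity-checked (30-day month assumed):

```javascript
// Estimate monthly GitHub Actions usage for a cron interval.
const intervalMin = 10;    // workflow runs every 10 minutes
const minutesPerRun = 1.5; // each run takes ~1-2 minutes
const runsPerMonth = (60 / intervalMin) * 24 * 30; // 4320 runs
const actionMinutes = runsPerMonth * minutesPerRun; // 6480 minutes
console.log(runsPerMonth, actionMinutes); // well over the 2,000-minute free tier
```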
To change frequency, edit .github/workflows/scrape-feeds.yml:
```
schedule:
  - cron: '*/30 * * * *'  # Every 30 minutes instead of 10
```

Before pushing changes:
- Run `npm start` and verify server starts
- Visit `http://localhost:3000` and verify feed loads
- Test access token gate (clear localStorage and reload)
- Test all source filters (Twitter, Reddit, GitHub)
- Test time filters (1h, 12h, 24h, custom)
- Test sort toggle (newest/oldest)
- Test selection (click to select, Cmd+click to open)
- Test keyboard shortcuts (J/K navigation, A archive, etc.)
- Test command palette (Cmd+K)
- Test settings modal (edit configuration)
- Run `node src/scraper-cli.js` and verify no errors
- Check `public/data/feed.json` is created/updated
- Verify no secrets in `git diff`
Currently no automated tests. To add:
- Create `tests/` directory
- Add Jest or Mocha
- Write unit tests for scrapers
- Write integration tests for feed aggregation
- Add to `package.json`: `"test": "jest"`
- Run `npm test` before commits
Solution: Ensure the `GH_PAT` secret is set with correct permissions (the workflow maps it to the `GITHUB_TOKEN` environment variable).

```
gh secret set GH_PAT
# Paste your token (must have repo read/write permissions)
```

Solutions:
- Check GitHub Actions tab for errors
- Check GitHub Actions tab for errors
- Verify `public/data/feed.json` exists in repo
- Clear browser cache (GitHub Pages caches aggressively)
- Check Pages settings: Settings > Pages > Build from `main` branch
Solutions:
- Generate correct token hash: `echo -n "password" | shasum -a 256`
- Update `ACCESS_TOKEN_HASH` in `public/index.html`
- Or clear localStorage and re-enter token
Solutions:
- Check Apify token is valid: https://console.apify.com/
- Verify token has sufficient credits
- Check Apify Actor is still available (they sometimes deprecate)
- Consider alternative: Nitter instances or Twitter API v2
Solutions:
- Reduce scraping frequency in workflow
- Add caching layer (check timestamps before fetching)
- Reduce number of sources being scraped
- Use `If-Modified-Since` headers where supported
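A sketch of the `If-Modified-Since` approach (helper name hypothetical); a `304 Not Modified` response means the feed is unchanged and parsing can be skipped:

```javascript
// Build conditional request headers from the last successful fetch time (ms).
function conditionalHeaders(lastFetchedMs) {
  if (!lastFetchedMs) return {};
  return { 'If-Modified-Since': new Date(lastFetchedMs).toUTCString() };
}

// Usage sketch:
// fetch(url, { headers: conditionalHeaders(lastFetchedMs) })
//   .then(res => (res.status === 304 ? null : res.json()));
```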
When deploying to a new environment:
- Create private GitHub repository
- Add all secrets to repository settings
- Verify `.gitignore` excludes `.env` and `data/*.json`
- Push code to `main` branch
- Enable GitHub Pages (Settings > Pages > Source: `main` branch, `/docs` folder)
- Manually trigger workflow to verify it works
- Visit GitHub Pages URL and test access
- Generate and share access token with authorized users
- Set up monitoring (check Actions tab regularly)
Regular checks (weekly):
- Visit GitHub Actions tab, verify recent runs succeeded
- Check GitHub Pages site loads correctly
- Verify feed data is fresh (timestamps are recent)
- Review API usage (GitHub, Apify dashboards)
- Check for security alerts (Dependabot)
Monthly maintenance:
- Update dependencies: `npm update`
- Review and clean old feed data if growing large
- Audit access logs if needed
- Rotate access tokens if compromised
- GitHub Actions Docs: https://docs.github.com/en/actions
- GitHub Pages Docs: https://docs.github.com/en/pages
- Apify Docs: https://docs.apify.com/
- GitHub GraphQL Explorer: https://docs.github.com/en/graphql/overview/explorer