This document provides best practices and guidelines for AI agents working on this repository.
Purpose: A social media feed aggregator that scrapes Twitter, Reddit, and GitHub for content related to Factory AI. The application runs on GitHub Pages with GitHub Actions handling backend scraping.
Architecture:
- Frontend: Static HTML/CSS/JS served by GitHub Pages
- Backend: Node.js scrapers run via GitHub Actions every 10 minutes
- Data: Static JSON file (`public/data/feed.json`) updated by Actions
- Access Control: SHA-256 token gate for privacy
Files that must NEVER be committed:
- `.env` (contains API keys and tokens)
- Any file with actual credentials
- `data/*.json` (may contain sensitive information)
Before any commit:
- Run `git status` to review all files being added
- Run `git diff --cached` to review content changes
- Search for patterns: `grep -r "github_pat\|apify_api\|slack_webhook" .`
- Verify `.gitignore` includes `.env` and `data/*.json`
Current secrets in use:
- `GH_PAT`: GitHub Personal Access Token for API access
- `GH_REPO`: GitHub repository to track (format: `owner/repo`)
- `APIFY_TOKEN`: Apify API token for Twitter scraping
- `SLACK_WEBHOOK_URL`: Slack incoming webhook (optional)
- `TEAM_TWITTER_USERNAMES`: Usernames to filter out
All secrets are stored as follows:
- Local development: `.env` file (gitignored) - uses `GITHUB_TOKEN` and `GITHUB_REPO` variable names
- GitHub Actions: Repository Secrets (Settings > Secrets and variables > Actions) - uses `GH_PAT` and `GH_REPO` names
- Frontend access: SHA-256 hash in code (not the actual token)
GitHub Actions does not allow secret names that start with the GITHUB_ prefix. That's why we use:
- `GH_PAT` in GitHub Secrets (for Actions) → mapped to `GITHUB_TOKEN` environment variable
- `GH_REPO` in GitHub Secrets (for Actions) → mapped to `GITHUB_REPO` environment variable
- `GITHUB_TOKEN` and `GITHUB_REPO` in local `.env` file (for development)
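In the workflow this mapping is just an `env` assignment. A sketch of what the relevant step in `.github/workflows/scrape-feeds.yml` might look like (step layout assumed, not the actual workflow):

```
# Sketch: map GH_* repository secrets onto the GITHUB_* names the code expects.
- name: Run scrapers
  run: node src/scraper-cli.js
  env:
    GITHUB_TOKEN: ${{ secrets.GH_PAT }}
    GITHUB_REPO: ${{ secrets.GH_REPO }}
    APIFY_TOKEN: ${{ secrets.APIFY_TOKEN }}
```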
The frontend uses a SHA-256 hash to validate access tokens. To generate a new access token hash:
```
echo -n "your_chosen_password" | shasum -a 256
```

Update the `ACCESS_TOKEN_HASH` constant in `public/index.html` with the output.
```
./
├── .github/
│   └── workflows/
│       └── scrape-feeds.yml   # GitHub Action (runs every 10 min)
├── src/
│   ├── index.js               # Express server (for local dev only)
│   ├── scraper-cli.js         # CLI entry point for GitHub Actions
│   ├── storage.js             # Data persistence logic
│   ├── slack.js               # Slack integration
│   └── scrapers/
│       ├── reddit.js          # Reddit scraper
│       ├── twitter.js         # Twitter scraper (via Apify)
│       └── github.js          # GitHub GraphQL scraper
├── docs/                      # Renamed from 'public' for GitHub Pages
│   ├── index.html             # Main frontend (with access gate)
│   ├── config.json            # Default feed configuration
│   └── data/
│       └── feed.json          # Generated by Actions (gitignored)
├── data/
│   ├── feed.json              # Local development feed cache
│   └── seen.json              # Deduplication tracking
├── .env                       # Local secrets (NEVER commit)
├── .env.example               # Template for required env vars
├── .gitignore                 # Excludes .env, data/*.json, etc.
├── AGENTS.md                  # This file
└── README.md                  # User-facing documentation
```
1. Setup:

   ```
   cp .env.example .env
   # Edit .env and add your actual API keys
   npm install
   ```

2. Run locally:

   ```
   npm start
   # Server runs on http://localhost:3000
   # Access the frontend and test API endpoints
   ```

3. Test scraping:

   ```
   node src/scraper-cli.js
   # Runs scrapers once and outputs to public/data/feed.json
   ```

4. Make code changes (scrapers, frontend, etc.)

5. Test locally first:

   ```
   npm start
   node src/scraper-cli.js
   ```

6. Security check before commit:

   ```
   git status
   git diff --cached
   grep -r "github_pat\|apify_api\|slack" . --exclude-dir=node_modules
   ```

7. Commit and push:

   ```
   git add .
   git commit -m "Description of changes"
   git push origin main
   ```

8. Verify GitHub Action:
   - Go to GitHub repo > Actions tab
   - Check that workflow runs successfully
   - Verify `public/data/feed.json` is updated

9. Check deployed site:
   - Visit GitHub Pages URL
   - Enter access token
   - Verify feed loads correctly
When API keys change or new ones are added:
```
# Using gh CLI (remember: secret names can't start with GITHUB_)
gh secret set GH_PAT          # NOT GITHUB_PAT
gh secret set APIFY_TOKEN
gh secret set SLACK_WEBHOOK_URL
gh secret set TEAM_TWITTER_USERNAMES
```

Or manually:
- Go to GitHub repo Settings > Secrets and variables > Actions
- Click "New repository secret"
- Add name and value (⚠️ name cannot start with GITHUB_)
- Update `.env.example` to document the new variable

Important: Secret names cannot start with the GITHUB_ prefix. Use alternatives like GH_* instead.
Option 1: Via Frontend (User-facing)
- Click settings icon (⚙️)
- Edit JSON configuration
- Add to `twitter.searchTerms` array
- Save
Option 2: Via Code (Affects all users)
- Edit `public/config.json`
- Add to `twitter.searchTerms` array
- Commit and push
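The real schema lives in `public/config.json`; purely as an illustration of the steps above, a hypothetical shape (any key other than `twitter.searchTerms` is an assumption):

```
{
  "twitter": {
    "searchTerms": ["factory ai", "factoryai", "your new term"]
  }
}
```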
Edit src/scrapers/reddit.js:
const redditUrls = [
'https://www.reddit.com/search/?q=factoryai&type=link&sort=new',
'https://www.reddit.com/r/YourNewSubreddit/search/?q=yourterm&restrict_sr=1&sort=new'
];Edit src/scrapers/github.js:
const usernames = ['anthropics', 'vercel', 'openai', 'yournewuser'];-
1. Create `src/scrapers/newsource.js`:

   ```
   async function scrapeNewSource() {
     const items = [];
     // Scrape logic here
     return items.map(item => ({
       id: `newsource_${item.uniqueId}`,
       source: 'newsource',
       author: item.author,
       content: item.text,
       url: item.link,
       timestamp: item.date,
       metadata: { /* source-specific data */ }
     }));
   }

   module.exports = { scrapeNewSource };
   ```
2. Update `src/scraper-cli.js`:

   ```
   const { scrapeNewSource } = require('./scrapers/newsource');

   const results = await Promise.allSettled([
     scrapeReddit(),
     scrapeGitHub(),
     scrapeTwitter(),
     scrapeNewSource() // Add here
   ]);
   ```
3. Update frontend in `public/index.html`:
   - Add source icon and name to `sourceNames` and `sourceIcons`
   - Add filter pill button
   - Add column rendering logic
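`Promise.allSettled` returns one result object per scraper, so one failing source doesn't lose the others. A sketch (helper name hypothetical) of unwrapping fulfilled results while logging failures:

```javascript
// Hypothetical helper: collect items from fulfilled scrapers, log rejected ones.
function collectResults(results, names) {
  const items = [];
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      items.push(...result.value);
    } else {
      console.error(`${names[i]} scraper failed:`, result.reason);
    }
  });
  return items;
}

// Usage sketch:
// const items = collectResults(results, ['reddit', 'github', 'twitter', 'newsource']);
```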
| Service | Rate Limit | Notes |
|---|---|---|
| GitHub | 5,000 req/hour (authenticated) | Uses GraphQL; efficient |
| Reddit | ~60 req/min (unauthenticated) | Public RSS/JSON feeds |
| Twitter | Via Apify (paid) | Check Apify usage dashboard |
| Slack | ~1 req/sec per webhook | Only used for manual sends |
GitHub Actions limits:
- 2,000 minutes/month (free tier)
- Each run ~1-2 minutes
- Running every 10 min ≈ 4,320 runs/month, i.e. roughly 4,300-8,600 Action minutes (exceeds free tier)
- Recommendation: Adjust to every 15-30 minutes for free tier
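The usage math above can be sanity-checked (30-day month assumed):

```javascript
// Estimate monthly GitHub Actions usage for a cron interval.
const intervalMin = 10;    // workflow runs every 10 minutes
const minutesPerRun = 1.5; // each run takes ~1-2 minutes
const runsPerMonth = (60 / intervalMin) * 24 * 30; // 4320 runs
const actionMinutes = runsPerMonth * minutesPerRun; // 6480 minutes
console.log(runsPerMonth, actionMinutes); // well over the 2,000-minute free tier
```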
To change frequency, edit .github/workflows/scrape-feeds.yml:
```
schedule:
  - cron: '*/30 * * * *'  # Every 30 minutes instead of 10
```

Before pushing changes:
- Run `npm start` and verify server starts
- Visit `http://localhost:3000` and verify feed loads
- Test access token gate (clear localStorage and reload)
- Test all source filters (Twitter, Reddit, GitHub)
- Test time filters (1h, 12h, 24h, custom)
- Test sort toggle (newest/oldest)
- Test selection (click to select, Cmd+click to open)
- Test keyboard shortcuts (J/K navigation, A archive, etc.)
- Test command palette (Cmd+K)
- Test settings modal (edit configuration)
- Run `node src/scraper-cli.js` and verify no errors
- Check `public/data/feed.json` is created/updated
- Verify no secrets in `git diff`
Currently no automated tests. To add:
- Create `tests/` directory
- Add Jest or Mocha
- Write unit tests for scrapers
- Write integration tests for feed aggregation
- Add to `package.json`: `"test": "jest"`
- Run `npm test` before commits
Solution: Ensure the `GH_PAT` secret is set with correct permissions (the workflow maps it to the `GITHUB_TOKEN` environment variable).

```
gh secret set GH_PAT
# Paste your token (must have repo read/write permissions)
```

Solutions:
- Check GitHub Actions tab for errors
- Check GitHub Actions tab for errors
- Verify `public/data/feed.json` exists in repo
- Clear browser cache (GitHub Pages caches aggressively)
- Check Pages settings: Settings > Pages > Build from `main` branch
Solutions:
- Generate correct token hash: `echo -n "password" | shasum -a 256`
- Update `ACCESS_TOKEN_HASH` in `public/index.html`
- Or clear localStorage and re-enter token
Solutions:
- Check Apify token is valid: https://console.apify.com/
- Verify token has sufficient credits
- Check Apify Actor is still available (they sometimes deprecate)
- Consider alternative: Nitter instances or Twitter API v2
Solutions:
- Reduce scraping frequency in workflow
- Add caching layer (check timestamps before fetching)
- Reduce number of sources being scraped
- Use `If-Modified-Since` headers where supported
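A sketch of the `If-Modified-Since` approach (helper name hypothetical); a `304 Not Modified` response means the feed is unchanged and parsing can be skipped:

```javascript
// Build conditional request headers from the last successful fetch time (ms).
function conditionalHeaders(lastFetchedMs) {
  if (!lastFetchedMs) return {};
  return { 'If-Modified-Since': new Date(lastFetchedMs).toUTCString() };
}

// Usage sketch:
// fetch(url, { headers: conditionalHeaders(lastFetchedMs) })
//   .then(res => (res.status === 304 ? null : res.json()));
```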
When deploying to a new environment:
- Create private GitHub repository
- Add all secrets to repository settings
- Verify `.gitignore` excludes `.env` and `data/*.json`
- Push code to `main` branch
- Enable GitHub Pages (Settings > Pages > Source: `main` branch, `/docs` folder)
- Manually trigger workflow to verify it works
- Visit GitHub Pages URL and test access
- Generate and share access token with authorized users
- Set up monitoring (check Actions tab regularly)
Regular checks (weekly):
- Visit GitHub Actions tab, verify recent runs succeeded
- Check GitHub Pages site loads correctly
- Verify feed data is fresh (timestamps are recent)
- Review API usage (GitHub, Apify dashboards)
- Check for security alerts (Dependabot)
Monthly maintenance:
- Update dependencies: `npm update`
- Review and clean old feed data if growing large
- Audit access logs if needed
- Rotate access tokens if compromised
- GitHub Actions Docs: https://docs.github.com/en/actions
- GitHub Pages Docs: https://docs.github.com/en/pages
- Apify Docs: https://docs.apify.com/
- GitHub GraphQL Explorer: https://docs.github.com/en/graphql/overview/explorer