
Agent Instructions for Factory Feed Viewer

This document provides best practices and guidelines for AI agents working on this repository.

Repository Overview

Purpose: A social media feed aggregator that scrapes Twitter, Reddit, and GitHub for content related to Factory AI. The application runs on GitHub Pages with GitHub Actions handling backend scraping.

Architecture:

  • Frontend: Static HTML/CSS/JS served by GitHub Pages
  • Backend: Node.js scrapers run via GitHub Actions every 10 minutes
  • Data: Static JSON file (docs/data/feed.json) updated by Actions
  • Access Control: SHA-256 token gate for privacy

Security Best Practices

🔒 CRITICAL: Never Commit Secrets

Files that must NEVER be committed:

  • .env (contains API keys and tokens)
  • Any file with actual credentials
  • data/*.json (may contain sensitive information)

Before any commit:

  1. Run git status to review all files being added
  2. Run git diff --cached to review content changes
  3. Search for patterns: grep -r "github_pat\|apify_api\|slack_webhook" .
  4. Verify .gitignore includes .env and data/*.json
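The checks above can also be bundled into a small helper, e.g. for a pre-commit hook. A minimal sketch; the patterns mirror the grep in step 3 and are illustrative, not exhaustive:

```javascript
// Illustrative secret-pattern scan for staged text. These patterns are a
// sketch, not a complete secret-detection tool.
const SECRET_PATTERNS = [
  /github_pat_[A-Za-z0-9_]{20,}/,   // fine-grained GitHub PAT prefix
  /apify_api_[A-Za-z0-9]{10,}/,     // Apify token prefix
  /hooks\.slack\.com\/services\//   // Slack incoming webhook URL
];

// Returns the patterns that matched, so a hook can report what it found.
function findSecretPatterns(text) {
  return SECRET_PATTERNS.filter(p => p.test(text)).map(p => p.source);
}
```

A hook would run this over `git diff --cached` output and abort the commit when the returned array is non-empty.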

Current secrets in use:

  • GH_PAT: GitHub Personal Access Token for API access
  • GH_REPO: GitHub repository to track (format: owner/repo)
  • APIFY_TOKEN: Apify API token for Twitter scraping
  • SLACK_WEBHOOK_URL: Slack incoming webhook (optional)
  • TEAM_TWITTER_USERNAMES: Usernames to filter out

All secrets stored in:

  • Local development: .env file (gitignored) - uses GITHUB_TOKEN and GITHUB_REPO variable names
  • GitHub Actions: Repository Secrets (Settings > Secrets and variables > Actions) - uses GH_PAT and GH_REPO names
  • Frontend access: SHA-256 hash in code (not the actual token)

⚠️ GitHub Secret Naming Restriction: GitHub does not allow repository secret names to start with the GITHUB_ prefix. That's why we use:

  • GH_PAT in GitHub Secrets (for Actions) → mapped to GITHUB_TOKEN environment variable
  • GH_REPO in GitHub Secrets (for Actions) → mapped to GITHUB_REPO environment variable
  • GITHUB_TOKEN and GITHUB_REPO in local .env file (for development)
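Because of this mapping, scraper code only needs to know the runtime names. A hedged sketch of how the configuration might be read (the actual handling lives in the src/ modules):

```javascript
// Scrapers read the runtime variable names; the workflow maps the GH_*
// secrets onto them, and a local .env file defines them directly.
function githubConfig(env = process.env) {
  const token = env.GITHUB_TOKEN; // from GH_PAT in Actions, .env locally
  const repo = env.GITHUB_REPO;   // "owner/repo", from GH_REPO in Actions
  if (!token || !repo) {
    throw new Error('GITHUB_TOKEN and GITHUB_REPO are required');
  }
  return { token, repo };
}
```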

πŸ” Access Token Management

The frontend uses a SHA-256 hash to validate access tokens. To generate a new access token hash:

echo -n "your_chosen_password" | shasum -a 256

Update the ACCESS_TOKEN_HASH constant in docs/index.html with the output.

Repository Structure

./
├── .github/
│   └── workflows/
│       └── scrape-feeds.yml    # GitHub Action (runs every 10 min)
├── src/
│   ├── index.js                # Express server (for local dev only)
│   ├── scraper-cli.js          # CLI entry point for GitHub Actions
│   ├── storage.js              # Data persistence logic
│   ├── slack.js                # Slack integration
│   └── scrapers/
│       ├── reddit.js           # Reddit scraper
│       ├── twitter.js          # Twitter scraper (via Apify)
│       └── github.js           # GitHub GraphQL scraper
├── docs/                       # Renamed from 'public' for GitHub Pages
│   ├── index.html              # Main frontend (with access gate)
│   ├── config.json             # Default feed configuration
│   └── data/
│       └── feed.json           # Generated by Actions (gitignored)
├── data/
│   ├── feed.json               # Local development feed cache
│   └── seen.json               # Deduplication tracking
├── .env                        # Local secrets (NEVER commit)
├── .env.example                # Template for required env vars
├── .gitignore                  # Excludes .env, data/*.json, etc.
├── AGENTS.md                   # This file
└── README.md                   # User-facing documentation

Development Workflow

Local Development

  1. Setup:

    cp .env.example .env
    # Edit .env and add your actual API keys
    npm install
  2. Run locally:

    npm start
    # Server runs on http://localhost:3000
    # Access the frontend and test API endpoints
  3. Test scraping:

    node src/scraper-cli.js
    # Runs scrapers once and outputs to docs/data/feed.json

Deployment Workflow

  1. Make code changes (scrapers, frontend, etc.)

  2. Test locally first:

    npm start
    node src/scraper-cli.js
  3. Security check before commit:

    git status
    git diff --cached
    grep -r "github_pat\|apify_api\|slack" . --exclude-dir=node_modules
  4. Commit and push:

    git add .
    git commit -m "Description of changes"
    git push origin main
  5. Verify GitHub Action:

    • Go to GitHub repo > Actions tab
    • Check that workflow runs successfully
    • Verify docs/data/feed.json is updated
  6. Check deployed site:

    • Visit GitHub Pages URL
    • Enter access token
    • Verify feed loads correctly

Adding/Updating GitHub Secrets

When API keys change or new ones are added:

# Using gh CLI (remember: secret names can't start with GITHUB_)
gh secret set GH_PAT              # NOT GITHUB_PAT
gh secret set APIFY_TOKEN
gh secret set SLACK_WEBHOOK_URL
gh secret set TEAM_TWITTER_USERNAMES

Or manually:

  1. Go to GitHub repo Settings > Secrets and variables > Actions
  2. Click "New repository secret"
  3. Add name and value (⚠️ name cannot start with GITHUB_)
  4. Update .env.example to document the new variable

Important: Secret names cannot start with the GITHUB_ prefix. Use alternatives like GH_* instead.

Modifying Feed Sources

Adding a New Twitter Search Term

Option 1: Via Frontend (User-facing)

  1. Click settings icon (⚙️)
  2. Edit JSON configuration
  3. Add to twitter.searchTerms array
  4. Save

Option 2: Via Code (Affects all users)

  1. Edit docs/config.json
  2. Add to twitter.searchTerms array
  3. Commit and push

Adding a New Reddit Source

Edit src/scrapers/reddit.js:

const redditUrls = [
  'https://www.reddit.com/search/?q=factoryai&type=link&sort=new',
  'https://www.reddit.com/r/YourNewSubreddit/search/?q=yourterm&restrict_sr=1&sort=new'
];
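Reddit serves a JSON mirror of most listing pages when .json is appended to the path, which is one way a scraper can consume these URLs without HTML parsing. A sketch (toJsonEndpoint is a hypothetical helper, not an existing function in reddit.js):

```javascript
// Hypothetical helper: rewrite a Reddit search URL to its JSON mirror
// (Reddit serves the same listing at <path>.json).
function toJsonEndpoint(url) {
  const u = new URL(url);
  return `${u.origin}${u.pathname.replace(/\/$/, '')}.json${u.search}`;
}
```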

Adding a New GitHub User/Org to Follow

Edit src/scrapers/github.js:

const usernames = ['anthropics', 'vercel', 'openai', 'yournewuser'];

Creating a New Scraper

  1. Create src/scrapers/newsource.js:

    async function scrapeNewSource() {
      const items = []; // Scrape logic here
      
      return items.map(item => ({
        id: `newsource_${item.uniqueId}`,
        source: 'newsource',
        author: item.author,
        content: item.text,
        url: item.link,
        timestamp: item.date,
        metadata: { /* source-specific data */ }
      }));
    }
    
    module.exports = { scrapeNewSource };
  2. Update src/scraper-cli.js:

    const { scrapeNewSource } = require('./scrapers/newsource');
    
    const results = await Promise.allSettled([
      scrapeReddit(),
      scrapeGitHub(),
      scrapeTwitter(),
      scrapeNewSource() // Add here
    ]);
  3. Update frontend in docs/index.html:

    • Add source icon and name to sourceNames and sourceIcons
    • Add filter pill button
    • Add column rendering logic
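A quick shape check against the normalized item template from step 1 can catch a broken scraper before its output pollutes the feed (isValidFeedItem is a hypothetical guard, not part of the current codebase):

```javascript
// Hypothetical guard: verify a scraper returned items matching the
// normalized shape shared across sources.
const REQUIRED_FIELDS = ['id', 'source', 'author', 'content', 'url', 'timestamp'];

function isValidFeedItem(item) {
  return item != null &&
    REQUIRED_FIELDS.every(key => typeof item[key] === 'string' && item[key].length > 0);
}
```

Running items through a filter like this in scraper-cli.js would drop malformed entries instead of letting them break frontend rendering.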

API Rate Limits & Considerations

Service  | Rate Limit                      | Notes
GitHub   | 5,000 req/hour (authenticated)  | Uses GraphQL; efficient
Reddit   | ~60 req/min (unauthenticated)   | Public RSS/JSON feeds
Twitter  | Via Apify (paid)                | Check Apify usage dashboard
Slack    | ~1 req/sec per webhook          | Only used for manual sends

GitHub Actions limits:

  • 2,000 minutes/month (free tier)
  • Each run ~1-2 minutes
  • Running every 10 min ≈ 4,320 runs/month, i.e. roughly 4,300–8,600 minutes (exceeds free tier)
  • Recommendation: Adjust to every 15-30 minutes for free tier
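The arithmetic behind that recommendation, as a quick sketch:

```javascript
// Rough Actions-minutes cost of a cron schedule, assuming a 30-day month.
function monthlyMinutes(everyNMinutes, minutesPerRun) {
  const runsPerMonth = (60 / everyNMinutes) * 24 * 30;
  return runsPerMonth * minutesPerRun;
}
// Every 10 min at ~1.5 min/run: 6,480 minutes -> well over the 2,000 free tier.
// Every 30 min at ~1.5 min/run: 2,160 minutes -> borderline; 1-min runs (1,440) fit.
```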

To change frequency, edit .github/workflows/scrape-feeds.yml:

schedule:
  - cron: '*/30 * * * *'  # Every 30 minutes instead of 10

Testing

Manual Testing Checklist

Before pushing changes:

  • Run npm start and verify server starts
  • Visit http://localhost:3000 and verify feed loads
  • Test access token gate (clear localStorage and reload)
  • Test all source filters (Twitter, Reddit, GitHub)
  • Test time filters (1h, 12h, 24h, custom)
  • Test sort toggle (newest/oldest)
  • Test selection (click to select, Cmd+click to open)
  • Test keyboard shortcuts (J/K navigation, A archive, etc.)
  • Test command palette (Cmd+K)
  • Test settings modal (edit configuration)
  • Run node src/scraper-cli.js and verify no errors
  • Check docs/data/feed.json is created/updated
  • Verify no secrets in git diff

Automated Tests

Currently no automated tests. To add:

  1. Create tests/ directory
  2. Add Jest or Mocha
  3. Write unit tests for scrapers
  4. Write integration tests for feed aggregation
  5. Add to package.json: "test": "jest"
  6. Run npm test before commits

Common Issues & Solutions

Issue: GitHub Action fails with "Permission denied"

Solution: Ensure the GH_PAT secret is set with a token that has the correct permissions.

gh secret set GH_PAT
# Paste your token (must have repo read/write permissions)

Issue: Feed doesn't update on GitHub Pages

Solutions:

  1. Check GitHub Actions tab for errors
  2. Verify docs/data/feed.json exists in repo
  3. Clear browser cache (GitHub Pages caches aggressively)
  4. Check Pages settings: Settings > Pages > Build from main branch

Issue: "Access Denied" on frontend

Solutions:

  1. Generate correct token hash: echo -n "password" | shasum -a 256
  2. Update ACCESS_TOKEN_HASH in docs/index.html
  3. Or clear localStorage and re-enter token

Issue: Twitter scraping fails

Solutions:

  1. Check Apify token is valid: https://console.apify.com/
  2. Verify token has sufficient credits
  3. Check Apify Actor is still available (they sometimes deprecate)
  4. Consider alternative: Nitter instances or Twitter API v2

Issue: Too many API requests

Solutions:

  1. Reduce scraping frequency in workflow
  2. Add caching layer (check timestamps before fetching)
  3. Reduce number of sources being scraped
  4. Use If-Modified-Since headers where supported
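Suggestion 4 can be sketched as follows; the in-memory lastModified map is an assumption for illustration, not how storage.js actually persists state:

```javascript
// Sketch of conditional fetching: remember each URL's Last-Modified value
// and send it back as If-Modified-Since; a 304 means nothing changed.
const lastModified = new Map();

function conditionalHeaders(url) {
  const since = lastModified.get(url);
  return since ? { 'If-Modified-Since': since } : {};
}

async function fetchIfChanged(url) {
  const res = await fetch(url, { headers: conditionalHeaders(url) });
  if (res.status === 304) return null; // cached copy is still fresh
  const modified = res.headers.get('last-modified');
  if (modified) lastModified.set(url, modified);
  return res;
}
```

Note that not every endpoint honors If-Modified-Since, so callers should treat a full 200 response as the normal case.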

Deployment Checklist

When deploying to a new environment:

  • Create private GitHub repository
  • Add all secrets to repository settings
  • Verify .gitignore excludes .env and data/*.json
  • Push code to main branch
  • Enable GitHub Pages (Settings > Pages > Source: main branch /docs)
  • Manually trigger workflow to verify it works
  • Visit GitHub Pages URL and test access
  • Generate and share access token with authorized users
  • Set up monitoring (check Actions tab regularly)

Monitoring & Maintenance

Regular checks (weekly):

  1. Visit GitHub Actions tab, verify recent runs succeeded
  2. Check GitHub Pages site loads correctly
  3. Verify feed data is fresh (timestamps are recent)
  4. Review API usage (GitHub, Apify dashboards)
  5. Check for security alerts (Dependabot)

Monthly maintenance:

  1. Update dependencies: npm update
  2. Review and clean old feed data if growing large
  3. Audit access logs if needed
  4. Rotate access tokens if compromised

Additional Resources