brightdata/social-listening-agent

Bright Data     ×     NVIDIA



Eye Data

A social intelligence terminal powered by the world's best web data infrastructure.


Turn any topic into a live map of what the internet thinks — in under a minute.


Live demo · How it works · Architecture · Deploy your own



What is this?

Eye Data is a reference application showing what's possible when you combine Bright Data's web data infrastructure with NVIDIA's NeMo Agent Toolkit and a modern LLM pipeline.

Give it a topic — a company, a product, a brand, a trend — and in ~30 seconds it will:

  1. Discover conversations happening right now on Reddit, X, and LinkedIn (plus the open web)
  2. Analyze sentiment, themes, risks, and opportunities across ~90 real posts
  3. Synthesize them into a handful of narratives that tell you what's actually being said
  4. Visualize the evidence as a searchable graph + timeline so you can drill into every claim

It's a social-listening tool, a brand-intelligence tool, a competitive-intelligence tool, a market-research tool — depending on the topic you give it. The underlying engine doesn't care.


Why Bright Data is the backbone

Social data is only as good as your ability to collect it at scale, reliably, and without getting blocked.

Bright Data is the web data infrastructure this kind of agent needs to exist:

  • SERP API — programmatic Google/Bing/Yandex search, with filters that would take you weeks to replicate
  • Web Unlocker — scrape the sites that block everyone else (Reddit, X, LinkedIn, e-commerce platforms) without managing a single proxy or solving a single CAPTCHA
  • Dataset marketplace — 150+ pre-built datasets for LinkedIn, Amazon, Instagram, TikTok, and more
  • MCP + Agent-native APIs — built from day one for AI agents, not hacked onto a scraper

The web is the largest training-and-grounding corpus in the world. Bright Data is how agents reach it.

Everything that happens below the "LLM decides what to search for" step in this app runs on Bright Data:

┌─────────────────────────────────────┐
│  LLM generates 6 search keywords    │
└────────────────┬────────────────────┘
                 ▼
┌─────────────────────────────────────┐
│  Bright Data SERP API               │
│  → Reddit + X + LinkedIn + Web      │
│  → ~315 results in <15 seconds      │
└────────────────┬────────────────────┘
                 ▼
┌─────────────────────────────────────┐
│  Bright Data Web Unlocker           │
│  → Post-level content extraction    │
│  → Markdown-ready, no CAPTCHAs      │
└────────────────┬────────────────────┘
                 ▼
┌─────────────────────────────────────┐
│  Ranked down to 90 posts            │
│  Ready for the LLM to analyze       │
└─────────────────────────────────────┘
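The two Bright Data hops in the diagram boil down to authenticated HTTP calls. A minimal sketch of the request bodies, assuming the `api.brightdata.com/request` endpoint shape and illustrative zone names (`serp_zone`, `unlocker_zone`) that you would replace with your own account's zones — this is not the repo's actual client code:

```python
from urllib.parse import quote_plus

API_URL = "https://api.brightdata.com/request"  # assumed endpoint shape

def serp_payload(query: str, zone: str = "serp_zone") -> dict:
    """Request body for a SERP API Google search (zone name is illustrative)."""
    return {
        "zone": zone,
        "url": f"https://www.google.com/search?q={quote_plus(query)}",
        "format": "raw",
    }

def unlocker_payload(post_url: str, zone: str = "unlocker_zone") -> dict:
    """Request body for Web Unlocker to fetch one discovered post."""
    return {"zone": zone, "url": post_url, "format": "raw"}

# Sending either payload (token from your Bright Data dashboard):
#   requests.post(API_URL,
#                 headers={"Authorization": f"Bearer {token}"},
#                 json=serp_payload("site:reddit.com acme widgets"))
```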

Why NVIDIA NeMo Agent Toolkit

The orchestration layer is NVIDIA's NeMo Agent Toolkit (NAT) — a production-grade runtime for LangGraph agents that gives us four things for free:

  • OpenAI-compatible API surface (/v1/chat/completions with stream: true) — any OpenAI client works against it
  • First-class WebSocket support for real-time streaming of intermediate steps
  • Automatic OpenTelemetry instrumentation for every LLM call (Langfuse, Langsmith, Phoenix, Grafana Tempo — all one config line away)
  • Plugin system for workflows — we register a langgraph_wrapper workflow that wraps the 8-stage pipeline as a single agent the toolkit can serve

The LLM itself is pluggable: OpenRouter, NVIDIA NIM, local inference — anything speaking the OpenAI API works.
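Because the surface is OpenAI-compatible, kicking off a run is just a standard chat-completions request with `stream: true`. A hedged sketch — the model name and localhost port below are placeholders, not values from this repo:

```python
def chat_request(topic: str) -> dict:
    """Body for NAT's /v1/chat/completions; the model name is a placeholder."""
    return {
        "model": "eye-data",
        "messages": [{"role": "user", "content": topic}],
        "stream": True,
    }

def accumulate(sse_chunks: list[dict]) -> str:
    """Join streamed delta fragments into the final assistant message."""
    return "".join(
        c["choices"][0]["delta"].get("content", "") for c in sse_chunks
    )

# With the official client (pip install openai):
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed")
#   stream = client.chat.completions.create(**chat_request("Acme Widgets"))
```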


How it works

The 8-stage pipeline

Every run executes these stages as a LangGraph state machine, streaming progress events to the browser as it goes.

| # | Stage | What it does | Data in / out |
|---|-------|--------------|---------------|
| 1 | plan | LLM generates 6 targeted search keywords from the topic. Falls back to heuristics if the LLM stumbles. | topic → keywords[] |
| 2 | search | Bright Data SERP API runs those keywords against Reddit, X, LinkedIn, and open-web discovery. | keywords[] → ~315 raw SERP results |
| 3 | collect | Deduplicates by URL; builds normalized CollectedPost objects. | SERP results → posts[] |
| 4 | rank | Heuristic scoring (topical relevance, recency, engagement, text richness) picks the best ~90 via round-robin across platforms. | posts[] → ranked_posts[] |
| 5 | analyze | LLM classifies each post in batches of 16 (up to 25 concurrent calls): sentiment, key takeaway, risk/opportunity tags, narrative candidates. | posts[] → analyses[] |
| 6 | synthesize | LLM clusters the analyses into 3–10 narratives with title/summary/momentum/sentiment lean. Produces an executive brief (risks, opportunities, recommended actions). | analyses[] → narratives[] + brief |
| 7 | render | Builds three visual artifacts: evidence graph (nodes + edges), timeline tape, sentiment clusters. | narratives[] → artifacts |
| 8 | persist | Parallel writes to Neon Postgres (posts, analyses, narratives, brief, session meta, full event log). | everything → DB |
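The table above is a strictly linear flow, so it can be illustrated without the LangGraph dependency. A dependency-free sketch in which every stage is a stub that threads a shared state dict through the same node order (the real LangGraph nodes do the work described in the table, with streaming and error handling):

```python
from typing import Callable

# Stage order mirrors the table; each node takes and returns the shared state.
STAGES: list[tuple[str, Callable[[dict], dict]]] = [
    ("plan",       lambda s: {**s, "keywords": [s["topic"]]}),        # stub
    ("search",     lambda s: {**s, "serp": []}),                      # stub
    ("collect",    lambda s: {**s, "posts": []}),                     # stub
    ("rank",       lambda s: {**s, "ranked_posts": s["posts"][:90]}),
    ("analyze",    lambda s: {**s, "analyses": []}),                  # stub
    ("synthesize", lambda s: {**s, "narratives": [], "brief": {}}),
    ("render",     lambda s: {**s, "artifacts": {}}),
    ("persist",    lambda s: s),
]

def run_pipeline(topic: str) -> dict:
    state: dict = {"topic": topic, "trace": []}
    for name, node in STAGES:
        state = node(state)
        state["trace"].append(name)  # stands in for CUSTOM_START/END events
    return state
```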

Each stage emits CUSTOM_START / CUSTOM_END OpenTelemetry events with structured progress. The frontend consumes these via SSE and renders the live terminal view.
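On the wire those progress events are ordinary `text/event-stream` frames. A minimal parser for the `data:` lines the frontend would see — the `stage` field name here is illustrative, not the app's actual event schema:

```python
import json

def parse_sse(raw: str) -> list[dict]:
    """Split an event-stream body into JSON payloads, skipping keep-alives."""
    events = []
    for block in raw.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data:"):
                payload = line[len("data:"):].strip()
                if payload and payload != "[DONE]":
                    events.append(json.loads(payload))
    return events
```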

What ends up in the database

Six tables in Neon Postgres capture the complete run — the session itself, every event it emitted, every post collected, the LLM's analysis of each post, the synthesized narratives, and the executive brief. This means any session is fully replayable after the fact; no need to re-scrape the web.

| Table | Contents |
|-------|----------|
| sessions | Session id, topic, status, progress, artifacts |
| session_events | Structured event log (every stage transition) |
| session_posts | Every collected post + scoring signals |
| session_post_analysis | LLM output per post (sentiment, tags, takeaway) |
| session_narratives | The clustered narratives for the session |
| session_briefs | Risks / opportunities / recommended actions |
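Production uses Neon Postgres via SQLAlchemy and Drizzle, but the replay idea is easy to sketch against stdlib sqlite3. The column names below are illustrative guesses, not the repo's actual schema:

```python
import sqlite3

SCHEMA = """
CREATE TABLE sessions (
    id       TEXT PRIMARY KEY,
    topic    TEXT NOT NULL,
    status   TEXT NOT NULL DEFAULT 'running',
    progress REAL NOT NULL DEFAULT 0.0
);
CREATE TABLE session_events (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL REFERENCES sessions(id),
    stage      TEXT NOT NULL,
    payload    TEXT  -- JSON blob with structured progress
);
"""

def replay_events(db: sqlite3.Connection, session_id: str) -> list[str]:
    """Replay a finished run from the event log -- no re-scraping needed."""
    rows = db.execute(
        "SELECT stage FROM session_events WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return [stage for (stage,) in rows]
```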

Architecture

        ┌────────────────────────────────────────────────────────────────┐
        │                    demos.brightdata.com/eye-data                │
        │                           (Vercel)                              │
        │                                                                 │
        │  ┌──────────────────────┐          ┌───────────────────────┐    │
        │  │ Next.js 16 App       │          │ Server-only /api/run  │    │
        │  │ React 19 · Tailwind  │◄────────►│ (proxies backend,     │    │
        │  │ Framer Motion        │   SSE    │  hides URL from       │    │
        │  │ react-force-graph-2d │          │  the browser)         │    │
        │  └──────────────────────┘          └───────────┬───────────┘    │
        └──────────────────────────────────────────────┬─┴────────────────┘
                                                       │
                                       HTTPS · stream=true
                                                       │
                                                       ▼
        ┌────────────────────────────────────────────────────────────────┐
        │               AWS Lightsail Container Service                  │
        │                    (us-east-1 · nano tier)                     │
        │                                                                │
        │   ┌────────────────────────────────────────────────────────┐   │
        │   │  NVIDIA NeMo Agent Toolkit                              │   │
        │   │  ├── FastAPI · /v1/chat/completions · /websocket       │   │
        │   │  ├── OpenTelemetry instrumentation (opt-in exporters)   │   │
        │   │  └── LangGraph: 8-stage pipeline (plan→…→persist)       │   │
        │   └────────────────────────────────────────────────────────┘   │
        └──────────┬──────────────────────┬──────────────────────┬───────┘
                   │                      │                      │
                   ▼                      ▼                      ▼
          ┌────────────────┐   ┌──────────────────┐   ┌──────────────────┐
          │   Bright Data  │   │ OpenRouter / NIM │   │  Neon Postgres   │
          │                │   │                  │   │                  │
          │  SERP API      │   │  LLM inference   │   │  Sessions, posts,│
          │  Web Unlocker  │   │  (pluggable)     │   │  narratives,     │
          │                │   │                  │   │  briefs, events  │
          └────────────────┘   └──────────────────┘   └──────────────────┘

Stack at a glance

Frontend

  • Next.js 16 (App Router)
  • React 19, TypeScript
  • Tailwind CSS v4
  • Framer Motion
  • Lucide icons
  • react-force-graph-2d
  • Drizzle ORM + Neon serverless

Backend

  • Python 3.11
  • NVIDIA NeMo Agent Toolkit
  • LangGraph (8-node state machine)
  • LangChain (OpenAI / NVIDIA)
  • NeMo Guardrails
  • FastAPI + SSE
  • SQLAlchemy (async) + asyncpg

Infrastructure

  • Bright Data SERP + Web Unlocker
  • NVIDIA NeMo Agent Toolkit
  • Vercel for the Next.js frontend
  • AWS Lightsail containers for NAT
  • GHCR for container images
  • Neon Postgres for persistence
  • Pluggable: Langfuse, Phoenix, Langsmith for observability

Repository layout

.
├── .github/workflows/vercel.yml   # CI: deploy frontend to Vercel on push
├── eye-data/
│   ├── eye-data-app/              # Next.js 16 frontend (Vercel)
│   │   ├── src/app/               # Pages, API routes
│   │   ├── src/components/        # Terminal, dashboard, evidence graph
│   │   └── src/lib/               # run-transport, nat-protocol, env
│   │
│   └── pipeline/                  # Python backend (AWS Lightsail)
│       ├── pipeline/
│       │   ├── graph/             # LangGraph pipeline + per-stage nodes
│       │   ├── brightdata/        # SERP + Web Unlocker clients
│       │   ├── llm/               # OpenAI-compatible LLM client
│       │   ├── db/                # SQLAlchemy models + repos
│       │   └── prompts/           # Versioned prompt templates
│       ├── nat_config.yaml        # NeMo Agent Toolkit workflow config
│       ├── Dockerfile             # Multi-stage build for Lightsail
│       ├── deploy.sh              # Manual deploy to Lightsail via GHCR
│       └── requirements.lock.txt  # Pinned Python deps (no pip backtracking)

Self-host

Run the frontend locally

cd eye-data/eye-data-app
cp .env.local.example .env.local
# edit .env.local (see the file for every variable)
npm install
npm run dev

The app serves under /eye-data (configurable via NEXT_PUBLIC_BASE_PATH). Visit http://localhost:3000/eye-data.

Run the backend locally

cd eye-data/pipeline
pip install -r requirements.lock.txt
pip install -e . --no-deps
nat serve --config_file nat_config.yaml --port 8001

Or in a container that matches production:

docker build -t eye-data-api .
docker run --rm -p 8080:8080 --env-file ../.env eye-data-api

Deploy

Full deployment uses deploy.sh: it builds a linux/amd64 image, pushes it to GitHub Container Registry, then tells Lightsail to pull and run it. Total first deploy: ~20 minutes. Subsequent deploys (cached layers): ~3 minutes.

See the comment header at the top of deploy.sh for the one-time prerequisites (AWS SSO login, docker login ghcr.io, QEMU binfmt registration for cross-builds).


About the companies behind this

Bright Data: the world's leading web data infrastructure.

Fortune 500 companies, top universities, and every serious AI lab use Bright Data to turn the public web into structured data — compliantly, at massive scale, from any geography.

For AI specifically, Bright Data provides:

  • Real-time web access for agents (MCP + native APIs)
  • SERP results from every major search engine
  • Web Unlocker that bypasses anti-bot systems on the hardest sites
  • 150+ prebuilt datasets for major platforms
  • Full compliance posture (GDPR, CCPA, SOC2, ISO)

If your AI needs to know what the world is doing right now, Bright Data is how it finds out.

NVIDIA NeMo Agent Toolkit: a production runtime for LLM agents.

NAT wraps LangGraph-style agents in a hardened serving layer with OpenAI-compatible APIs, WebSocket streaming, OpenTelemetry instrumentation, and a plugin system for tools, guardrails, and evaluators.

For this app specifically, NAT gives us:

  • /v1/chat/completions with streaming — every OpenAI client works
  • Real-time intermediate-step events (the stage-by-stage progress you see in the UI)
  • Opt-in tracing to Langfuse, Phoenix, Langsmith, and more
  • A LangGraph wrapper that deploys a multi-node pipeline as a single agent

License

This is a reference implementation. See LICENSE for terms. The Bright Data and NVIDIA names and logos are trademarks of their respective owners.


About

Live social intelligence for any topic - combining Bright Data SERP + Web Unlocker, NVIDIA NeMo Agent Toolkit, LangGraph, Next.js, and Neon Postgres.
