A full-stack demonstration of the Node.js Web Streams API for real-time, memory-efficient data streaming from a scraper to a React UI, without buffering the full result set on the server.
This project shows how to pipe data end to end with Node.js streams, sending each chunk to the client as soon as it is produced instead of waiting for all data to be collected.
Puppeteer (AsyncGenerator)
↓
Readable.from() - Node.js readable stream from async generator
↓
Readable.toWeb() - Convert to WHATWG ReadableStream
↓
TransformStream - Serialize each page's products to NDJSON
↓
Writable.toWeb(res) - Pipe directly into Express HTTP response
↓ (HTTP / fetch)
TextDecoderStream - Decode binary chunks to string (browser)
↓
TransformStream - Parse NDJSON lines back to JS objects
↓
WritableStream - Write parsed products into React state
```
├── react-client/        # Vite + React frontend
│   └── src/
│       └── App.jsx      # Stream consumer UI
│
└── server/              # Express backend
    └── index.ts         # Puppeteer scraper + stream pipeline
```
Each scraped page is serialized as one JSON line terminated by `\n`. This lets the client parse results incrementally, since one line is one complete chunk of data, without waiting for the full response.
[{"title":"Bag A",...}]\n
[{"title":"Bag B",...}]\n
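As a minimal sketch of the format itself (separate from the streaming code), the round trip can be written with two small helpers. The names `toNdjsonLine` and `parseNdjson` and the sample data are illustrative assumptions, not part of the project. The sketch relies on `JSON.stringify` (called without an indent argument) never emitting a raw newline, so `\n` is safe as a record separator.

```javascript
// Serialize one page of products (any JSON-serializable value) as one NDJSON line.
function toNdjsonLine(page) {
  return JSON.stringify(page) + "\n";
}

// Parse a complete NDJSON payload back into an array of values, one per line.
function parseNdjson(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "") // the trailing "\n" leaves one empty entry
    .map((line) => JSON.parse(line));
}

// Hypothetical sample data: two scraped pages.
const pages = [[{ title: "Bag A" }], [{ title: "Bag B" }, { title: "Bag C" }]];
const payload = pages.map(toNdjsonLine).join("");
const decoded = parseNdjson(payload);
console.log(decoded.length); // 2
```

This version parses a complete payload at once; the streaming transforms in this README do the same work incrementally.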
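```javascript
// Server side: serialize each yielded page (an array of products) to NDJSON.
new TransformStream({
  start() {
    // Initialize the buffer; otherwise the first `+=` would prepend "undefined".
    this.buffer = "";
  },
  transform(chunk, controller) {
    this.buffer += JSON.stringify(chunk) + "\n";
    let boundary = this.buffer.indexOf("\n");
    // Emit every complete line currently held in the buffer.
    while (boundary !== -1) {
      const line = this.buffer.substring(0, boundary);
      this.buffer = this.buffer.substring(boundary + 1);
      if (line.trim()) controller.enqueue(line + "\n");
      boundary = this.buffer.indexOf("\n");
    }
  },
  flush(controller) {
    // Emit whatever remains when the source stream ends.
    if (this.buffer.trim()) controller.enqueue(this.buffer + "\n");
  },
})
```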
`buffer` is stored on `this` inside the transformer, so it persists across chunks and holds incomplete data between reads.
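```javascript
// Client side: parse NDJSON text back into JavaScript objects.
new TransformStream({
  start() {
    // Initialize the buffer; otherwise the first `+=` would prepend "undefined".
    this.buffer = "";
  },
  transform(chunk, controller) {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or ""), so keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    for (const line of lines) {
      if (!line.trim()) continue;
      // Malformed lines are skipped silently.
      try { controller.enqueue(JSON.parse(line)); } catch {}
    }
  },
})
```

Chunks arriving over HTTP may be split arbitrarily; the buffer ensures that only complete lines are parsed.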
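To illustrate why the buffer matters, the following sketch cuts an NDJSON payload into 3-byte fragments, which splits lines and even a multi-byte character, and feeds them through a decoder and a line-buffering parser. The helper names `createNdjsonParser` and `parseInFragments`, the fragment size, and the sample data are assumptions made for this example. It uses only globals available in Node.js 18+ and modern browsers.

```javascript
// Line-buffering NDJSON parser: emits one parsed value per complete line.
function createNdjsonParser() {
  return new TransformStream({
    start() {
      this.buffer = "";
    },
    transform(chunk, controller) {
      this.buffer += chunk;
      const lines = this.buffer.split("\n");
      this.buffer = lines.pop() ?? ""; // keep the incomplete tail
      for (const line of lines) {
        if (line.trim()) controller.enqueue(JSON.parse(line));
      }
    },
  });
}

// Hypothetical helper: deliver the UTF-8 bytes of `text` in fixed-size fragments,
// then decode, parse, and collect the results.
async function parseInFragments(text, fragmentSize) {
  const bytes = new TextEncoder().encode(text);
  const source = new ReadableStream({
    start(controller) {
      for (let i = 0; i < bytes.length; i += fragmentSize) {
        controller.enqueue(bytes.subarray(i, i + fragmentSize));
      }
      controller.close();
    },
  });

  const parsed = [];
  await source
    .pipeThrough(new TextDecoderStream()) // bytes -> text, tolerant of split characters
    .pipeThrough(createNdjsonParser()) // text -> parsed values
    .pipeTo(new WritableStream({ write(value) { parsed.push(value); } }));
  return parsed;
}

const ndjson = '[{"title":"Bag A"}]\n[{"title":"Bäg B"}]\n';
const fragmentedResult = parseInFragments(ndjson, 3);
fragmentedResult.then((pages) => console.log(pages.length)); // 2
```

In the React client, the final `WritableStream` would append each value to component state instead of pushing it into an array.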
The Puppeteer scraper is written as an async generator. Node's `Readable.from()` wraps it in a Node.js stream, and `Readable.toWeb()` converts that to a WHATWG `ReadableStream`, making it compatible with `.pipeThrough()` and `.pipeTo()`.
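A minimal sketch of that server-side chain, runnable without Puppeteer or Express, might look like the following. The generator `fakeProducts` stands in for the real scraper and the recording `Writable` stands in for the Express `res`; both are assumptions made for illustration, and the serializer is simplified to one `enqueue` per page. Note that `Readable.toWeb()` and `Writable.toWeb()` are marked experimental in Node.js 18.

```javascript
// Hypothetical stand-in for the Puppeteer scraper: yields one array of products per page.
async function* fakeProducts() {
  yield [{ title: "Bag A" }];
  yield [{ title: "Bag B" }];
}

async function runPipeline() {
  // Dynamic import keeps the sketch runnable as either CommonJS or an ES module.
  const { Readable, Writable } = await import("node:stream");

  // Stand-in for the Express `res`: a Writable that records what it receives.
  const received = [];
  const sink = new Writable({
    write(chunk, _encoding, callback) {
      received.push(chunk.toString());
      callback();
    },
  });

  // Serialize each yielded page as one NDJSON line.
  const toNdjson = new TransformStream({
    transform(page, controller) {
      controller.enqueue(JSON.stringify(page) + "\n");
    },
  });

  // async generator -> Node Readable -> WHATWG ReadableStream -> NDJSON -> sink
  await Readable.toWeb(Readable.from(fakeProducts()))
    .pipeThrough(toNdjson)
    .pipeTo(Writable.toWeb(sink));

  return received.join("");
}

const pipelineOutput = runPipeline();
pipelineOutput.then((text) => process.stdout.write(text));
```

In the real server, `sink` would be replaced by the response object inside the route handler, so each line is written to the client as soon as its page is scraped.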
- Node.js 18+
- Google Chrome installed (used by Puppeteer)
Puppeteer browser

```bash
npx puppeteer browsers install chrome
```

Server

```bash
cd server
pnpm install
pnpm run dev   # or: npx ts-node index.ts
```

Client

```bash
cd react-client
pnpm install
pnpm run dev
```

Then open http://localhost:5173 and click Start Scraping.
- A `GET /scrap` request hits the Express server.
- Puppeteer launches a headless Chrome and navigates to the target shop page.
- An async generator (`getProducts`) scrapes each page and `yield`s an array of products, then clicks the pagination "next" button and repeats.
- The generator is wrapped in a `Readable` stream and converted to a WHATWG `ReadableStream`.
- A `TransformStream` serializes each yielded array to an NDJSON line and enqueues it.
- The final stream is piped directly into the HTTP response, so data starts flowing to the client before all pages are scraped.
- In the browser, the response body is decoded, split by newlines, parsed as JSON, and appended to React state in real time.
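The pagination step can be sketched without a browser. In the example below, `loadPage` and the in-memory `fakeSite` are hypothetical stand-ins for Puppeteer navigation and DOM extraction, and this `getProducts` is a simplified illustration rather than the project's actual implementation. It shows that the generator body only advances when the consumer asks for another page.

```javascript
// Hypothetical in-memory "site": each entry is one page of products.
const fakeSite = [[{ title: "Bag A" }], [{ title: "Bag B" }], [{ title: "Bag C" }]];
const loadLog = []; // records which pages were actually loaded

// Hypothetical page loader standing in for Puppeteer navigation and scraping.
async function loadPage(index) {
  loadLog.push(index);
  return { products: fakeSite[index], hasNext: index < fakeSite.length - 1 };
}

// Yield one page of products at a time and stop when there is no "next" page.
async function* getProducts() {
  let index = 0;
  while (true) {
    const { products, hasNext } = await loadPage(index);
    yield products;
    if (!hasNext) return;
    index += 1;
  }
}

// Consume only the first two pages. The third page is never loaded because
// the generator is suspended at `yield` until the consumer requests more.
async function takeTwo() {
  const seen = [];
  for await (const page of getProducts()) {
    seen.push(page);
    if (seen.length === 2) break;
  }
  return { seen, loadLog };
}

const lazyResult = takeTwo();
lazyResult.then(({ loadLog }) => console.log(loadLog)); // [ 0, 1 ]
```

When the generator is wrapped with `Readable.from()`, the stream may read a few pages ahead of the consumer, but that read-ahead is bounded by the stream's high-water mark rather than by the total number of pages.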
- The scraper targets `skybuybd.com/shop/purse`; adjust the URL and selectors as needed.
- Products render live as each page is scraped; no need to wait for all 50 pages.
- The `buffer` property on the transformer object passed to `TransformStream` is a pattern for stateful transforms. It is not part of the `Transformer` interface spec, so TypeScript requires it to be managed outside the object or cast appropriately.
- Memory stays low because the generator `yield`s one page at a time, not all pages at once.
| Layer | Technology |
|---|---|
| Scraping | Puppeteer Extra + Stealth Plugin + Adblocker |
| Server | Node.js, Express, WHATWG Streams (node:stream/web) |
| Client | React (Vite), WHATWG Streams (browser-native) |
| Data format | NDJSON over HTTP streaming |