The CLI data-tracer: a data extractor built on top of `curl`.
sget operates directly on raw HTTP streams. It treats the web as a queryable data pipeline rather than a visual canvas, so nothing is ever rendered, which keeps it lightweight and fast.
- curl-Powered Core: Built straight on top of the industry standard for network requests. If `curl` can reach it, sget can extract from it.
- Zero Engine Overhead: No Chromium instances, no heavy JS evaluation loops, and zero memory leaks. Just pure data parsing.
- Unix-Pipeline Native: Seamlessly fits into your existing workflow. Pipe HTML/JSON into sget, or pipe sget's structured output straight into `jq`, `grep`, or local files.
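The pipeline idea can be sketched as a plain stream filter. The example below is an illustration only, not sget's source: `extract_hrefs` is a hypothetical helper, and a regular expression stands in for a real HTML parser. It reads markup from any input stream and writes one extracted value per line to any output stream, which is the shape a tool needs to sit between `curl` and `jq` or `grep`.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper (not part of sget): copies every href value found in
// `in` to `out`, one per line, and returns how many values were written.
// Input is scanned line by line, so an attribute split across two lines is
// missed; a real extractor would use a proper HTML parser instead.
std::size_t extract_hrefs(std::istream& in, std::ostream& out) {
    // Matches href="..." or href='...', ignoring case.
    static const std::regex href_re(
        R"re(href\s*=\s*["']([^"']*)["'])re",
        std::regex::ECMAScript | std::regex::icase);

    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        for (std::sregex_iterator it(line.begin(), line.end(), href_re), end;
             it != end; ++it) {
            out << (*it)[1].str() << '\n';  // capture group 1 is the URL
            ++count;
        }
    }
    return count;
}

// Convenience overload for payloads that are already in memory.
std::string extract_hrefs(const std::string& html) {
    std::istringstream in(html);
    std::ostringstream out;
    extract_hrefs(in, out);
    return out.str();
}
```

Calling `extract_hrefs(std::cin, std::cout)` from a `main` function would turn this into a filter that can be placed after `curl -s <url> |` in a shell pipeline. Taking generic streams rather than file names is what keeps such a function testable and composable.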
[ Target Webpage ]
│
▼ (Optimized HTTP Fetch)
┌───────┐
│ curl │
└───────┘
│
▼ (Raw Data Stream)
┌───────┐
│ sget │ ──► [ Extraction Engine: CSS Selectors / XPath / Regex ]
└───────┘
│
▼ (Structured Output Flush)
[ JSON / CSV / Text ] ──► Pipe to next tool (e.g., jq, redirect to file)
- The Fetch: sget utilizes native network optimization layers via `curl` for highly stable, low-level HTTP requests.
- The Stream: The target payload is fed instantly into sget's memory-efficient stream parser without downloading unnecessary visual assets.
- The Extraction: Your declarative filters (CSS selectors, XPath expressions, or regex patterns) are applied to the parsed document to pick out exactly the data you asked for.
- The Output: Structured data is flushed to `stdout` in your format of choice, completely ready for automated consumption.
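The output step above can be illustrated with a small serializer. This is a minimal sketch under stated assumptions, not sget's actual code: the `Record` type and the functions `json_escape`, `csv_field`, `write_json`, `write_csv`, and `to_json` are all hypothetical names. It shows the escaping that JSON and CSV output need so that downstream tools receive well-formed data.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for illustration; sget's real data model may differ.
struct Record {
    std::string text;
    std::string url;
};

// Escapes the characters that must be escaped inside a JSON string.
std::string json_escape(const std::string& s) {
    static const char hex[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {  // remaining control characters become \u00XX
                    out += "\\u00";
                    out += hex[c >> 4];
                    out += hex[c & 0x0F];
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Quotes a CSV field only when it contains a comma, quote, or line break,
// doubling any embedded quotes.
std::string csv_field(const std::string& s) {
    if (s.find_first_of(",\"\r\n") == std::string::npos) return s;
    std::string quoted = "\"";
    for (char c : s) {
        if (c == '"') quoted += '"';
        quoted += c;
    }
    quoted += '"';
    return quoted;
}

// Writes all records as one JSON array on a single line, then flushes.
void write_json(const std::vector<Record>& rows, std::ostream& out) {
    out << '[';
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (i > 0) out << ',';
        out << "{\"text\":\"" << json_escape(rows[i].text)
            << "\",\"url\":\"" << json_escape(rows[i].url) << "\"}";
    }
    out << "]\n";
    out.flush();
}

// Writes a header row plus one row per record, using '\n' line endings
// (an assumption made here for Unix tools; RFC 4180 specifies CRLF).
void write_csv(const std::vector<Record>& rows, std::ostream& out) {
    out << "text,url\n";
    for (const Record& r : rows) {
        out << csv_field(r.text) << ',' << csv_field(r.url) << '\n';
    }
    out.flush();
}

// Convenience wrapper that returns the JSON output as a string.
std::string to_json(const std::vector<Record>& rows) {
    std::ostringstream out;
    write_json(rows, out);
    return out.str();
}
```

Passing `std::cout` as the stream sends the result to `stdout`, where it can be piped to `jq` or redirected to a file. Writing through a `std::ostream&` keeps the format logic independent of where the data ends up.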
| Layer | Engine/Protocol | Purpose |
|---|---|---|
| Libraries | curl | Low-overhead HTTP transfer, custom headers, proxy handling, and cookie jars. |
| Language | C++ with vcpkg | It is fast. That's it. |
| Formatting | Native Colors (OS) | Uses the colors defined by the operating system itself to keep output latency low. |
| Environment | Docker | Containers to run standalone inside any Linux, macOS, or Windows terminal. |
sget is engineered from the ground up for high-performance automation:
- Minimal Memory Footprint: Uses a fraction of the RAM required by headless browsers (Puppeteer/Playwright).
- Parallel Scrapes: Launch multiple data tracing threads concurrently without melting your CPU.
- Bypass Anti-Bot: Native integration for rapid User-Agent rotation, custom request delays, and upstream proxy chains.
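The concurrency and rotation ideas in this list can be sketched with standard threads. What follows is an assumption-laden illustration, not sget's implementation: `UserAgentRotator`, `FetchFn`, and `fetch_all` are hypothetical names, and the actual network call is passed in as a callback so the example stays independent of any HTTP library. A fixed number of worker threads pull URLs from a shared counter, each request takes the next User-Agent in round-robin order, and every worker pauses for a configurable delay between requests.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: hands out User-Agent strings round-robin.
// next() is safe to call from several threads at once.
class UserAgentRotator {
public:
    explicit UserAgentRotator(std::vector<std::string> agents)
        : agents_(std::move(agents)) {
        if (agents_.empty()) {
            throw std::invalid_argument("at least one User-Agent is required");
        }
    }

    const std::string& next() {
        return agents_[counter_.fetch_add(1) % agents_.size()];
    }

private:
    std::vector<std::string> agents_;
    std::atomic<std::size_t> counter_{0};
};

// The transfer itself is supplied by the caller (for example a libcurl
// wrapper); it receives a URL and the User-Agent to send, and returns the body.
using FetchFn =
    std::function<std::string(const std::string& url, const std::string& user_agent)>;

// Fetches every URL using `workers` threads and returns the bodies in the
// same order as `urls`. Each worker sleeps for `delay` after each request.
// Note: `fetch` must not throw, because an exception escaping a thread
// terminates the program.
std::vector<std::string> fetch_all(const std::vector<std::string>& urls,
                                   UserAgentRotator& rotator,
                                   const FetchFn& fetch,
                                   std::size_t workers,
                                   std::chrono::milliseconds delay) {
    std::vector<std::string> results(urls.size());
    std::atomic<std::size_t> next_index{0};

    auto worker = [&] {
        for (;;) {
            const std::size_t i = next_index.fetch_add(1);
            if (i >= urls.size()) return;
            // Each result slot is written by exactly one thread.
            results[i] = fetch(urls[i], rotator.next());
            std::this_thread::sleep_for(delay);
        }
    };

    std::vector<std::thread> pool;
    const std::size_t n = std::max<std::size_t>(workers, 1);
    for (std::size_t w = 0; w < n; ++w) pool.emplace_back(worker);
    for (std::thread& t : pool) t.join();
    return results;
}
```

Capping the number of workers and adding a delay keeps both local CPU usage and the load on the target server bounded. The order in which User-Agents are paired with URLs depends on thread scheduling, so only the rotation itself is deterministic, not the pairing.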
Built with 💻 by CoderSilicon
"It is always better to differ from others."