
feat(firecrawl): migrate FireCrawl loader to Firecrawl v2 (v4) SDK #1

Closed
rakshith48 wants to merge 4 commits into main from firecrawl-v2-upgrade

Conversation

@rakshith48
Owner

Summary

Upgrades the FireCrawl document loader (packages/components/nodes/documentloaders/FireCrawl/FireCrawl.ts) from the legacy Firecrawl v1 API to the official @mendable/firecrawl-js v2 (v4 major) SDK, and bumps the dependency in packages/components/package.json from ^1.18.2 to ^4.25.2.

The previous node declared @mendable/firecrawl-js as a dependency but never imported it — it hand-rolled its own FirecrawlApp class hitting the /v1/scrape, /v1/crawl, /v1/extract, /v1/search REST endpoints directly. This PR replaces that hand-rolled client with the official SDK.

v1 → v2 changes

  • Client: new FirecrawlApp({ apiKey, apiUrl }) (custom) → new Firecrawl({ apiKey, apiUrl }) (SDK default export).
  • Methods: manual axios POST /v1/* + status polling → SDK app.scrape(), app.crawl() (built-in waiter, pollInterval: 2), app.search(), app.extract().
  • Response shapes (v2 removed the { success, data } envelope):
    • scrape → returns a Document directly (markdown/html/metadata at top level).
    • crawl → returns a CrawlJob with .status and .data: Document[].
    • search → returns SearchData grouped by source; the loader reads .web and normalizes each entry (lightweight SearchResultWeb or full Document).
    • extract → returns ExtractResponse with .data / .status / .expiresAt.
  • Search params: v2 dropped v1's separate lang/country. To avoid breaking existing node configs, the node's Country input is mapped to v2's single location field; lang is no longer sent (input remains in the UI but is inert for v2). limit, timeout, tbs, ignoreInvalidURLs are passed through.
  • Behavior preserved: same node label/name/version (4.0), same inputs, same modes (crawl / scrape / extract / search), same defaults (formats: ['markdown'], onlyMainContent: true), and the same LangChain Document[] / concatenated-Text output. integration: 'flowise' is still sent on every call.
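To make the search changes above concrete, here is a minimal sketch of how the Country-to-location mapping and the normalization of `.web` entries could look. The interfaces are local stand-ins that only model the shapes described in this PR (they are not the SDK's real type definitions), and `buildSearchParams`, `normalizeWebEntry`, and `normalizeSearch` are hypothetical helper names, not the functions in FireCrawl.ts.

```typescript
// Local stand-ins for the shapes described above; NOT the SDK's real .d.ts types.
interface FullDocument {
    markdown?: string
    html?: string
    metadata?: { sourceURL?: string; title?: string; description?: string }
}

interface LightweightWebResult {
    url: string
    title?: string
    description?: string
}

type WebEntry = FullDocument | LightweightWebResult

// Simplified output shape, modelled on a LangChain Document.
interface LoaderDocument {
    pageContent: string
    metadata: Record<string, unknown>
}

// Hypothetical subset of the node's search inputs.
interface SearchInputs {
    query: string
    country?: string
    lang?: string
    limit?: number
}

// Country is mapped to the single location field; lang is intentionally not forwarded.
function buildSearchParams(inputs: SearchInputs): Record<string, unknown> {
    const params: Record<string, unknown> = { integration: 'flowise' }
    if (inputs.country) params.location = inputs.country
    if (inputs.limit !== undefined) params.limit = inputs.limit
    return params
}

// Distinguish a full scraped document from a lightweight search hit.
function isFullDocument(entry: WebEntry): entry is FullDocument {
    return 'markdown' in entry || 'html' in entry || 'metadata' in entry
}

// Convert either entry kind into the loader's uniform output shape.
function normalizeWebEntry(entry: WebEntry): LoaderDocument {
    if (isFullDocument(entry)) {
        return {
            pageContent: entry.markdown ?? entry.html ?? '',
            metadata: { ...(entry.metadata ?? {}) }
        }
    }
    return {
        pageContent: entry.description ?? '',
        metadata: { sourceURL: entry.url, title: entry.title }
    }
}

// Read only the web group of a search response, tolerating its absence.
function normalizeSearch(data: { web?: WebEntry[] }): LoaderDocument[] {
    return (data.web ?? []).map(normalizeWebEntry)
}
```

Keeping the `lang` input in `SearchInputs` while never copying it into the params mirrors the "inert input" behavior described above: existing node configs still load, but the field has no effect on the request.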

Note for reviewers (security tradeoff)

The old hand-rolled client routed requests through Flowise's secureAxiosRequest (an SSRF-protected HTTP wrapper). The official SDK uses its own internal HTTP client (undici/fetch), so this migration moves Firecrawl traffic off secureAxiosRequest. This matches how most other vendor-SDK nodes in packages/components operate, but I am flagging it explicitly because it is a behavioral change that deserves a conscious decision.
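If reviewers decide some guard should remain, one option is a pre-flight check on the configured apiUrl before it is handed to the SDK. The sketch below is purely illustrative: `assertSafeApiUrl` and `isPrivateIPv4` are hypothetical names, this is not how secureAxiosRequest works, and it only inspects literal hostnames (it does not resolve DNS, so it is weaker than a real SSRF filter).

```typescript
// Hypothetical pre-flight check; NOT Flowise's secureAxiosRequest.
// Returns true for literal IPv4 addresses in loopback, private, or link-local ranges.
function isPrivateIPv4(host: string): boolean {
    const parts = host.split('.')
    if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) return false
    const a = Number(parts[0])
    const b = Number(parts[1])
    return (
        a === 0 ||
        a === 10 ||
        a === 127 ||
        (a === 172 && b >= 16 && b <= 31) ||
        (a === 192 && b === 168) ||
        (a === 169 && b === 254)
    )
}

// Parses the URL and rejects non-HTTP schemes and obviously internal hosts.
function assertSafeApiUrl(raw: string): URL {
    const url = new URL(raw) // throws on malformed input
    if (url.protocol !== 'https:' && url.protocol !== 'http:') {
        throw new Error(`Unsupported protocol: ${url.protocol}`)
    }
    const host = url.hostname.toLowerCase()
    if (host === 'localhost' || host === '[::1]' || isPrivateIPv4(host)) {
        throw new Error(`Refusing private host: ${host}`)
    }
    return url
}
```

A check like this would also reject a self-hosted Firecrawl instance on a private network, so it would likely need an opt-out; that tradeoff is part of the decision flagged above.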

Verification

What I did verify:

  • Type-checked the edited file against the real published SDK types. Installed @mendable/firecrawl-js@4.25.2 in an isolated project and ran tsc --strict --noEmit on FireCrawl.ts (with only the Flowise-internal ../../../src/* and @langchain/* imports stubbed). Result: 0 type errors — every v2 method name, option field, and response field access used in the file resolves against the actual v4 .d.ts.
  • Confirmed the SDK's public exports: Firecrawl (default export) and the named types Document, ScrapeOptions, CrawlOptions, SearchRequest, SearchResultWeb are all exported by @mendable/firecrawl-js@4.25.2.
  • Cross-checked method signatures against the SDK's own source (src/v2/client.ts, src/v2/methods/*.ts) and the published npm README usage examples — no guessed method names.
  • Prettier: ran prettier --check with Flowise's exact config (printWidth 140, tabWidth 4, singleQuote, no semicolons, trailingComma none) — passes.

What I did NOT run (and why):

  • No full Flowise monorepo build / pnpm install. The full install + turbo build is heavy; per task constraints it was avoided. The single-file strict type-check above substitutes for it but does not exercise the real @langchain/* / Flowise Interface/utils types (those were stubbed).
  • No runtime / end-to-end test against a live Firecrawl API key — the node was not executed against real endpoints.

🤖 Generated with Claude Code

rak-f and others added 3 commits June 4, 2026 13:00
Replace the hand-rolled v1 REST client in the FireCrawl document loader
with the official @mendable/firecrawl-js v2 API (Firecrawl class) and bump
the dependency from ^1.18.2 to ^4.25.2.

- Use `new Firecrawl({ apiKey, apiUrl })` and its `.scrape` / `.crawl` /
  `.search` / `.extract` methods instead of manual axios calls to /v1/*.
- Adapt to v2 response shapes: scrape/crawl return Document(s) directly
  (no { success, data } envelope); crawl returns a CrawlJob with `.data`;
  search returns results grouped by source (use `.web`).
- Preserve the node's inputs, modes, defaults, and Document/Text output
  shape. Search `country` now maps to v2's single `location` field, since
  v1's separate `lang`/`country` params were removed in v2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…crawl-js)

Both names dual-publish the identical v4 SDK; `firecrawl` is the current canonical package.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…crawl-js)

Both names dual-publish the identical v4 SDK; `firecrawl` is the current canonical package.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rakshith48
Owner Author

Updated to use the canonical firecrawl npm package instead of the legacy @mendable/firecrawl-js (both dual-publish the identical v4 SDK at the same versions; firecrawl is the current canonical name). Drop-in: same default export + types.

Note: the dep was swapped in place in package.json, so it may need npx sort-package-json to land in the correct alphabetical position (firecrawl sorts under f, not where the @mendable scoped entry was).

@rakshith48 rakshith48 marked this pull request as ready for review June 4, 2026 07:57
…2 maxDiscoveryDepth

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rakshith48
Owner Author

Superseded — migrated to upstream: FlowiseAI#6474

@rakshith48 rakshith48 closed this Jun 4, 2026