
Upgrade FireCrawl document loader to the firecrawl JS SDK v4 (v2 API)#6474

Open
rakshith48 wants to merge 6 commits into FlowiseAI:main from rakshith48:firecrawl-v2-upgrade
@rakshith48

Upgrades packages/components/nodes/documentloaders/FireCrawl/FireCrawl.ts from hand-rolled v1 axios calls to the official firecrawl JS SDK v4 (v2 API), using its .scrape/.crawl/.search/.extract methods and typed responses. The node's label, inputs, modes, and output shape are preserved, and the legacy maxDepth input is mapped to maxDiscoveryDepth for backward compatibility. Note: HTTP requests now go through the SDK client instead of secureAxiosRequest; flagged for reviewer preference.
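The maxDepth back-compat mapping could look roughly like the sketch below. It is a minimal illustration, not the PR's code: `LegacyCrawlInput`, `V2CrawlOptionsSketch`, and `toV2CrawlOptions` are hypothetical names, and the rule that an explicit maxDiscoveryDepth wins over a legacy maxDepth is an assumption.

```typescript
// Hypothetical shape of the node's crawl inputs, including the v1-era name
// that previously saved flows may still carry.
interface LegacyCrawlInput {
  limit?: number
  maxDepth?: number // legacy v1 name
  maxDiscoveryDepth?: number // v2 name
}

// Hypothetical subset of the v2 crawl options this sketch produces.
interface V2CrawlOptionsSketch {
  limit?: number
  maxDiscoveryDepth?: number
}

// Map legacy inputs onto v2 names. An explicit maxDiscoveryDepth takes
// precedence over the legacy maxDepth (assumed precedence rule).
function toV2CrawlOptions(input: LegacyCrawlInput): V2CrawlOptionsSketch {
  const options: V2CrawlOptionsSketch = {}
  if (input.limit !== undefined) options.limit = input.limit
  const depth = input.maxDiscoveryDepth ?? input.maxDepth
  if (depth !== undefined) options.maxDiscoveryDepth = depth
  return options
}
```

For example, `toV2CrawlOptions({ maxDepth: 2 })` yields `{ maxDiscoveryDepth: 2 }`, so an old flow keeps its depth setting without any edit by the user.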

Verification: static checks plus SDK introspection and mocked tests against the real v2 SDK (firecrawl-py 4.28.2 / @mendable/firecrawl-js v4); not run against the live API. Happy to address review/CI feedback.

🤖 Generated with Claude Code

rak-f and others added 4 commits June 4, 2026 13:00
Replace the hand-rolled v1 REST client in the FireCrawl document loader
with the official @mendable/firecrawl-js v2 API (Firecrawl class) and bump
the dependency from ^1.18.2 to ^4.25.2.

- Use `new Firecrawl({ apiKey, apiUrl })` and its `.scrape` / `.crawl` /
  `.search` / `.extract` methods instead of manual axios calls to /v1/*.
- Adapt to v2 response shapes: scrape/crawl return Document(s) directly
  (no { success, data } envelope); crawl returns a CrawlJob with `.data`;
  search returns results grouped by source (use `.web`).
- Preserve the node's inputs, modes, defaults, and Document/Text output
  shape. Search `country` now maps to v2's single `location` field, since
  v1's separate `lang`/`country` params were removed in v2.
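A hypothetical helper for the search-option change in the last bullet might look like this. The names `LegacySearchInput`, `V2SearchOptionsSketch`, and `toV2SearchOptions` are illustrative, and passing the v1 country value straight through as `location` (and dropping `lang`) is an assumption about how the mapping is done.

```typescript
// Hypothetical v1-era search inputs exposed by the node.
interface LegacySearchInput {
  limit?: number
  lang?: string // v1 only; no separate language parameter in v2
  country?: string // v1 only; folded into v2's `location`
}

// Hypothetical subset of v2 search options used in this sketch.
interface V2SearchOptionsSketch {
  limit?: number
  location?: string
}

function toV2SearchOptions(input: LegacySearchInput): V2SearchOptionsSketch {
  const options: V2SearchOptionsSketch = {}
  if (input.limit !== undefined) options.limit = input.limit
  // v2 takes a single location field, so `country` moves there and `lang` is dropped.
  if (input.country) options.location = input.country
  return options
}
```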

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…crawl-js)

Both names dual-publish the identical v4 SDK; `firecrawl` is the current canonical package.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…crawl-js)

Both names dual-publish the identical v4 SDK; `firecrawl` is the current canonical package.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…2 maxDiscoveryDepth

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the FireCrawl document loader to use the official firecrawl SDK instead of a custom Axios-based implementation, simplifying the codebase and updating the integration to the latest SDK version. Feedback on these changes highlights critical issues where the loader fails to check the success status of SDK operations (scrape, search, crawl, and extract), which could lead to unhandled failures or runtime type mismatches. Additionally, it is recommended to use loose equality checks (== null and != null) as a standard idiom for nullish checks in TypeScript.
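For context on the last suggestion: in TypeScript, as in JavaScript, `x == null` is true for both `null` and `undefined`, while `x === null` matches only `null`. A minimal demonstration (the helper names are illustrative):

```typescript
// Loose equality against null covers both nullish values.
function isNullish(value: unknown): value is null | undefined {
  return value == null
}

// Strict equality distinguishes null from undefined.
function isStrictlyNull(value: unknown): value is null {
  return value === null
}
```

Only `null` and `undefined` count as nullish here; falsy values such as `0`, `''`, and `false` do not, which is why `== null` is a safer guard than a bare truthiness check for optional numeric inputs.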

Six review comment threads on packages/components/nodes/documentloaders/FireCrawl/FireCrawl.ts (one marked Outdated).
… success/status/error)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rakshith48
Author

Thanks for the review! I applied one of these and respectfully declined the others — here's the reasoning:

Applied — extract success/error check. extract() returns Promise<ExtractResponse>, which does have success/status/data/error fields, so checking for failure there is a real improvement. (I used response.success === false || response.status === 'failed' rather than !response.success, because success is optional and a completed job may omit it, in which case !response.success would throw on a successful result.)
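The difference between the two checks can be shown with a minimal, self-contained sketch. `ExtractResponseLike` is a hypothetical stand-in declaring only the fields named above (the SDK's real `ExtractResponse` type may have more fields, and its `status` values may be narrower than `string`); `assertExtractSucceeded` and `naiveCheck` are illustrative names.

```typescript
// Hypothetical stand-in for the SDK's ExtractResponse: only the fields
// discussed in this thread, all treated as optional.
interface ExtractResponseLike {
  success?: boolean
  status?: string
  data?: unknown
  error?: string
}

// The adopted check: throw only on an explicit failure signal.
function assertExtractSucceeded(response: ExtractResponseLike): unknown {
  if (response.success === false || response.status === 'failed') {
    throw new Error(`FireCrawl extract failed: ${response.error ?? 'unknown error'}`)
  }
  return response.data
}

// The declined variant: treats a missing `success` field as a failure.
function naiveCheck(response: ExtractResponseLike): unknown {
  if (!response.success) {
    throw new Error(`FireCrawl extract failed: ${response.error ?? 'unknown error'}`)
  }
  return response.data
}
```

A completed job that omits `success`, such as `{ status: 'completed', data: {...} }`, passes `assertExtractSucceeded` but makes `naiveCheck` throw.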

Declined — success checks on scrape / search / crawl. These suggestions assume the v1 { success, data, error } envelope, which the v4 SDK removed. In v4 these methods return the data objects directly and throw on error:

  • scrape(): Promise<Document> — no .success/.data (the doc fields are top-level: .markdown, .metadata)
  • search(): Promise<SearchData> — results are under .web/.news/.images, no .success
  • crawl(): Promise<CrawlJob> — has .status/.data, no .success
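Because these methods return data objects directly, the loader can read their fields without unwrapping an envelope. The sketch below illustrates that with hypothetical local types (`ScrapedDocLike`, `SearchDataLike`, `CrawlJobLike`) that mirror only the fields listed above; they are not the SDK's exported types, and the `{ pageContent, metadata }` output is an assumed LangChain-style document shape.

```typescript
// Hypothetical minimal shapes mirroring the v4 return values listed above.
interface ScrapedDocLike {
  markdown?: string
  metadata?: Record<string, unknown>
}
interface SearchDataLike {
  web?: Array<{ url?: string; title?: string; description?: string; markdown?: string }>
}
interface CrawlJobLike {
  status: string
  data: ScrapedDocLike[]
}

// Assumed LangChain-style output document.
interface LoaderDoc {
  pageContent: string
  metadata: Record<string, unknown>
}

// Document fields are top-level: no { success, data } envelope to unwrap.
function toLoaderDoc(doc: ScrapedDocLike): LoaderDoc {
  return { pageContent: doc.markdown ?? '', metadata: doc.metadata ?? {} }
}

// scrape(): a single document.
function fromScrape(doc: ScrapedDocLike): LoaderDoc[] {
  return [toLoaderDoc(doc)]
}

// crawl(): pages live under job.data.
function fromCrawl(job: CrawlJobLike): LoaderDoc[] {
  return job.data.map(toLoaderDoc)
}

// search(): web results are grouped under .web; prefer scraped markdown, else the snippet.
function fromSearch(result: SearchDataLike): LoaderDoc[] {
  return (result.web ?? []).map((item) => ({
    pageContent: item.markdown ?? item.description ?? '',
    metadata: { url: item.url, title: item.title },
  }))
}
```

Errors are not modeled here: under the behavior described in this reply, a failed call rejects its promise, so the loader's caller sees a thrown error rather than a `success: false` object.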

So response.success/response.data don't exist on those types — the suggested patches would fail the TypeScript build. The current code reads the correct fields (verified via tsc against the real @mendable/firecrawl-js v4 types), and failures propagate as thrown errors.

Declined — == null style nits. Keeping the existing strict-equality checks for consistency; purely cosmetic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2 participants