Skip to content

feat: provide Request instances in skipped request callbacks#1999

Open
vdusek wants to merge 4 commits into
masterfrom
provide-request-in-skipped-callbacks
Open

feat: provide Request instances in skipped request callbacks#1999
vdusek wants to merge 4 commits into
masterfrom
provide-request-in-skipped-callbacks

Conversation

@vdusek

@vdusek vdusek commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

on_skipped_request callbacks can now receive the full Request object instead of only the URL string, so handlers can access request metadata such as user_data.

This stays backward compatible by dual-dispatching on the type annotation of the callback's first parameter: annotate it as Request to receive the object, while a str annotation (or no annotation) keeps receiving the URL string as before. Skipped URLs are normalized to Request once at the choke points (add_requests and both extract_links implementations), the duplicated request-building helper was extracted to crawlee._utils.requests.create_request_from_options, and link extraction now applies transform_request_function to robots-skipped requests for consistency with enqueued ones.

Reworked from #1927 (original author inactive).

@vdusek vdusek added t-tooling Issues with this label are in the ownership of the tooling team. adhoc Ad-hoc unplanned task added during the sprint. labels Jun 30, 2026
@vdusek vdusek self-assigned this Jun 30, 2026
@codecov

codecov Bot commented Jun 30, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 92.77108% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.17%. Comparing base (a58687e) to head (608fbb6).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
src/crawlee/crawlers/_basic/_basic_crawler.py 83.33% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1999      +/-   ##
==========================================
+ Coverage   93.09%   93.17%   +0.08%     
==========================================
  Files         167      167              
  Lines       11775    11815      +40     
==========================================
+ Hits        10962    11009      +47     
+ Misses        813      806       -7     
Flag Coverage Δ
unit 93.17% <92.77%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@vdusek vdusek requested a review from Pijukatel June 30, 2026 12:22
@vdusek vdusek marked this pull request as ready for review June 30, 2026 12:22
@vdusek vdusek requested a review from Mantisus July 1, 2026 07:22
"""


def _skipped_request_callback_expects_request(callback: Callable[..., Awaitable[None]]) -> bool:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can wait for v2 release and make a breaking change with a clear signature to avoid this kind of fragile runtime inspection.


# A string annotation may not resolve to the class (e.g. a `TYPE_CHECKING`-only import), so match by name.
if isinstance(annotation, str):
return _annotation_names_request(annotation)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, this won't match when the annotation uses an import alias under TYPE_CHECKING:

if TYPE_CHECKING:
    from crawlee import Request as CrawleeRequest

async def skipped_hook(request: CrawleeRequest, _reason: SkippedReason) -> None:
    pass

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants