feat(fetchers): enhance ArXivFetcher with PDF binary indication#89

Merged
chaliy merged 1 commit into main from fix/issue-57-arxiv-fetcher on Apr 3, 2026

chaliy commented Apr 3, 2026

What

Enhance ArXivFetcher with binary content indication for PDF URLs.

Why

Closes #57. When an agent requests an arXiv /pdf/ URL, the fetcher should state that the original content is binary (PDF) and that only metadata is returned, consistent with the core binary handling behavior.

How

  • Added an is_pdf_url() helper to distinguish /pdf/ URLs from /abs/ URLs
  • Added a binary content note to the metadata section for PDF URLs
  • Added tests for PDF URL detection and DOI/journal reference extraction
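
For readers skimming the change, the detection and note logic described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code merged in this PR: only the name is_pdf_url() comes from the description, while render_metadata(), BINARY_NOTE, and the note wording are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical wording; the note text the fetcher actually emits is not shown in this PR.
BINARY_NOTE = "Note: original content is binary (PDF); only metadata is returned."


def is_pdf_url(url: str) -> bool:
    """Return True when the URL path starts with /pdf/ (as opposed to /abs/)."""
    return urlparse(url).path.startswith("/pdf/")


def render_metadata(url: str, title: str) -> str:
    """Hypothetical helper: build the metadata section, adding the note for PDF URLs."""
    lines = [f"Title: {title}", f"Source: {url}"]
    if is_pdf_url(url):
        lines.append(BINARY_NOTE)
    return "\n".join(lines)
```

Under these assumptions, a /pdf/ URL yields the same metadata as its /abs/ counterpart plus one extra informational line, which matches the low-risk framing of the change.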

Risk

  • Low
  • Only adds an informational note to the output for PDF URLs

Checklist

  • Unit tests pass
  • Smoke tests pass
  • Specs are up to date and not in conflict

- Add is_pdf_url() to detect /pdf/ URLs
- Show binary content note for PDF URLs (metadata-only response)
- Add tests for PDF URL detection, DOI/journal extraction
- Verify ar5iv HTML link is included in output

Closes #57
chaliy merged commit 9ce3234 into main on Apr 3, 2026
11 checks passed
chaliy deleted the fix/issue-57-arxiv-fetcher branch on April 3, 2026 03:04

Development

Successfully merging this pull request may close these issues.

feat(fetchers): ArXivFetcher — paper metadata and abstract extraction
