feat(fetchers): enhance RSSFeedFetcher with content-type detection and html_to_markdown #91

Merged: chaliy merged 1 commit into main from fix/issue-59-rss-feed-fetcher on Apr 3, 2026

@chaliy commented Apr 3, 2026

What

Enhance RSSFeedFetcher with content-type-based feed detection and proper HTML-to-markdown conversion for entry descriptions.

Why

Closes #59. The issue requires detecting feeds via content-type headers (application/rss+xml, application/atom+xml, text/xml) in addition to URL patterns. HTML content in entry descriptions should also be converted with html_to_markdown rather than by simple tag stripping.
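For illustration, the header check described above could look roughly like the following. This is a minimal Python sketch and not the code from this PR: only the name is_feed_content_type and the three media types come from the description, while the parameter handling (ignoring parameters such as charset, case-insensitive matching, treating a missing header as "not a feed") is an assumption.

```python
from typing import Optional

# Media types named in the PR description.
FEED_MEDIA_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "text/xml",
}


def is_feed_content_type(header_value: Optional[str]) -> bool:
    """Return True if a Content-Type header value names a feed media type.

    Assumption: parameters such as "; charset=utf-8" are ignored and the
    comparison is case-insensitive. The merged implementation may differ.
    """
    if not header_value:
        return False
    # Keep only the media type, dropping any ";"-separated parameters.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in FEED_MEDIA_TYPES


if __name__ == "__main__":
    print(is_feed_content_type("application/rss+xml; charset=utf-8"))
    print(is_feed_content_type("text/html"))
```

Because the check is additive, a fetcher would presumably still fall back to URL patterns when the header is missing or generic.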

How

  • Added is_feed_content_type() for content-type header detection
  • Replaced strip_html with convert_entry_content(), which uses html_to_markdown for HTML and passes plain text through
  • Added tests: content-type detection, HTML/plain content handling, CDATA
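The HTML/plain-text split in convert_entry_content() might be sketched as below. This is a hedged Python illustration, not the PR's implementation: the tag-detection regex, the looks_like_html helper, and the converter-as-parameter signature are all assumptions, and toy_html_to_markdown is a hypothetical stand-in for the project's existing html_to_markdown function.

```python
import re
from typing import Callable

# Assumption: content is treated as HTML if it contains something tag-shaped.
_TAG_RE = re.compile(r"<[a-zA-Z!/][^>]*>")


def looks_like_html(text: str) -> bool:
    """Heuristic (hypothetical helper): does the text contain an HTML tag?"""
    return bool(_TAG_RE.search(text))


def convert_entry_content(text: str, html_to_markdown: Callable[[str], str]) -> str:
    """Convert HTML entry content to markdown; pass plain text through as-is."""
    if looks_like_html(text):
        return html_to_markdown(text)
    return text


def toy_html_to_markdown(html: str) -> str:
    """Stand-in converter for the demo only: bold tags become **, other tags are dropped."""
    text = re.sub(r"</?(?:b|strong)>", "**", html)
    return re.sub(r"<[^>]+>", "", text)


if __name__ == "__main__":
    print(convert_entry_content("<p>Hello <b>world</b></p>", toy_html_to_markdown))
    print(convert_entry_content("2 < 3 is plain text", toy_html_to_markdown))
```

Passing the converter in keeps the sketch independent of the real function's signature, which is not shown in this PR description.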

Risk

  • Low
  • HTML conversion uses existing html_to_markdown function
  • Content-type detection is additive

Checklist

  • Unit tests pass
  • Smoke tests pass
  • Specs are up to date and not in conflict

…d html_to_markdown

- Add is_feed_content_type() for detecting feeds by HTTP content-type header
- Use html_to_markdown for HTML content in RSS/Atom entry descriptions
- Replace strip_html with html_to_markdown for richer content conversion
- Add tests: content-type detection, HTML/plain content conversion, CDATA handling

Closes #59
@chaliy merged commit 9a26f2e into main on Apr 3, 2026
11 checks passed
@chaliy deleted the fix/issue-59-rss-feed-fetcher branch on April 3, 2026 03:07

Development

Successfully merging this pull request may close these issues.

feat(fetchers): RSSFeedFetcher — structured feed parsing
