refactor: replace pipeline with layered app runtime#1
Conversation
Document the approved ports-and-use-cases end state so the codebase can refactor toward a stable, testable architecture.
Move API and CLI onto a shared composition root so runtime, media, RSS, and SQLite concerns flow through explicit use cases and adapters instead of a monolithic pipeline.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c73cce470
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| await self.process_source.execute( | ||
| cast(str, entry.source_url), entry.title | ||
| ) | ||
| processed_count += 1 |
There was a problem hiding this comment.
Count processed items only after successful processing
This increments processed_count unconditionally whenever process=True, but ProcessSource.execute returns a Failure object on exceptions instead of raising, so failed jobs are still counted as processed. In practice, /ingest can report successful processing for entries that never produced a summary, which corrupts ingest metrics and can cause downstream automation to treat failed runs as healthy.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in ee3b02a. IngestFeed.execute() now increments processed_count only when process_source.execute(...) does not return Failure, and tests/application/test_ingest_feed.py now covers the failure path.
| source_url=entry.source_url, | ||
| published_at=None, | ||
| ) |
There was a problem hiding this comment.
Preserve feed publication timestamps when saving feed items
Feed items are always persisted with published_at=None, which drops publication timestamps during ingest. Because RSS rendering only emits <pubDate> when published_at is set, ingested videos lose their item dates in generated feeds, degrading freshness/ordering signals for RSS clients that rely on publication time.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in ee3b02a. FetchedFeedEntry now carries published_at, HttpFeedSource parses item publication dates, and SQLiteFeedVideoRepository persists them so downstream RSS publication reads retain pubDate. Added regression coverage in the feed-source and SQLite repository tests.
Keep ingest metrics and publication metadata accurate across review feedback, and make the affected regression tests platform-safe so CI reflects real behavior.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee3b02aa00
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| published_parsed = entry.get("published_parsed") | ||
| if published_parsed is None: |
There was a problem hiding this comment.
Preserve updated timestamps when published is absent
_pick_published_at only checks published/published_parsed, so feeds that expose timestamps via updated/updated_parsed (common in Atom) now lose item dates during ingest. That causes generated RSS items to omit <pubDate> for those sources, which can break chronological ordering and freshness signals in RSS readers.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 689c760. feed_source._pick_published_at(...) now falls back to updated / updated_parsed when published fields are absent, and tests/infrastructure/test_feed_source.py now covers the Atom updated path.
| if published_parsed is None: | ||
| return None | ||
|
|
||
| return datetime(*published_parsed[:6], tzinfo=timezone.utc) |
There was a problem hiding this comment.
Guard published_parsed conversion against malformed dates
The direct call to datetime(*published_parsed[:6], ...) can raise TypeError/ValueError for malformed or partial date tuples, and that exception currently bubbles out of feed ingestion. In practice, a single bad timestamp in one entry can fail the whole /ingest request instead of just dropping that entry’s publication date.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 689c760. Parsed tuple conversion is now guarded in _parse_datetime_tuple(...), so malformed published_parsed / updated_parsed values are ignored instead of failing the whole ingest. Added regression coverage in tests/infrastructure/test_feed_source.py.
Preserve item dates from updated timestamps and ignore malformed parsed tuples so feed ingestion keeps working across mixed RSS and Atom metadata.
Summary
service_pipeline.py, and update docs/package metadata to match the new structureTest Plan
pytestruff check .