
Commit 2e5103c

committed
Introduce test and jinja
1 parent 22e6cde commit 2e5103c

70 files changed

Lines changed: 4893 additions & 756 deletions


AGENTS.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
This is a mise project, and the Python virtualenv will always be active here.
2+
Always read README.md for usage instructions.
3+
Always run `mise validate` after making changes to the codebase.
4+
If you are making a large batch of changes, run validation at the end.

README.md

Lines changed: 54 additions & 24 deletions
@@ -1,21 +1,27 @@
11
# RSS Glue
22

3-
RSS Glue is a highly extensible, filesystem-based RSS/Atom feed generator and manipulator. Build digests, merge feeds, and use AI tools to make your RSS feed work for you!
3+
RSS Glue is a highly extensible, filesystem-based RSS/Atom feed generator and manipulator.
4+
5+
* Compose, merge, digest, and filter feeds.
6+
* Take action on feed content (limited; this isn't an automation platform).
7+
* Make your own feed pipelines with a little bit of Python.
48

59
<img src='./docs/images/glue.webp' width=300 style='border-radius: 10px' />
610

7-
## Inspiration
11+
RSS Glue functions as either a static generator or an on-demand feed server.
812

9-
* [Kill the newsletter](https://kill-the-newsletter.com/)
10-
* [Rsshub](https://docs.rsshub.app/)
11-
* [Zapier](https://zapier.com/)
13+
* `rss-glue watch` - **Static generator mode.** Watches your configuration and regenerates feeds on a schedule.
14+
* `rss-glue debug` - **On-demand server mode.** Serves feeds and generates them on-demand when requested.
1215

13-
## Features
16+
## Inputs and Outputs
1417

1518
External Data Sources
1619

17-
* `RssFeed` is a data source using an external RSS Feed.
20+
* `RssFeed` an external RSS Feed.
1821
* `RedditFeed` subreddit posts via the JSON API, which includes content the standard Reddit RSS feed leaves out.
22+
* `HackerNewsFeed` Hacker News stories (top, new, or best).
23+
* `InstagramFeed` public Instagram profiles using ScrapeCreator.
24+
* `FacebookFeed` public Facebook groups and pages using ScrapeCreator.
1925

2026
Meta Data Sources
2127

@@ -26,6 +32,8 @@ Meta Data Sources
2632

2733
Outputs
2834

35+
Each source feed in the configuration file generates a fixed set of output artifacts:
36+
2937
* `RssOutput` is an output RSS feed.
3038
* `HTMLOutput` is a very basic single page web feed output.
3139
* `HTMLIndexOutput` is a meta-output HTML page with a link to all its child outputs. Handy for quick reference and adding feeds to your RSS reader.
@@ -42,12 +50,22 @@ pip install rss-glue
4250
# Create your configuration file and edit it
4351
touch config.py
4452

45-
# Then generate your feed
53+
# Populate your feed
4654
rss-glue --config config.py update
47-
# Or start a long-running process
55+
56+
# Start a watcher process to regenerate files on schedule
4857
rss-glue --config config.py watch
49-
# Or start the debug web server
58+
59+
# Start a debug web server
5060
rss-glue --config config.py debug
61+
62+
# Force regeneration of all files
63+
rss-glue --config config.py generate --force
64+
65+
# Repair the timestamps on the disk database
66+
# RSS Glue relies on file modification times for caching.
67+
# Editing files outside of rss-glue can break this.
68+
rss-glue --config config.py repair
5169
```
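The `repair` command exists because caching is keyed off file modification times. A minimal sketch of that style of freshness check, using hypothetical helper names rather than the actual rss-glue internals:

```python
import os
import time

def is_fresh(path: str, max_age_seconds: float) -> bool:
    """Treat a cached artifact as fresh if its mtime is recent enough."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing file always needs regeneration.
        return False
    return (time.time() - mtime) < max_age_seconds
```

Touching or editing cached files by hand changes their mtimes, which is exactly why a `repair` step is needed.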
5270

5371
### Docker
@@ -96,7 +114,7 @@ global_config.configure(
96114
# All times and schedules are in UTC
97115
cron_weekly_on_sunday = "0 5 * * 0"
98116

99-
_outputs = [
117+
sources = [
100118
# A weekly digest of the F1 subreddit
101119
DigestFeed(
102120
RssFeed(
@@ -115,28 +133,40 @@ _outputs = [
115133
)
116134
),
117135
]
136+
```
118137

119-
# Finally, declare your artifacts
120-
artifacts = [
121-
HTMLIndexOutput(
122-
OpmlOutput(RssOutput(*_outputs)),
123-
HTMLOutput(*_outputs),
124-
)
125-
]
138+
### Building your own feeds
139+
140+
This project is intended to be quite extensible. To build a new data source or feed transformer, just fork this project and write a new feed class.
141+
142+
```python
143+
from rss_glue.feeds.base_feed import BaseFeed
144+
145+
class MyCustomFeed(BaseFeed):
146+
    def __init__(self, namespace: str, some_parameter: str):
147+
        super().__init__(namespace)
148+
        self.some_parameter = some_parameter
149+
150+
    def update(self):
151+
        for post in []:
152+
            post_id = 'id'
153+
            self.cache_set(post_id, post.to_dict())
154+
        self.set_last_run()
126155
```
127156

128157
## Design philosophy
129158

130159
RSS Glue is a simple Python tool that manages feed generation and writes its outputs exclusively to the local filesystem. The files it generates could be exposed with a web server, pushed up to an S3 bucket, or built into a Netlify deployment. You don't need to operate a server or have some kind of Docker host to run it.
131160

132-
It is not intended to scale beyond a few hundred feeds because, well, you're a human and you can't read all that anyway! It isn't intended to deploy as a multi-user app on the web. It will not get a frontend or a configuation language.
161+
It is not intended to scale beyond a few hundred feeds because, well, you're a human and you can't read all that anyway! It isn't intended to deploy as a multi-user app on the web. It will not get a SPA frontend or a configuration language.
133162

134-
**Compared with RSSHub**
163+
This project is a little like a tiny, feed-oriented version of [dagster](https://dagster.io/). Your feed definition will be a list of DAGs. Each root feed will generate a predefined set of outputs.
135164

136-
RSSHub is cool, but has several problems that RSS Glue tries to solve. The integrations I care about are either broken or don't work very well. Features like merge and digest are impossible under its stateless architecture.
165+
**Compared with RSSHub**
137166

138-
You can use RSS Glue and RSSHub together though!
167+
RSSHub does not work very well. If it's meeting your needs, you probably don't need RSS Glue.
139168

140-
**Compared with Zapier**
169+
**Event-driven vs Polling**
141170

142-
Zapier suffers from trying to be all things to all people, and the configuration hell that such an endeavor always leads to. It's kinda good at a lot of stuff.
171+
RSS Glue is a polling-based system. It wakes up on a schedule, fetches feeds, generates outputs, and goes back to sleep.
172+
This is because some feeds can be expensive to fetch and process, and it's better to do that work outside the request thread of the downstream RSS consumer. It also makes it easier to trace problems because artifacts are statically generated.
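The polling model above can be sketched in a few lines. This is a hypothetical illustration (the class name and caching details are assumptions, not rss-glue's actual API): expensive fetches happen on the poller's schedule, and downstream readers only ever see the cached result.

```python
import time

class PollingFeed:
    """Sketch: a feed that only refetches when its polling interval elapses."""

    def __init__(self, fetch, interval_seconds: float):
        self.fetch = fetch
        self.interval = interval_seconds
        self.last_run = 0.0
        self.cached = None

    def value(self):
        # Expensive work happens here, on the poller's schedule --
        # never in the request thread of the downstream RSS consumer.
        now = time.time()
        if now - self.last_run >= self.interval:
            self.cached = self.fetch()
            self.last_run = now
        return self.cached
```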

docker-compose.yml

Lines changed: 19 additions & 1 deletion
@@ -1,6 +1,19 @@
11
version: '3.8'
22
services:
33

4+
# MongoDB service for caching
5+
mongo:
6+
image: mongo:7
7+
container_name: rssglue_mongo
8+
restart: unless-stopped
9+
environment:
10+
MONGO_INITDB_ROOT_USERNAME: rssglue
11+
MONGO_INITDB_ROOT_PASSWORD: changeme
12+
volumes:
13+
- mongo_data:/data/db
14+
ports:
15+
- "27017:27017"
16+
417
# rssglue service generates the static files into the rssglue_static volume
518
rssglue:
619
image: rssglue:latest
@@ -9,10 +22,14 @@ services:
922
dockerfile: Dockerfile
1023
container_name: rssglue
1124
restart: unless-stopped
25+
environment:
26+
MONGO_CONNECTION_STRING: mongodb://rssglue:changeme@mongo:27017/
1227
volumes:
1328
- rssglue_static:/opt/rssglue/src/static
1429
- ./samples/docker-config.py:/opt/rssglue/config.py
1530
command: --config /opt/rssglue/config.py watch
31+
depends_on:
32+
- mongo
1633

1734
# Nginx service serves the static files from the rssglue_static volume
1835
nginx:
@@ -25,4 +42,5 @@ services:
2542
- "5000:80"
2643

2744
volumes:
28-
rssglue_static:
45+
rssglue_static:
46+
mongo_data:
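The compose file injects `MONGO_CONNECTION_STRING` into the rssglue container. A minimal sketch of how the app side might read it (the helper name and fallback are assumptions, not the project's actual config plumbing):

```python
import os

def mongo_connection_string() -> str:
    """Read the connection string the compose file injects, with a local fallback."""
    return os.environ.get(
        "MONGO_CONNECTION_STRING",
        "mongodb://localhost:27017/",
    )
```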

docs/CacheErrorHandling.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
# Cache Error Handling Implementation
2+
3+
## Overview
4+
5+
The cache feed now implements robust error handling for failed media downloads. When a download fails, the system marks it to prevent repeated attempts.
6+
7+
## Implementation Details
8+
9+
### Failed Download Marking
10+
11+
When a media download fails (image or video):
12+
13+
1. **Metadata File Creation**: A `.failed` file is created with the following structure:
14+
```json
15+
{
16+
"timestamp": 1234567890.0,
17+
"error": "HTTP 404: Not Found",
18+
"url_hash": "abc123def456"
19+
}
20+
```
21+
22+
No empty placeholder file is created - only the `.failed` metadata file is needed.
23+
24+
### Detection Mechanism
25+
26+
The system checks if a download previously failed by simply looking for the existence of a `.failed` file with the expected name pattern.
27+
28+
### Example
29+
30+
For a failed image download:
31+
- Metadata file: `static/images/my_feed/a1b2c3d4e5f6.jpg.failed` (contains error details)
32+
- No cache file is created
33+
34+
For a successful download:
35+
- Cache file: `static/images/my_feed/a1b2c3d4e5f6.jpg` (contains actual image data)
36+
- No `.failed` file exists
37+
38+
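The marking and detection logic above can be sketched as follows (hypothetical helper names; the actual cache code may differ):

```python
import json
import time
from pathlib import Path

def mark_failed(cache_path: Path, error: str, url_hash: str) -> None:
    """Write a .failed sidecar so the download is not retried every cycle."""
    marker = cache_path.parent / (cache_path.name + ".failed")
    marker.write_text(json.dumps({
        "timestamp": time.time(),
        "error": error,
        "url_hash": url_hash,
    }))

def should_skip(cache_path: Path) -> bool:
    """A single existence check decides whether to attempt the download."""
    return (cache_path.parent / (cache_path.name + ".failed")).exists()
```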
## Benefits
39+
40+
1. **No Repeated Failures**: Once a download fails, it won't be attempted again
41+
2. **Debugging Information**: The `.failed` file contains error details for troubleshooting
42+
3. **No External Dependencies**: Uses only standard library features
43+
4. **Cross-Platform**: Works on macOS, Linux, and Windows
44+
5. **Minimal Overhead**: Only one small JSON metadata file per failure (no empty placeholder files)
45+
6. **Efficient**: Only one file check needed (no need to check both file size and metadata)
46+
47+
## Cleanup
48+
49+
To retry failed downloads, simply delete the `.failed` file:
50+
51+
```bash
52+
# Find all failed downloads
53+
find static/images -name "*.failed"
54+
find static/videos -name "*.failed"
55+
56+
# Remove a specific failed download marker to retry
57+
rm static/images/my_feed/abc123.jpg.failed
58+
59+
# Remove all failed markers to retry everything
60+
find static -name "*.failed" -delete
61+
```

docs/FeedLocking.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
1+
# Feed Locking Feature
2+
3+
## Overview
4+
5+
The feed locking feature allows you to prevent feeds from being updated automatically. This is useful when a feed requires manual intervention or when you want to temporarily disable updates for a specific feed.
6+
7+
## How It Works
8+
9+
When a feed is locked:
10+
- A `locked: true` flag is added to the feed's metadata file
11+
- Updates are skipped during normal `update` and `watch` operations
12+
- A warning is logged when an update is attempted on a locked feed
13+
- The lock can be overridden by using `force=True` (via the `--force` flag)
14+
15+
## Usage
16+
17+
### Locking a Feed
18+
19+
```bash
20+
rss-glue --config config.py lock <namespace>
21+
```
22+
23+
Example:
24+
```bash
25+
rss-glue --config config.py lock reddit_minneapolis
26+
```
27+
28+
### Unlocking a Feed
29+
30+
```bash
31+
rss-glue --config config.py unlock <namespace>
32+
```
33+
34+
Example:
35+
```bash
36+
rss-glue --config config.py unlock reddit_minneapolis
37+
```
38+
39+
### Viewing Lock Status
40+
41+
The lock status is displayed when listing sources:
42+
43+
```bash
44+
rss-glue --config config.py sources
45+
```
46+
47+
Output will show 🔒 for locked feeds and 🔓 for unlocked feeds:
48+
```
49+
🔒 2025-11-12 10:30:00+00:00 -- Reddit Minneapolis -- RedditFeed#reddit_minneapolis
50+
🔓 2025-11-12 11:45:00+00:00 -- Hacker News -- HackerNewsFeed#hackernews_best
51+
```
52+
53+
### Force Updating a Locked Feed
54+
55+
You can still force an update on a locked feed by using the `--force` flag:
56+
57+
```bash
58+
rss-glue --config config.py update --force --feed <namespace>
59+
```
60+
61+
Example:
62+
```bash
63+
rss-glue --config config.py update --force --feed reddit_minneapolis
64+
```
65+
66+
## When to Use Feed Locking
67+
68+
1. **Manual Intervention Required**: When a feed needs debugging or manual fixes
69+
2. **Temporary Disable**: When you want to temporarily stop updates without removing the feed from your configuration
70+
3. **Rate Limiting**: When a feed source is having issues and you need to prevent repeated failed update attempts
71+
4. **Development**: When testing other feeds and you want to skip updating certain expensive feeds
72+
73+
## Implementation Details
74+
75+
The lock is implemented at the base feed level, which means:
76+
- All feed types support locking (RSS, Reddit, Instagram, Merge, Digest, etc.)
77+
- The lock check happens early in the update process
78+
- The lock state is persisted in the feed's metadata cache
79+
- Locked feeds can still be read and generate outputs normally
80+
- Only the update/refresh operation is blocked
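A base-level lock check of this kind can be sketched as below. This is an illustration of the behavior described above, not the project's actual base class; names and the metadata shape are assumptions.

```python
import logging

logger = logging.getLogger("rss_glue")

class LockableFeed:
    """Sketch of a base feed whose update can be locked out."""

    def __init__(self, namespace: str):
        self.namespace = namespace
        # In the real system this flag is persisted in the feed's metadata cache.
        self.meta = {"locked": False}

    def lock(self) -> None:
        self.meta["locked"] = True

    def unlock(self) -> None:
        self.meta["locked"] = False

    def update(self, force: bool = False) -> bool:
        # The lock check happens early, before any fetching; --force overrides it.
        if self.meta.get("locked") and not force:
            logger.warning("%s is locked, skipping update", self.namespace)
            return False
        # ... fetch and cache new posts here ...
        return True
```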

docs/Roadmap.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
## Problem: Image download failures cause infinite retries
2+
3+
Images that fail to download will be retried every generate cycle. It's not a huge problem, but it would be better to limit the number of retries per image.
4+
5+
## Problem: Image cache is partitioned by namespace, so images used in different feeds are duplicated
6+
7+
Fix the file cache so that images are only stored once, even if they are used in multiple feeds.
8+
9+
## Generate example usages
10+
11+
* Changing the rendering of a feed item
12+
* Changing the rendering of a specific Instagram Feed Item
13+
* Appending expensive metadata to a feed item (e.g., fetching article `og` metadata)
14+
* Building an apprise notification from new feed items
15+
* Building a digest of a merge of multiple feeds
