Add redis backed download count validator #1644

Open
janbro wants to merge 6 commits into eclipse:master from yeeth-security:yeeth/download_flood_control

Conversation

@janbro
Contributor

@janbro janbro commented Feb 26, 2026

Adds DownloadCountValidator that gates increaseDownloadCount in StorageUtilService.

When Redis is enabled, duplicate downloads from the same IP within 30 minutes are deduplicated and downloads from bot user-agents are ignored. Downloads are still served normally; only the counter is affected.

IPs are SHA-256 hashed before storage. Reuses the existing ip-address-function SpEL config for consistent IP resolution with rate limiting.
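The gating described above could look roughly like the following sketch. It is not the code from this PR: the class name, the in-memory map standing in for Redis, the bot keyword list, and the choice to key on hashed IP plus a hypothetical extension id are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; an in-memory map stands in for Redis.
class DownloadDedupSketch {
    private final Duration window;
    private final List<String> botKeywords; // assumed to be lowercase
    // key (hashed IP + extension id) -> time of the last counted download
    private final Map<String, Instant> lastCounted = new HashMap<>();

    DownloadDedupSketch(Duration window, List<String> botKeywords) {
        this.window = window;
        this.botKeywords = botKeywords;
    }

    // Hash the IP with SHA-256 so the raw address is never stored.
    static String hashIp(String ip) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(ip.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(bytes.length * 2);
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Returns true if the counter should be increased for this download.
    // The download itself is served either way.
    synchronized boolean shouldCount(String ip, String userAgent, String extensionId, Instant now) {
        if (userAgent != null) {
            String ua = userAgent.toLowerCase(Locale.ROOT);
            for (String keyword : botKeywords) {
                if (ua.contains(keyword)) {
                    return false; // bot user-agent: never counted
                }
            }
        }
        String key = hashIp(ip) + ":" + extensionId;
        Instant previous = lastCounted.get(key);
        if (previous != null && Duration.between(previous, now).compareTo(window) < 0) {
            return false; // duplicate within the window
        }
        lastCounted.put(key, now);
        return true;
    }
}
```

With Redis, the map lookup and update would presumably collapse into a single atomic set-if-absent with an expiry equal to the window, which also removes the need for the `synchronized` keyword.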

Before this PR is merged, the 30-minute window should ideally be reviewed and agreed upon based on existing usage data.

@netomi
Contributor

netomi commented Feb 26, 2026

This PR needs to be discussed first before integrating.

My understanding was that we would do a download count validator for analysing the download logs rather than for the metrics, which are anyway just a fraction of the actual downloads. Furthermore, the metrics themselves are not taken into account for anything and only serve an informational purpose, while the download logs directly affect the download counts.

@janbro
Contributor Author

janbro commented Mar 4, 2026

I was looking at the wrong place for the download count services used in production. I am updating this PR with changes to both the Aws and Azure download count handlers. This will need some changes to the log records to support the deduplication.

@janbro janbro force-pushed the yeeth/download_flood_control branch from ef9ee7a to e3fe741 Compare March 4, 2026 07:04
@janbro janbro force-pushed the yeeth/download_flood_control branch from e3fe741 to f9ce3a3 Compare March 4, 2026 07:40
@janbro
Contributor Author

janbro commented Mar 5, 2026

Updated the handlers to use the count validator. Had to update the log parsers to read the information relevant to preventing download inflation. Example configuration for download count validation:

osvx:
  download-count:
    validation:
      enabled: false
      dedup-window-minutes: 60
      late-arrival-hours: 2
      automated-client-keywords:
        - "crawler"

Important

Old configuration, see new configuration below

Since the count handlers are scheduled to run every hour, late-arrival-hours should be set to at least one hour: it extends the TTL of the Redis keys so that they are still alive during the next run. The keys themselves are aggregated into time buckets based on the dedup window, so they do not otherwise depend on when the count handlers run.
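To make the relationship between the dedup window, the time buckets, and the key TTL concrete, here is a minimal sketch. The key layout, the class and method names, and the assumption that the TTL is simply the dedup window plus the late-arrival allowance are illustrative guesses, not taken from the PR.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of bucket and TTL arithmetic; names and key layout are assumptions.
class DedupBucketSketch {
    // Index of the time bucket a download falls into, based on its log timestamp
    // rather than on when the count handler happens to run.
    static long bucketIndex(Instant downloadTime, int dedupWindowMinutes) {
        long epochMinutes = Math.floorDiv(downloadTime.getEpochSecond(), 60L);
        return Math.floorDiv(epochMinutes, (long) dedupWindowMinutes);
    }

    // Assumed key layout: prefix, hashed IP, bucket index.
    static String bucketKey(String hashedIp, Instant downloadTime, int dedupWindowMinutes) {
        return "download-dedup:" + hashedIp + ":" + bucketIndex(downloadTime, dedupWindowMinutes);
    }

    // Assumed TTL: the bucket length plus the late-arrival allowance, so a key
    // written in one handler run is still present in the next one.
    static Duration keyTtl(int dedupWindowMinutes, int lateArrivalHours) {
        return Duration.ofMinutes(dedupWindowMinutes).plusHours(lateArrivalHours);
    }
}
```

With dedup-window-minutes: 60 and late-arrival-hours: 2, two log records stamped 10:05 and 10:55 UTC map to the same key, and under the assumed formula that key lives for three hours.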

@janbro
Contributor Author

janbro commented Mar 5, 2026

Per feedback, I will be updating this so that the configuration references a download count per hour rather than a time window.

@janbro
Contributor Author

janbro commented Mar 5, 2026

Updated the logic to use a maximum number of downloads per hour instead of a deduplication window. This simplifies the configuration and allows higher download counts per hour.

New configuration example is:

osvx:
  download-count:
    validation:
      enabled: false
      hourly-limit-per-ip: 50
      late-arrival-hours: 2
      automated-client-keywords:
        - "crawler"
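A per-IP hourly limit such as hourly-limit-per-ip: 50 could be enforced along the lines of the sketch below. This is only an illustration: the class name, the in-memory map standing in for Redis, and the use of fixed clock-hour buckets are assumptions, and the PR may implement the limit differently.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; an in-memory map stands in for Redis counters.
class HourlyLimitSketch {
    private final int hourlyLimitPerIp;
    // key (hashed IP + hour bucket) -> number of downloads seen in that hour
    private final Map<String, Integer> counters = new HashMap<>();

    HourlyLimitSketch(int hourlyLimitPerIp) {
        this.hourlyLimitPerIp = hourlyLimitPerIp;
    }

    // Returns true while the IP is still within its hourly allowance.
    // Downloads beyond the limit are served but not counted.
    synchronized boolean shouldCount(String hashedIp, Instant downloadTime) {
        long hourBucket = Math.floorDiv(downloadTime.getEpochSecond(), 3600L);
        String key = hashedIp + ":" + hourBucket;
        int seen = counters.merge(key, 1, Integer::sum);
        return seen <= hourlyLimitPerIp;
    }
}
```

In a Redis-backed version, the `merge` call would likely become an atomic increment on the key, with an expiry that covers the hour plus the late-arrival allowance.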

2 participants