#1107 - Add okhttp IP address filter #1942
Open
rzo1 wants to merge 3 commits into
Optionally limit or block connections to IP address ranges once the host name has been resolved. Filtering URLs is not sufficient since a DNS entry may resolve to a private or loopback address, leaking information to a public index or archive, so the filtering happens at the protocol level.
Adds CIDR and IPFilterRules helpers and a network interceptor for the okhttp HttpProtocol, configured via http.filter.ipaddress.include and http.filter.ipaddress.exclude (comma-separated string or YAML list, supporting single IPs, CIDR blocks, localhost/loopback and sitelocal).
Includes unit tests and documentation updates.
Closes #1107
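To make the include/exclude idea concrete, here is a minimal sketch of how rules of this kind could be evaluated against an already resolved address. It is not the IPFilterRules class from this PR: the class name IpRuleSketch, its methods, and the precedence (exclude wins, and a non-empty include list acts as an allow-list) are assumptions chosen for illustration. CIDR blocks are left out to keep it short, and only IP literals and the keywords named in the description are handled.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

/** Hypothetical sketch; names and rule semantics are assumptions, not the PR's IPFilterRules. */
final class IpRuleSketch {
    private final List<Predicate<InetAddress>> include = new ArrayList<>();
    private final List<Predicate<InetAddress>> exclude = new ArrayList<>();

    IpRuleSketch(String includeRules, String excludeRules) {
        parse(includeRules, include);
        parse(excludeRules, exclude);
    }

    // Turns a comma-separated rule string into predicates over resolved addresses.
    private static void parse(String rules, List<Predicate<InetAddress>> target) {
        if (rules == null) {
            return;
        }
        for (String raw : rules.split(",")) {
            String rule = raw.trim().toLowerCase(Locale.ROOT);
            if (rule.isEmpty()) {
                continue;
            }
            switch (rule) {
                case "localhost":
                case "loopback":
                    target.add(InetAddress::isLoopbackAddress);
                    break;
                case "sitelocal":
                    target.add(InetAddress::isSiteLocalAddress);
                    break;
                default:
                    InetAddress single = parseLiteral(rule);
                    target.add(single::equals);
                    break;
            }
        }
    }

    // Rejects obvious host names so that a rule is not silently resolved through DNS.
    static InetAddress parseLiteral(String ip) {
        boolean looksLikeIp = ip.indexOf(':') >= 0 || ip.matches("\\d{1,3}(\\.\\d{1,3}){3}");
        if (!looksLikeIp) {
            throw new IllegalArgumentException("Not an IP literal: " + ip);
        }
        try {
            return InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Invalid IP literal: " + ip, e);
        }
    }

    // Assumed precedence: any exclude match blocks; a non-empty include list must match.
    boolean allows(InetAddress address) {
        for (Predicate<InetAddress> rule : exclude) {
            if (rule.test(address)) {
                return false;
            }
        }
        if (include.isEmpty()) {
            return true;
        }
        for (Predicate<InetAddress> rule : include) {
            if (rule.test(address)) {
                return true;
            }
        }
        return false;
    }

    boolean allows(String ip) {
        return allows(parseLiteral(ip));
    }

    public static void main(String[] args) {
        IpRuleSketch rules = new IpRuleSketch("", "loopback,sitelocal");
        System.out.println(rules.allows("127.0.0.1"));
        System.out.println(rules.allows("192.168.0.10"));
        System.out.println(rules.allows("93.184.216.34"));
    }
}
```

In the PR itself the check runs inside an okhttp network interceptor, that is, after the connection target has been resolved; the sketch only models the rule decision, not the interceptor wiring.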
jnioche
approved these changes
Jun 15, 2026
jnioche
left a comment
Contributor
2 minor comments but looks good otherwise
dpol1
reviewed
Jun 15, 2026
dpol1
left a comment
Member
good feature, one correctness bug in the CIDR matcher that needs fixing before merge, details inline.
Move CIDR and IPFilterRules from the okhttp package to org.apache.stormcrawler.protocol so they can be reused by other protocols (review feedback). Fix CIDR byte mask wrapping for /32, /128 and IPv6 prefixes >= 32 (Java shifts mod 32) and reject out-of-range masks. Add a NOTE that IP filtering is bypassed for proxied fetches.
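The shift issue mentioned in this commit message can be illustrated with a small, self-contained sketch. It is not the CIDR class from this PR: CidrMaskSketch and its methods are hypothetical names. It shows the underlying Java behaviour (the shift distance of an int is taken modulo 32, so -1 << 32 is -1 rather than 0) and one way to avoid wide shifts altogether by comparing whole bytes first and masking only the final partial byte, which works the same for IPv4 and IPv6 and rejects out-of-range prefix lengths.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical illustration of byte-wise prefix matching; not the PR's CIDR class. */
final class CidrMaskSketch {
    private final byte[] network;
    private final int prefixLength;

    CidrMaskSketch(String cidr) {
        int slash = cidr.indexOf('/');
        if (slash <= 0) {
            throw new IllegalArgumentException("Expected address/prefix: " + cidr);
        }
        network = toBytes(cidr.substring(0, slash));
        prefixLength = Integer.parseInt(cidr.substring(slash + 1));
        // 32 is the upper bound for IPv4 (4 bytes), 128 for IPv6 (16 bytes).
        if (prefixLength < 0 || prefixLength > network.length * 8) {
            throw new IllegalArgumentException("Prefix length out of range: " + cidr);
        }
    }

    // Assumes an IP literal, for which getByName performs no DNS lookup.
    private static byte[] toBytes(String ip) {
        try {
            return InetAddress.getByName(ip).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Invalid IP literal: " + ip, e);
        }
    }

    boolean contains(String ip) {
        byte[] candidate = toBytes(ip);
        if (candidate.length != network.length) {
            return false; // IPv4 range vs IPv6 address or vice versa
        }
        int fullBytes = prefixLength / 8;
        int remainingBits = prefixLength % 8;
        for (int i = 0; i < fullBytes; i++) {
            if (candidate[i] != network[i]) {
                return false;
            }
        }
        if (remainingBits == 0) {
            return true; // also covers /32 and /128 without touching a shift
        }
        // Shift distance is between 1 and 7 here, so it can never wrap.
        int mask = (0xFF << (8 - remainingBits)) & 0xFF;
        return (candidate[fullBytes] & mask) == (network[fullBytes] & mask);
    }

    public static void main(String[] args) {
        System.out.println(-1 << 32); // prints -1: the shift distance wraps to 0
        System.out.println(new CidrMaskSketch("192.168.1.7/32").contains("192.168.1.8"));
        System.out.println(new CidrMaskSketch("2001:db8::/33").contains("2001:db8:7fff::1"));
    }
}
```

Working on the byte array keeps every shift distance below 8, so the modulo-32 rule never comes into play regardless of address family.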
sebastian-nagel
approved these changes
Jun 15, 2026
Contributor
Author
Tx for the review @sebastian-nagel, @dpol1 and @jnioche - I have addressed your comments.
Thank you for contributing to Apache StormCrawler.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes
Is there an issue associated with this PR? Is it referenced in the commit message?
Does your PR title start with #XXXX, where XXXX is the issue number you are trying to resolve?
Has your PR been rebased against the latest commit within the target branch (typically main)?
Is your initial contribution a single, squashed commit?
Is the code properly formatted with mvn git-code-format:format-code -Dgcf.globPattern="**/*" -Dskip.format.code=false?
For code changes
mvn clean verify?