
#1107 - Add okhttp IP address filter #1942

Open
rzo1 wants to merge 3 commits into main from feature/1107-okhttp-ip-filter

@rzo1 rzo1 commented Jun 14, 2026

Contributor

Optionally limit or block connections to IP address ranges once the host name has been resolved. Filtering URLs is not sufficient since a DNS entry may resolve to a private or loopback address, leaking information to a public index or archive, so the filtering happens at the protocol level.
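To illustrate why the check has to look at the resolved address rather than the URL, here is a minimal sketch using only the standard `java.net.InetAddress` classification methods. The class name `ResolvedAddressCheck` and the method `isInternal` are hypothetical and are not taken from this PR; the PR's own interceptor and rule classes are not shown in this conversation.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of the PR: classifies an address that has
// already been resolved, independent of the host name in the URL.
class ResolvedAddressCheck {

    /** True if the address is wildcard, loopback, link-local or site-local. */
    static boolean isInternal(InetAddress addr) {
        return addr.isAnyLocalAddress()
                || addr.isLoopbackAddress()
                || addr.isLinkLocalAddress()
                || addr.isSiteLocalAddress();
    }

    /** Convenience overload for IP literals (parsed without a DNS lookup). */
    static boolean isInternal(String ipLiteral) {
        try {
            return isInternal(InetAddress.getByName(ipLiteral));
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        for (String ip : new String[] {"127.0.0.1", "10.1.2.3", "192.168.0.10", "8.8.8.8", "::1"}) {
            System.out.println(ip + " internal=" + isInternal(ip));
        }
    }
}
```

A public-looking host name can resolve to any of the addresses above, which is why a URL filter alone cannot catch these cases.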

Adds CIDR and IPFilterRules helpers and a network interceptor for the okhttp HttpProtocol, configured via http.filter.ipaddress.include and http.filter.ipaddress.exclude (comma-separated string or YAML list, supporting single IPs, CIDR blocks, localhost/loopback and sitelocal).
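As a rough illustration of what CIDR block matching involves, the sketch below compares address bytes against a network prefix. It is an assumption-laden stand-in, not the `CIDR` class added by this PR: the class name `CidrSketch`, its constructor and `contains` are hypothetical, and it relies on `InetAddress.getByName` to parse IP literals (which would perform a DNS lookup if given a host name instead).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of CIDR matching; the PR's actual CIDR class may differ.
class CidrSketch {
    private final byte[] network;
    private final int prefix;

    /** Accepts "a.b.c.d/n", an IPv6 equivalent, or a single IP (treated as a full-length prefix). */
    CidrSketch(String spec) {
        String[] parts = spec.split("/");
        this.network = parse(parts[0]);
        int maxBits = network.length * 8;
        this.prefix = parts.length > 1 ? Integer.parseInt(parts[1]) : maxBits;
        if (prefix < 0 || prefix > maxBits) {
            throw new IllegalArgumentException("Prefix out of range: " + spec);
        }
    }

    private static byte[] parse(String ipLiteral) {
        try {
            return InetAddress.getByName(ipLiteral).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid IP literal: " + ipLiteral, e);
        }
    }

    boolean contains(String ipLiteral) {
        byte[] addr = parse(ipLiteral);
        if (addr.length != network.length) {
            return false; // IPv4 rule never matches an IPv6 address and vice versa
        }
        int fullBytes = prefix / 8;
        int remBits = prefix % 8;
        for (int i = 0; i < fullBytes; i++) {
            if (addr[i] != network[i]) {
                return false;
            }
        }
        if (remBits == 0) {
            return true;
        }
        // Mask for the partially covered byte; the shift distance is always 1..7
        int mask = (0xFF << (8 - remBits)) & 0xFF;
        return (addr[fullBytes] & mask) == (network[fullBytes] & mask);
    }
}
```

Working byte by byte keeps every shift distance below 8, so the same code path handles IPv4 and IPv6 prefixes of any length.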

Includes unit tests and documentation updates.

Closes #1107

Thank you for contributing to Apache StormCrawler.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes

  • Is there an issue associated with this PR? Is it referenced in the commit message?

  • Does your PR title start with #XXXX where XXXX is the issue number you are trying to resolve?

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit?

  • Is the code properly formatted with mvn git-code-format:format-code -Dgcf.globPattern="**/*" -Dskip.format.code=false?

For code changes

  • Have you ensured that the full suite of tests is executed via mvn clean verify?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file?

@rzo1 rzo1 requested review from jnioche and sebastian-nagel June 14, 2026 17:34
@rzo1 rzo1 added this to the 3.6.1 milestone Jun 14, 2026
@rzo1 rzo1 requested review from dpol1, mvolikas and sigee June 15, 2026 06:50

@jnioche jnioche left a comment

Contributor

2 minor comments but looks good otherwise

Comment thread core/src/main/java/org/apache/stormcrawler/protocol/okhttp/CIDR.java Outdated
Comment thread core/src/main/java/org/apache/stormcrawler/protocol/okhttp/IPFilterRules.java Outdated

@dpol1 dpol1 left a comment

Member

Good feature. One correctness bug in the CIDR matcher needs fixing before merge; details inline.

Comment thread core/src/main/java/org/apache/stormcrawler/protocol/okhttp/CIDR.java Outdated
Comment thread core/src/main/java/org/apache/stormcrawler/protocol/CIDR.java
Comment thread core/src/test/java/org/apache/stormcrawler/protocol/CIDRTest.java
Move CIDR and IPFilterRules from the okhttp package to
org.apache.stormcrawler.protocol so they can be reused by other
protocols (review feedback).

Fix CIDR byte mask wrapping for /32, /128 and IPv6 prefixes >= 32
(Java shifts mod 32) and reject out-of-range masks. Add a NOTE that
IP filtering is bypassed for proxied fetches.
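The "Java shifts mod 32" remark refers to a language rule: for `int` operands only the low five bits of the shift distance are used, so a shift by 32 is a shift by 0. The sketch below shows the rule on a generic IPv4 mask computation; it does not reproduce the code that was fixed in this PR (which is not shown here), and `ShiftWrapDemo`, `naiveMask` and `safeMask` are hypothetical names.

```java
// Hypothetical demo of int shift-distance wrapping; not the PR's code.
class ShiftWrapDemo {

    /** Naive 32-bit network mask: the distance 32 (prefix 0) wraps to 0. */
    static int naiveMask(int prefix) {
        return -1 << (32 - prefix);
    }

    /** Guarded variant: rejects out-of-range prefixes and special-cases 0. */
    static int safeMask(int prefix) {
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("Prefix out of range: " + prefix);
        }
        return prefix == 0 ? 0 : -1 << (32 - prefix);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(1 << 32));       // shift distance wraps to 0
        System.out.println(Integer.toHexString(naiveMask(0)));  // all ones instead of zero
        System.out.println(Integer.toHexString(safeMask(0)));
        System.out.println(Integer.toHexString(safeMask(24)));
    }
}
```

Which prefix lengths go wrong depends on how the mask is derived, so boundary values such as /0, /32 and /128 are worth covering in tests along with explicit range validation.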
@rzo1 rzo1 requested a review from dpol1 June 15, 2026 17:31

@sebastian-nagel sebastian-nagel left a comment

Contributor

Thanks @rzo1 and @dpol1!

Comment thread core/src/main/java/org/apache/stormcrawler/protocol/CIDR.java
Comment thread core/src/main/java/org/apache/stormcrawler/protocol/okhttp/CIDR.java Outdated
rzo1 commented Jun 16, 2026

Contributor Author

Thanks for the review @sebastian-nagel, @dpol1 and @jnioche - I have addressed your comments.

@dpol1 dpol1 left a comment

Member

Thanks @rzo1 for the patience and the great feature, we are ready to merge 👍

Development

Successfully merging this pull request may close these issues.

Protocol-okhttp: implement IP filter

4 participants