A high-performance IP denylist lookup service that downloads IP lists from firehol and other sources, processes them into unique non-overlapping ranges while maintaining the list name each IP belongs to, uploads ranges to Redis, and serves HTTP/WebSocket endpoints to query IP addresses. The service periodically refreshes lists according to a configurable schedule.
I wanted to be able to look up an IP address to know how risky it is. Firehol curates lists of suspicious IP addresses. There are two difficulties: a) keeping track of which list an IP address belongs to, and b) making the lookup fast. A traditional database is a poor fit because the query is "find rows where $ip is between start_ip_range and end_ip_range", which cannot be optimized with a standard index. Some lookups took 7s when attempting this with MySQL. SQLite was faster, but still took hundreds of milliseconds.
Research suggested that breaking the IP ranges into unique, non-overlapping ranges and using a skip list was the way to go. That is what this project does (using Redis sorted sets, which are skip lists internally), and lookups now take at most 3ms. I believe it works correctly. See the bottom of the page for more detail on how this was accomplished.
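To make the approach concrete, here is a hypothetical sweep-line sketch (illustrative only, not the project's actual code) of the flattening step: overlapping labelled ranges become unique, non-overlapping segments, each tagged with every list it belongs to.

```javascript
// Flatten overlapping labelled ranges into non-overlapping segments.
// ranges: array of { start, end, name } with inclusive integer bounds.
function flattenRanges(ranges) {
  const events = [];
  for (const r of ranges) {
    events.push([r.start, 'open', r.name]);
    events.push([r.end + 1, 'close', r.name]); // close just past the end
  }
  events.sort((a, b) => a[0] - b[0]);

  const out = [];
  const active = new Map(); // list name -> open count
  let prev = null;
  for (const [pos, kind, name] of events) {
    // Emit the segment between the previous boundary and this one.
    if (prev !== null && pos > prev && active.size > 0) {
      out.push({ start: prev, end: pos - 1, names: [...active.keys()].sort() });
    }
    if (kind === 'open') {
      active.set(name, (active.get(name) || 0) + 1);
    } else {
      const n = active.get(name) - 1;
      if (n === 0) active.delete(name); else active.set(name, n);
    }
    prev = pos;
  }
  return out;
}
```

Each flattened segment can then be written to a Redis sorted set, for example with the segment's end as the score, so a lookup becomes a single O(log n) sorted-set query instead of a full range scan.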
- High Performance: Sub-3ms IP lookups using Redis sorted sets
- WebSocket Support: Real-time IP lookups via WebSocket protocol
- Rate Limiting: Configurable rate limiting for HTTP and WebSocket endpoints
- Structured Logging: Pino-based structured logging for better observability
- Plugin Architecture: Extensible plugin system for adding custom IP list sources
- Health Checks: Built-in health check endpoint for monitoring
- Modern Node.js: Built on Node.js 20 LTS with modern patterns
- Robust Update Process:
  - Distributed locking prevents concurrent updates
  - Atomic file operations prevent data corruption
  - CSV validation before loading
  - Update status tracking and monitoring
  - Automatic error recovery and cleanup
- Clone the repo
- `cd` into the directory
- Run `npm install`
The project includes a comprehensive test suite with unit and integration tests.
```bash
# Run all tests
npm test

# Run only unit tests
npm run test:unit

# Run only integration tests
npm run test:integration

# Run tests in watch mode
npm run test:watch

# Generate coverage report
npm run test:coverage
```

- Unit Tests: No external dependencies, run fast
- Integration Tests: Require Redis running (uses test database DB 15)
See test/README.md for detailed testing documentation.
The service uses environment variables for configuration. Create a .env file (see .env.example for reference) or set environment variables:
- `REDIS_HOST` - Redis host (default: `localhost`)
- `REDIS_PORT` - Redis port (default: `6379`)
- `REDIS_IP_FAMILY` - IP family: `4` for IPv4 or `6` for IPv6 (default: `4`)
- `REDIS_PASS` - Redis password (optional)
- `REDIS_DB` - Redis database number (default: `0`)
- `IP_REDIS_PREFIX` - Redis key prefix (default: `ip_lists:`)
- `IP_DOWNLOAD_LOCATION` - Location for the downloaded IP file (default: `./ipFile`)
- `IP_COLLECT_GARBAGE` - Enable garbage collection during load (default: `false`)
- `IP_HTTP_PORT` - HTTP server port (default: `3000`)
- `IP_PREFIX` - URL prefix for routes (default: `/`)
- `IP_CRON` - Cron schedule for refreshing lists (default: `5 2 * * *`)
- `IP_CRON_TIMEZONE` - Timezone for the cron schedule (default: `UTC`). Use IANA timezone names like `Europe/Madrid`, `America/New_York`, etc.
- `LOG_LEVEL` - Log level: `fatal`, `error`, `warn`, `info`, `debug`, `trace` (default: `info`)
- `LOG_PRETTY` - Enable pretty printing for development (default: `false`)
- `NODE_ENV` - Environment: `development`, `production`, `test` (default: `production`)
- `RATE_LIMIT_WINDOW_MS` - Rate limit window in milliseconds (default: `60000` = 1 minute)
- `RATE_LIMIT_MAX_REQUESTS` - Max requests per window per IP (default: `1000`)
- `RATE_LIMIT_WS_MAX_MESSAGES` - Max WebSocket messages per window per connection (default: `5000`)
- `WS_ENABLED` - Enable WebSocket server (default: `true`)
- `HEALTH_CHECK_ENABLED` - Enable health check endpoint (default: `true`)
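For local development, a minimal `.env` built from the defaults above might look like this (the values shown are simply the documented defaults, not recommendations):

```shell
# Redis connection
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_DB=0

# HTTP server and refresh schedule
IP_HTTP_PORT=3000
IP_CRON="5 2 * * *"
IP_CRON_TIMEZONE=UTC

# Logging
LOG_LEVEL=info
LOG_PRETTY=false
```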
- Set any necessary environment variables (see Configuration section above)
- Edit plugin configuration in `plugins.js` if needed
- Place any additional lists into the `./staging` folder
- Run:

```bash
NODE_OPTIONS=--max_old_space_size=4096 node --expose-gc launch.js
```
The launch.js script supports the following command-line flags to control which operations are performed:
- `--download` - Download IP lists from configured plugins and write to the staging folder (default: `true`)
- `--process` - Process staged files from the `./staging` folder into a single CSV file (default: `true`)
- `--load` - Load the CSV file into Redis (default: `true`)
- `--serve` - Start the HTTP/WebSocket server (default: `true`)
Operation Flow:

- Download: Runs plugins to download IP lists and write them to the `./staging` folder
- Process: Concatenates all files from `./staging` into a single CSV file
- Load: Reads the CSV file and loads IP ranges into Redis
- Serve: Starts the HTTP/WebSocket server for IP lookups
Important Notes:

- All flags default to `true`, so use `--no-<flag>` or `--<flag> false` to disable operations
- `--process false` only prevents creating/updating the CSV file; it does not prevent loading
- To skip loading, you must also pass `--load false` or `--no-load`
- If `--process false` but `--load true`, the load step will process whatever CSV file already exists
Examples:

```bash
# Run all operations (default behavior)
node launch.js

# Download and process only; skip loading and serving
node launch.js --no-load --no-serve

# Skip download and process, but still load the existing CSV and serve
node launch.js --no-download --no-process

# Only start the server (assumes data already loaded)
node launch.js --no-download --no-process --no-load
```

Note: The script will run the update process once on startup, then continue running the server (if `--serve` is enabled) and execute scheduled updates according to the configured cron schedule.
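Flag pairs like `--load` / `--no-load` can be parsed in a few lines. The helper below is a hypothetical sketch of the semantics described above; the project may well use an argument-parsing library instead:

```javascript
// Parse boolean flags supporting --flag, --no-flag, and --flag true/false.
// defaults: object mapping flag name -> default boolean.
function parseFlags(argv, defaults) {
  const flags = { ...defaults };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    let m;
    if ((m = arg.match(/^--no-(.+)$/))) {
      flags[m[1]] = false;
    } else if ((m = arg.match(/^--(.+)$/))) {
      const name = m[1];
      if (argv[i + 1] === 'true' || argv[i + 1] === 'false') {
        flags[name] = argv[++i] === 'true'; // consume the value token
      } else {
        flags[name] = true;
      }
    }
  }
  return flags;
}
```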
- Edit `docker-compose.yml` as needed
- Run `docker-compose up`
Health check endpoint. Returns service status, Redis connection status, and update process information.
Response:
```json
{
  "status": "healthy",
  "timestamp": "2024-01-01T00:00:00.000Z",
  "redis": "connected",
  "update": {
    "inProgress": false,
    "status": "completed",
    "lastUpdate": "2024-01-01 02:05:00",
    "dataSize": 1234567
  }
}
```

Status Values:

- `healthy` - Service is operating normally
- `degraded` - Service is running but the last update failed
- `unhealthy` - Redis connection failed
Update Status Values:
- `in_progress` - Update is currently running
- `completed` - Last update completed successfully
- `failed` - Last update failed
- `skipped` - Update was skipped (e.g., lock already held)
- `unknown` - No update status available
Lookup the requesting client's IP address.
Query Parameters:
- `csv` - Return CSV format (set to `1`, `true`, or `'true'`)
- `header` - Include CSV header (default: `true`; set to `0` or `false` to disable)
Response:
```json
{
  "ip": "192.168.1.1",
  "result": {
    "list": [
      {
        "name": "firehol_level1",
        "source": "firehol"
      }
    ]
  }
}
```

Lookup a specific IP address.
Path Parameters:
- `ip` - IPv4 address to look up
Query Parameters:
- `csv` - Return CSV format (set to `1`, `true`, or `'true'`)
- `header` - Include CSV header (default: `true`)
Response:
```json
{
  "list": [
    {
      "name": "firehol_level1",
      "source": "firehol"
    }
  ]
}
```

Status Codes:

- `200` - IP found
- `404` - IP not found in any list
- `422` - Invalid IPv4 address
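The `422` status implies the service validates the path parameter before the lookup. A hypothetical helper (not the service's actual code) that both validates an IPv4 string and converts it to the 32-bit integer form used for range comparisons:

```javascript
// Validate an IPv4 dotted-quad string and convert it to a 32-bit integer.
// Returns null for anything that is not a valid IPv4 address.
function ipToInt(ip) {
  const parts = ip.split('.');
  if (parts.length !== 4) return null;
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p)) return null; // digits only, 1-3 of them
    const v = Number(p);
    if (v > 255) return null;              // each octet must fit in a byte
    n = n * 256 + v;
  }
  return n;
}
```

A `null` return would map to the `422` response; a valid integer feeds the range lookup.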
Batch lookup multiple IP addresses.
Request Body:

- JSON array: `["192.168.1.1", "10.0.0.1"]` (with `Content-Type: application/json`)
- Text: Comma- or newline-separated IPs (with any other `Content-Type`, or none)

Query Parameters:

- `json` - Return JSON format (set to `1`, `true`, or `'true'`). When set, the response will be JSON regardless of the request `Content-Type`
- `header` - Include CSV header when returning CSV format (default: `true`; set to `0` or `false` to disable)
Response Format:

- JSON (default when `Content-Type: application/json` or `?json=true`):

```json
{
  "192.168.1.1": {
    "list": [...]
  },
  "10.0.0.1": {}
}
```

- CSV (default when posting plain text without `?json=true`):

```
ip,list,country
192.168.1.1,firehol_level1|spamhaus_drop,
10.0.0.1,,
```

Examples:

- POST plain text with `?json=true` → JSON response
- POST JSON array → JSON response (default)
- POST plain text without `?json=true` → CSV response (default)
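The CSV shape above (pipe-joined list names, trailing `country` column) could be produced by a helper along these lines; `toCsv` is a hypothetical name, not the service's actual function:

```javascript
// Serialize batch lookup results to the documented CSV shape.
// results: { ip: { list: [{ name, source }] } }; IPs with no match are {}.
function toCsv(results, includeHeader = true) {
  const rows = [];
  if (includeHeader) rows.push('ip,list,country');
  for (const [ip, res] of Object.entries(results)) {
    const names = (res.list || []).map(entry => entry.name).join('|');
    rows.push(`${ip},${names},`); // country column left empty in this sketch
  }
  return rows.join('\n');
}
```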
Upload a file containing IP addresses for batch lookup.
Request:

- Multipart form data with field `ipList`
- File can be `.json` (JSON array) or text (comma/newline-separated)

Response:

- Returns results in the same format as the uploaded file (JSON or CSV)
Connect to ws://localhost:3000/ws (or your configured prefix + /ws).
Client → Server:

- Lookup single IP:

```json
{
  "type": "lookup",
  "ip": "192.168.1.1",
  "requestId": "optional-request-id"
}
```

- Batch lookup:

```json
{
  "type": "batch",
  "ips": ["192.168.1.1", "10.0.0.1"],
  "requestId": "optional-request-id"
}
```

- Ping:

```json
{
  "type": "ping",
  "requestId": "optional-request-id"
}
```

Server → Client:
- Lookup result:

```json
{
  "type": "result",
  "ip": "192.168.1.1",
  "data": {
    "list": [...]
  },
  "requestId": "optional-request-id"
}
```

- Batch result:

```json
{
  "type": "batch_result",
  "results": {
    "192.168.1.1": {...},
    "10.0.0.1": {}
  },
  "requestId": "optional-request-id"
}
```

- Error:

```json
{
  "type": "error",
  "message": "Error description",
  "requestId": "optional-request-id"
}
```

- Pong:

```json
{
  "type": "pong",
  "timestamp": 1234567890,
  "requestId": "optional-request-id"
}
```

- Connected:

```json
{
  "type": "connected",
  "message": "WebSocket connection established",
  "protocols": ["lookup", "batch", "ping"]
}
```

Rate limiting:

- Default: 5000 messages per minute per connection
- Configurable via the `RATE_LIMIT_WS_MAX_MESSAGES` environment variable
- Exceeding the limit results in connection closure with code 1008
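The message frames above are plain JSON, so a client can stay very small. The helpers below are a hypothetical sketch (not shipped with the service) for building requests and dispatching responses by their `type` field:

```javascript
// Build a client->server lookup frame; requestId is optional.
function buildLookup(ip, requestId) {
  return JSON.stringify({ type: 'lookup', ip, ...(requestId && { requestId }) });
}

// Route a raw server->client frame to a handler keyed by its "type" field.
function dispatch(raw, handlers) {
  const msg = JSON.parse(raw);
  const handler = handlers[msg.type];
  if (!handler) throw new Error(`unhandled message type: ${msg.type}`);
  return handler(msg);
}
```

With the `ws` package, something like `socket.send(buildLookup('192.168.1.1'))` and `socket.on('message', data => dispatch(data, handlers))` would wire these up.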
All HTTP endpoints (except `/health`) are rate limited:

- Default: 1000 requests per minute per IP address
- Configurable via `RATE_LIMIT_MAX_REQUESTS` and `RATE_LIMIT_WINDOW_MS`
- Rate limit headers included in responses:
  - `X-RateLimit-Limit`: Maximum requests per window
  - `X-RateLimit-Remaining`: Remaining requests in the current window
  - `X-RateLimit-Reset`: Unix timestamp when the window resets
- Exceeding the limit returns `429 Too Many Requests`
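The window/limit/reset semantics above can be illustrated with an in-memory fixed-window counter. This is a sketch only; the service's actual middleware may be implemented differently:

```javascript
// Fixed-window rate limiter: one counter per key (e.g. client IP) per window.
class FixedWindowLimiter {
  constructor(windowMs, maxRequests) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
    this.windows = new Map(); // key -> { start, count }
  }

  // Record one request and report the values used for the X-RateLimit headers.
  hit(key, now = Date.now()) {
    let w = this.windows.get(key);
    if (!w || now - w.start >= this.windowMs) {
      w = { start: now, count: 0 }; // start a fresh window
      this.windows.set(key, w);
    }
    w.count += 1;
    return {
      allowed: w.count <= this.maxRequests,
      limit: this.maxRequests,                              // X-RateLimit-Limit
      remaining: Math.max(0, this.maxRequests - w.count),   // X-RateLimit-Remaining
      reset: Math.ceil((w.start + this.windowMs) / 1000),   // X-RateLimit-Reset (Unix s)
    };
  }
}
```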
Node.js can use a lot of memory while loading IPs into Redis; usage drops once loading is complete. On the reference machine:

- Loading takes 2.7GB and 32s without the `--collectGarbage` option
- Loading takes 1.1GB and 43s with `--collectGarbage`
- `NODE_OPTIONS=--max_old_space_size=4096` (or a similar env variable) is required
- The data in Redis takes about 420MB once loaded
- The node script uses about 1GB of memory at rest
- Wait for download and processing; the system is ready when you see `ready to serve!` in the logs
- Visit `http://localhost:3000/192.168.0.1` to test
- Updated lists will be pulled according to the given cron schedule
Create your own "plugins" to add more IP lists:
- See the `plugins` folder and `plugins.js` for examples
- Plugins should extend `BasePlugin` from `plugins/base.js`
- A plugin must add a file to the staging folder
- A plugin must implement a `load()` method that returns a Promise
- If the plugin has dependencies, create a `package.json` file and reference the plugin in the project's main `package.json` file
Example Plugin:
```javascript
const BasePlugin = require('../base');

class MyPlugin extends BasePlugin {
  constructor(options) {
    super({
      name: 'myplugin',
      version: '1.0.0',
      description: 'My custom IP list plugin',
      abortOnFail: false
    });
    this.outputFile = options.outputFile;
  }

  async load() {
    // Download/process IP lists
    // Write to this.outputFile
    return 'success';
  }
}

module.exports = MyPlugin;
```

The service maintains backward compatibility with existing HTTP API endpoints. Key changes:
- Environment Variables: Now uses `config.js` with Joi validation. Most environment variables remain the same, but validation is stricter.
- Logging: All `console.log` statements replaced with structured logging via Pino. Set the `LOG_LEVEL` and `LOG_PRETTY` environment variables to control logging.
- Dependencies: Updated to modern versions. Run `npm install` to update.
- WebSocket: New feature, opt-in. Set `WS_ENABLED=false` to disable.
- Rate Limiting: Now enabled by default. Configure via environment variables.
- Health Checks: New `/health` endpoint added. Can be disabled via `HEALTH_CHECK_ENABLED=false`.
- Redis Connection: Now uses connection pooling and better error handling. No changes required to the Redis data structure.
- Plugins: Enhanced plugin architecture with the `BasePlugin` class. Legacy plugins still work, but consider migrating to the new architecture.
- None! All existing HTTP endpoints work as before.
- Check your Docker VM settings on Windows or Mac; 2GB RAM won't cut it
- Use `NODE_OPTIONS=--max_old_space_size=4096` or higher
- Ensure Redis is running and accessible before starting the service
- Check Redis connection settings in environment variables
- Rate limits apply per IP address for HTTP endpoints
- WebSocket rate limits apply per connection
- Health check endpoint is excluded from rate limiting
- Concurrent Updates: The system uses distributed locking to prevent concurrent updates. If an update is already in progress, subsequent cron triggers will be skipped.
- Atomic Operations: CSV files are written to temporary files first, then atomically renamed to prevent corruption.
- Validation: CSV files are validated before loading into Redis to ensure data integrity.
- Error Recovery: On failure, the system automatically cleans up temporary files and releases locks.
- Status Tracking: Update status is tracked in Redis and exposed via the `/health` endpoint.
- Lock Timeout: Update locks expire after 1 hour (TTL) to prevent deadlocks if a process crashes.
- Stale Lock Detection: The system automatically detects and cleans up locks held by dead processes. The health check (`/health`) will show `lockStale: true` if a stale lock is detected. Stale locks are cleaned up automatically when the next update attempt runs, or manually via `POST /admin/cleanup-stale-lock`.
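The lock/TTL behaviour described above matches the common Redis `SET ... NX PX` pattern. Below is a sketch assuming an ioredis-style client; `acquireUpdateLock` and `releaseUpdateLock` are hypothetical names, not the service's actual functions:

```javascript
// Try to take the update lock: SET succeeds ('OK') only if the key is absent,
// and PX gives it a TTL so a crashed process cannot deadlock updates.
async function acquireUpdateLock(redis, key, token, ttlMs) {
  const res = await redis.set(key, token, 'PX', ttlMs, 'NX');
  return res === 'OK';
}

// Release only if we still hold the lock: compare-and-delete must be atomic,
// hence the Lua script rather than separate GET and DEL calls.
async function releaseUpdateLock(redis, key, token) {
  const script = `
    if redis.call('get', KEYS[1]) == ARGV[1] then
      return redis.call('del', KEYS[1])
    else
      return 0
    end`;
  return (await redis.eval(script, 1, key, token)) === 1;
}
```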
- Lookup Speed: < 3ms per IP lookup (typical)
- Throughput: Handles 1000+ requests/second per instance
- Memory: ~1GB at rest, ~2.7GB during loading
- Redis Storage: ~420MB for full Firehol dataset
- Check Redis connection settings
- Verify environment variables are set correctly
- Check logs for error messages
- Verify Redis is running and accessible
- Check Redis connection pool settings
- Monitor Redis performance
- Adjust `RATE_LIMIT_MAX_REQUESTS` if needed
- Consider using WebSocket for high-volume scenarios
- Health checks are not rate limited
Based on prior work: