SecretsHunter

A Django-based API for scanning GitHub repositories to detect potential secrets and sensitive information in code.

License: MIT

Description

SecretsHunter is a security scanning tool that analyzes public GitHub repositories for exposed secrets, API keys, passwords, and other sensitive information. It detects various types of secrets including AWS keys, GitHub tokens, private SSH/RSA keys, API keys (Stripe, Google, Slack), database connection strings, hardcoded passwords, JWT tokens, and OAuth tokens.
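As a rough illustration of regex-based detection with surrounding context, the sketch below scans text line by line and keeps a few neighbouring lines for each match. The pattern names, the regexes (well-known public formats for AWS access key IDs, classic GitHub tokens, and PEM private key headers), and the `find_secrets` helper are assumptions for illustration, not SecretsHunter's actual pattern set.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only (assumed); SecretsHunter's real pattern set may differ.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |OPENSSH |EC )?PRIVATE KEY-----"),
}


@dataclass
class Finding:
    pattern: str
    line_number: int  # 1-based line of the match
    match: str
    context: list[str]  # neighbouring lines kept for human review


def find_secrets(text: str, context_lines: int = 2) -> list[Finding]:
    """Return every pattern match together with a few surrounding lines."""
    lines = text.splitlines()
    findings = []
    for index, line in enumerate(lines):
        for name, pattern in PATTERNS.items():
            for m in pattern.finditer(line):
                start = max(0, index - context_lines)
                end = min(len(lines), index + context_lines + 1)
                findings.append(Finding(name, index + 1, m.group(0), lines[start:end]))
    return findings


sample = 'import os\nAWS_KEY = "AKIAABCDEFGHIJKLMNOP"\nprint("done")\n'
for f in find_secrets(sample):
    print(f.pattern, f.line_number, f.match)  # aws_access_key_id 2 AKIAABCDEFGHIJKLMNOP
```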

Getting Started

Prerequisites

Docker and Docker Compose
GitHub Personal Access Token

Quick Start

Clone the repository

```
git clone https://github.com/maaddae/fuzzy-carnival.git
cd fuzzy-carnival
```

Set up environment variables

Create a .envs/.local/.django file with your GitHub token:
```
GITHUB_TOKEN=your_github_personal_access_token_here
```

Start the application

```
docker compose -f docker-compose.local.yml up -d
```

Create a superuser

```
docker compose -f docker-compose.local.yml exec django python manage.py createsuperuser
```

Access the application
- API Documentation: http://localhost:8000/api/docs/
- Admin Panel: http://localhost:8000/admin/
- Mailpit: http://localhost:8025/

Core Functionality

API Endpoints

Scanning

POST /api/scans/ - Scan a GitHub repository for secrets
POST /api/scans/idempotent/ - Smart scanning with commit SHA deduplication
GET /api/scans/ - List all scans
GET /api/scans/{id}/ - Retrieve detailed scan results
POST /api/scans/{id}/create-issue/ - Create a GitHub issue for scan findings
PATCH /api/scans/{id}/mark_false_positive/ - Mark findings as false positives
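A minimal standard-library client sketch for the scanning endpoints. The JSON field `repository_url` is an assumption borrowed from the watchlist payload elsewhere in this README, and the helper names are hypothetical; check the real request schema at http://localhost:8000/api/docs/.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local docker compose address from Quick Start


def build_scan_request(repository_url: str, idempotent: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts a scan.

    Assumption: the endpoint accepts {"repository_url": ...}, mirroring the
    watchlist payload; confirm against /api/docs/.
    """
    path = "/api/scans/idempotent/" if idempotent else "/api/scans/"
    body = json.dumps({"repository_url": repository_url}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_scan(scan_id: int) -> dict:
    """GET /api/scans/{id}/ and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(f"{BASE_URL}/api/scans/{scan_id}/") as response:
        return json.load(response)


req = build_scan_request("https://github.com/owner/repo", idempotent=True)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Because scans run asynchronously, a client would typically send the POST, read the scan id from the response, and then poll `fetch_scan` until the status is final.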

Watchlist

POST /api/watchlist/ - Add repository to watchlist
GET /api/watchlist/ - List watched repositories
GET /api/watchlist/{id}/ - Get watchlist entry details
PATCH /api/watchlist/{id}/ - Update scan interval or active status
DELETE /api/watchlist/{id}/ - Remove repository from watchlist
POST /api/watchlist/{id}/scan_now/ - Trigger immediate scan

Features

Asynchronous scanning with Celery
Repository watchlist with periodic scanning at configurable intervals
Idempotent scans using commit SHA tracking
13+ secret detection patterns
Context preservation for findings
File filtering (binaries, dependencies, build artifacts)
GitHub API rate limit handling
Automatic GitHub issue creation for findings (configurable)
Manual issue creation via API endpoint
False positive management
Real-time scan status tracking
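The commit-SHA idempotency idea can be sketched as a lookup keyed on repository and commit: if a scan already exists for the same HEAD commit, reuse it instead of scanning again. The in-memory dict, `get_or_start_scan`, and `fake_start` are hypothetical stand-ins for the project's database models and Celery task.

```python
import itertools
from typing import Callable

# Hypothetical in-memory store: (repository_url, commit_sha) -> scan id.
# In SecretsHunter this role would be played by the database.
known_scans: dict[tuple[str, str], int] = {}


def get_or_start_scan(
    repository_url: str,
    head_sha: str,
    start_scan: Callable[[str, str], int],
) -> tuple[int, bool]:
    """Return (scan_id, created); reuse an earlier scan of the same commit."""
    key = (repository_url, head_sha)
    if key in known_scans:
        return known_scans[key], False
    scan_id = start_scan(repository_url, head_sha)
    known_scans[key] = scan_id
    return scan_id, True


_ids = itertools.count(1)


def fake_start(repository_url: str, head_sha: str) -> int:
    """Stands in for creating a scan record and enqueuing the async task."""
    return next(_ids)


print(get_or_start_scan("https://github.com/owner/repo", "abc123", fake_start))  # (1, True)
print(get_or_start_scan("https://github.com/owner/repo", "abc123", fake_start))  # (1, False)
print(get_or_start_scan("https://github.com/owner/repo", "def456", fake_start))  # (2, True)
```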

Configuration

Auto-Creating GitHub Issues

SecretsHunter can automatically create GitHub issues when secrets are found. Configure in .envs/.local/.django:

```
# Set to True to enable automatic issue creation after scan completion (off by default)
AUTO_CREATE_GITHUB_ISSUES=False

# Minimum findings required to create an issue (default: 1)
AUTO_CREATE_ISSUE_THRESHOLD=1

# Delay in seconds before creating the issue (default: 5)
AUTO_CREATE_ISSUE_DELAY=5
```
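A sketch of how these three variables might be read with the documented defaults (auto-creation off, threshold 1, delay 5 seconds) and combined into a create-or-skip decision. `load_issue_settings`, `should_create_issue`, and the accepted boolean spellings are assumptions; the project's settings code may parse them differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping

TRUE_VALUES = {"1", "true", "yes", "on"}  # assumed accepted spellings for "enabled"


@dataclass(frozen=True)
class IssueSettings:
    auto_create: bool
    threshold: int
    delay_seconds: int


def load_issue_settings(env: Mapping[str, str] = os.environ) -> IssueSettings:
    """Read the AUTO_CREATE_* variables, falling back to the documented defaults."""
    return IssueSettings(
        auto_create=env.get("AUTO_CREATE_GITHUB_ISSUES", "False").strip().lower() in TRUE_VALUES,
        threshold=int(env.get("AUTO_CREATE_ISSUE_THRESHOLD", "1")),
        delay_seconds=int(env.get("AUTO_CREATE_ISSUE_DELAY", "5")),
    )


def should_create_issue(settings: IssueSettings, finding_count: int) -> bool:
    """Create an issue only when enabled and the findings meet the threshold."""
    return settings.auto_create and finding_count >= settings.threshold


settings = load_issue_settings({"AUTO_CREATE_GITHUB_ISSUES": "True", "AUTO_CREATE_ISSUE_THRESHOLD": "3"})
print(settings)
print(should_create_issue(settings, 2), should_create_issue(settings, 3))  # False True
```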

When enabled:

Issues are created automatically after scan completion
Only created if findings meet the threshold
Respects repository permissions (issues must be enabled)
Includes detailed Markdown report of all findings
Labeled as "security" for easy filtering
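GitHub's REST endpoint `POST /repos/{owner}/{repo}/issues` accepts `title`, `body`, and `labels`. The sketch below builds such a payload with a Markdown findings table and the "security" label. The finding fields (`type`, `path`, `line`), the title wording, and `build_issue_payload` are assumptions, not the report format SecretsHunter actually produces. Leaving the secret values out of the body is a deliberate design choice so the issue does not re-expose them.

```python
import json


def build_issue_payload(repo_full_name: str, findings: list[dict]) -> dict:
    """Build a JSON body for GitHub's "create an issue" endpoint.

    Each finding is assumed to look like {"type": ..., "path": ..., "line": ...}.
    Secret values are deliberately omitted so the issue does not leak them.
    """
    rows = [f"| {f['type']} | `{f['path']}` | {f['line']} |" for f in findings]
    body = "\n".join(
        [
            f"SecretsHunter found {len(findings)} potential secret(s) in `{repo_full_name}`.",
            "",
            "| Type | File | Line |",
            "| --- | --- | --- |",
            *rows,
        ]
    )
    return {
        "title": f"Potential secrets detected ({len(findings)} finding(s))",
        "body": body,
        "labels": ["security"],
    }


payload = build_issue_payload(
    "owner/repo",
    [{"type": "aws_access_key_id", "path": "config/settings.py", "line": 12}],
)
print(json.dumps(payload, indent=2))
# Send with: POST https://api.github.com/repos/owner/repo/issues
# using the header "Authorization: Bearer <GITHUB_TOKEN>"
```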

You can also manually trigger issue creation via the API:

```
curl -X POST "http://localhost:8000/api/scans/{scan_id}/create-issue/" \
  -H "Authorization: Token your_api_token"
```

Repository Watchlist

Add repositories to a watchlist for automatic periodic scanning:

```
# Add a repository to the watchlist
curl -X POST http://localhost:8000/api/watchlist/ \
  -H "Content-Type: application/json" \
  -d '{
    "repository_url": "https://github.com/owner/repo",
    "scan_interval": 86400
  }'
```

Available scan intervals:

3600 - Every Hour
21600 - Every 6 Hours
86400 - Daily (default)
604800 - Weekly
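The next scheduled scan for a watched repository follows from its last scan time plus the chosen interval. A sketch that validates the interval against the four documented values and computes the next run; `next_scan_at` and the "never scanned means due now" rule are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The four documented intervals, in seconds.
SCAN_INTERVALS = {
    3600: "Every Hour",
    21600: "Every 6 Hours",
    86400: "Daily",
    604800: "Weekly",
}
DEFAULT_INTERVAL = 86400


def next_scan_at(last_scanned: Optional[datetime], interval: int = DEFAULT_INTERVAL) -> datetime:
    """Return when a watched repository is next due for a scan."""
    if interval not in SCAN_INTERVALS:
        raise ValueError(f"unsupported scan_interval: {interval}")
    if last_scanned is None:  # never scanned yet: due immediately (assumed rule)
        return datetime.now(timezone.utc)
    return last_scanned + timedelta(seconds=interval)


last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(next_scan_at(last, 21600))  # 2024-01-01 18:00:00+00:00
```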

Watchlist features:

Automatic periodic scanning at configured intervals
Track scan history and statistics
Enable/disable monitoring without removing entries
Manual scan trigger via API
Next scan scheduling

Trigger immediate scan:

```
curl -X POST http://localhost:8000/api/watchlist/{id}/scan_now/
```

List watched repositories:

```
curl http://localhost:8000/api/watchlist/
```

Update scan interval:

```
curl -X PATCH http://localhost:8000/api/watchlist/{id}/ \
  -H "Content-Type: application/json" \
  -d '{"scan_interval": 3600, "is_active": true}'
```

Assumptions, Limitations, and Trade-offs

Assumptions

Public Repositories Only
- The scanner only works with public GitHub repositories
- Private repository support would require OAuth flow and repository permissions
GitHub API Rate Limits
- Assumes a valid GitHub Personal Access Token is provided
- Unauthenticated requests: 60 requests/hour
- Authenticated requests: 5,000 requests/hour
- Large repositories may consume significant rate limit quota
Repository Size
- Assumes repositories are reasonably sized
- Very large repositories may time out or consume excessive resources
- Default timeout: 10 minutes per scan
Pattern Matching Accuracy
- Regex patterns are designed to minimize false positives
- Minimum length requirements (e.g., 20 chars for API keys)
- May miss some secrets with unusual formats or encodings
Network Connectivity
- Assumes reliable network connection to GitHub API
- No offline scanning capability
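GitHub reports the remaining quota in the `X-RateLimit-Remaining` response header and the reset time, as Unix epoch seconds, in `X-RateLimit-Reset`. A sketch of turning those headers into a pause before the next request; `seconds_until_reset` and the `min_remaining` safety margin are assumptions, not SecretsHunter's actual rate-limit handling.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(
    headers: Mapping[str, str],
    min_remaining: int = 10,
    now: Optional[float] = None,
) -> float:
    """Return how long to pause before the next GitHub API call (0.0 if no pause needed)."""
    # HTTP header names are case-insensitive; normalise them before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", min_remaining + 1))
    if remaining > min_remaining:
        return 0.0
    reset_at = int(lowered.get("x-ratelimit-reset", 0))
    current = time.time() if now is None else now
    return float(max(0.0, reset_at - current))


headers = {"X-RateLimit-Remaining": "3", "X-RateLimit-Reset": "1700000600"}
print(seconds_until_reset(headers, now=1700000000))  # 600.0
```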

Limitations

Secret Detection
- Pattern-based only: Uses regex patterns, not entropy analysis or ML
- No context awareness: Cannot distinguish between real secrets and test data
- Base64 encoded secrets: Not automatically detected
- Obfuscated secrets: String concatenation or encoded values may be missed
- Language-specific formats: May not catch all language-specific secret patterns
Performance
- Synchronous file fetching: Files are fetched from the GitHub API and scanned one at a time
- No caching: Each scan fetches fresh data (idempotent scans only deduplicate by commit SHA; file contents are not cached)
- Memory usage: Large files loaded entirely into memory
- Binary file detection: Basic heuristics, not foolproof
Scalability
- Single-threaded scanning: One repository scanned at a time per worker
- No distributed scanning: Cannot split large repo across multiple workers
- Database bottleneck: All findings stored in PostgreSQL (not optimized for massive scale)
GitHub Issue Creation
- Requires repository issues enabled: Cannot create issues if disabled by owner
- No update mechanism: Cannot update existing issues with new findings
- Limited retry logic: Rate-limit aware, but does not implement exponential backoff for 403 errors
- No duplicate detection: May create multiple issues for same repository
Watchlist
- No webhook support: Relies on periodic polling instead of GitHub webhooks
- Fixed intervals: Cannot trigger scans based on push events
- No priority queue: All repositories treated equally regardless of risk
Security
- API authentication: Currently allows anonymous scans (AllowAny permission)
- No rate limiting: Application-level rate limiting not implemented
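The file-filtering and binary-detection heuristics mentioned under Performance can be made concrete: skip well-known dependency and build directories by path, then treat content with a NUL byte in its first 8,000 bytes as binary (similar to the check Git uses). The directory and suffix lists and `should_scan` are illustrative assumptions. Such heuristics are not foolproof; UTF-16 text, for example, contains NUL bytes.

```python
from pathlib import PurePosixPath

# Assumed skip lists; SecretsHunter's actual filters may differ.
SKIP_DIRS = {"node_modules", "vendor", "dist", "build", ".git", "__pycache__"}
SKIP_SUFFIXES = {".png", ".jpg", ".gif", ".zip", ".pdf"}
SNIFF_BYTES = 8000


def is_probably_binary(content: bytes) -> bool:
    """Heuristic: a NUL byte near the start usually means non-text data."""
    return b"\x00" in content[:SNIFF_BYTES]


def should_scan(path: str, content: bytes) -> bool:
    """Skip dependency/build directories, known binary suffixes, and binary-looking content."""
    p = PurePosixPath(path)
    if any(part in SKIP_DIRS for part in p.parts[:-1]):
        return False
    if p.suffix.lower() in SKIP_SUFFIXES:
        return False
    return not is_probably_binary(content)


print(should_scan("src/settings.py", b"SECRET = 'x'"))          # True
print(should_scan("node_modules/pkg/index.js", b"var a = 1;"))  # False
print(should_scan("assets/logo.bin", b"\x89PNG\x00\x00"))       # False
```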

Trade-offs

Pattern Matching vs ML
- Chosen: Regex-based pattern matching
- Trade-off: Faster, deterministic, but less accurate than ML models
- Rationale: Simpler to implement, maintain, and explain; no training data required
Synchronous vs Async Scanning
- Chosen: Celery tasks for async scanning
- Trade-off: More complex infrastructure (Redis, workers) but better UX
- Rationale: Allows API to return immediately; handles long-running scans gracefully
GitHub API vs Git Clone
- Chosen: GitHub REST API
- Trade-off: Rate limited but no disk space requirements
- Rationale: Simpler, no local git operations, works in containers
PostgreSQL vs NoSQL
- Chosen: PostgreSQL for all data
- Trade-off: Strong consistency but potential performance bottleneck at scale
- Rationale: Django ORM support, ACID guarantees, relational data model fits use case
False Positive Handling
- Chosen: Manual marking by users
- Trade-off: Requires human review, but the labeled findings can inform later pattern tuning
- Rationale: No ML model to train; simple implementation
Issue Auto-creation
- Chosen: Optional auto-creation with configurable threshold
- Trade-off: May create noise but ensures visibility
- Rationale: Making it configurable gives users control; auto-creation is off by default

Future Improvements

With More Time (Priority Order)

High Priority

Enhanced Secret Detection
- Implement entropy analysis for detecting high-randomness strings
- Add machine learning model for context-aware detection
- Support for base64/hex encoded secrets
- Custom regex pattern upload by users
- Confidence scoring for findings
Performance Optimization
- Implement concurrent file fetching using asyncio or httpx
- Add Redis caching for repository metadata and file contents
- Stream large files instead of loading into memory
- Batch database inserts for findings
- Add database indexes for common queries
Webhook Support
- GitHub webhook integration for real-time scanning
- Scan on push events instead of periodic polling
- Priority queue for recently updated repositories
Authentication & Authorization
- JWT token-based API authentication
- User registration and login
- Organization/team support with role-based access
- OAuth integration with GitHub
- API rate limiting per user/organization
Improved GitHub Integration
- Support for private repositories via OAuth
- Update existing issues with new findings
- Close issues when secrets are remediated
- GitHub App instead of personal access tokens
- Repository suggestions based on user's GitHub access
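Entropy analysis usually means computing the Shannon entropy of candidate tokens, H = sum(p * log2(1/p)) over character frequencies, and flagging tokens above a threshold. A sketch; the 20-character minimum, the token character class, and the 4.0 bits-per-character threshold are assumed starting values that would need tuning against real data.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")  # candidate tokens of >= 20 chars (assumed)


def shannon_entropy(s: str) -> float:
    """Bits per character: sum of p * log2(1/p) over the character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


def high_entropy_tokens(line: str, threshold: float = 4.0) -> list[str]:
    """Return candidate tokens whose entropy meets the (assumed) threshold."""
    return [t for t in TOKEN_RE.findall(line) if shannon_entropy(t) >= threshold]


print(round(shannon_entropy("aaaaaaaa"), 2))  # 0.0
print(round(shannon_entropy("abcd"), 2))      # 2.0
print(high_entropy_tokens('key = "x9K2pQ7vL4mZ8rT1wY6bN3cH5"'))  # ['x9K2pQ7vL4mZ8rT1wY6bN3cH5']
```

Note that a string of length n can reach at most log2(n) bits per character, so with a 20-character minimum a 4.0 threshold already requires a fairly varied alphabet.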

Medium Priority

Advanced Reporting
- Trend analysis dashboard (findings over time)
- Export findings to CSV/JSON/PDF
- Integration with security tools (SIEM, Slack, PagerDuty)
- Risk scoring based on secret type and exposure time
- Compliance reports (SOC 2, ISO 27001)
False Positive Reduction
- Machine learning model trained on labeled data
- Common test patterns whitelist
- Repository-specific ignore patterns (.secretshunterignore)
- Feedback loop for improving patterns
Scalability
- Horizontal scaling of Celery workers
- Task distribution across multiple queues
- Implement job prioritization
- Add monitoring and alerting (Prometheus, Grafana)
- Database read replicas for reporting queries
Multi-Platform Support
- GitLab repository scanning
- Bitbucket support
- Local git repository scanning
- S3 bucket scanning
- Docker image scanning
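Repository-specific ignore patterns could reuse glob syntax. A sketch of a parser for the proposed `.secretshunterignore` file built on `fnmatch.fnmatchcase`; the file format (one glob per line, `#` comments) and both helper names are assumptions, since the feature does not exist yet.

```python
from fnmatch import fnmatchcase


def parse_ignore_file(text: str) -> list[str]:
    """One glob per line; blank lines and '#' comments are skipped (assumed format)."""
    patterns = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(path: str, patterns: list[str]) -> bool:
    # In fnmatch globs "*" also matches "/", so "tests/*" covers nested files too.
    return any(fnmatchcase(path, pattern) for pattern in patterns)


ignore_text = """
# fixtures contain fake keys
tests/*
*.example
"""
patterns = parse_ignore_file(ignore_text)
print(patterns)                                        # ['tests/*', '*.example']
print(is_ignored("tests/fixtures/keys.py", patterns))  # True
print(is_ignored("config/.env.example", patterns))     # True
print(is_ignored("config/settings.py", patterns))      # False
```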

Low Priority

Developer Experience
- CLI tool for local scanning
- VS Code extension
- Pre-commit hooks integration
- Real-time scanning during code review
- IDE plugins (IntelliJ, PyCharm)
Advanced Features
- Secret rotation workflow integration
- Automated remediation suggestions
- Integration with secret management tools (Vault, AWS Secrets Manager)
- Historical secret tracking across commits
- Diff-based scanning (only changed files)
Testing & Quality
- Integration tests for GitHub API mocking
- Load testing with large repositories
- Chaos engineering for failure scenarios
- Performance benchmarking suite
- A/B testing for detection algorithms
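Diff-based scanning could compare the blob SHAs of two snapshots (for example, the recursive file listings returned by GitHub's Git Trees API for the previously scanned commit and the new one) and scan only paths whose blob changed. A sketch over plain path-to-SHA dicts; fetching the trees is left out, and `changed_paths` is hypothetical.

```python
def changed_paths(old_tree: dict[str, str], new_tree: dict[str, str]) -> list[str]:
    """Paths that are new or whose blob SHA differs; deleted files need no rescan."""
    return sorted(path for path, sha in new_tree.items() if old_tree.get(path) != sha)


old = {"app.py": "a1", "settings.py": "b2", "README.md": "c3"}
new = {"app.py": "a1", "settings.py": "b9", "keys.env": "d4"}
print(changed_paths(old, new))  # ['keys.env', 'settings.py']
```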
