GitHub - copyleftdev/kindi: Cryptographic evidence hunter for embedded firmware and software supply chains. Named for Al-Kindi (801-873 CE), father of cryptanalysis. SIMD Aho-Corasick, Rayon parallel, 33-175x faster, 8 fuzz targets, zero unsafe.

kindi

Cryptographic evidence hunter for embedded firmware and software supply chains.

Performance · Quick Start · Field Test · AI Output · Fuzzing · Website

Named for Al-Kindi (801-873 CE), the Arab polymath who wrote Risalah fi Istikhraj al-Mu'amma -- the first known treatise on breaking ciphers through frequency analysis. Some 1,200 years later, Kindi does what he did: it finds cryptographic patterns hiding in text, but at roughly 200 MB/s with SIMD-accelerated Aho-Corasick automata.

cargo install kindi

Performance

Operation	Regex alternation	Kindi	Speedup
Keyword search (434 patterns)	10,814 µs	332 µs	33x
API search (4,427 patterns)	55,795 µs	318 µs	175x
Throughput	1.2 - 6.1 MB/s	199 - 208 MB/s	33 - 175x

How

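Traditional scanners build one alternation regex, (?:kw1|kw2|...|kw4427), and run re.finditer() over each line. With m patterns and n characters per line, the engine may try every alternative at every position: O(n * m) per line.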

Kindi compiles all patterns into a single Aho-Corasick automaton: one O(n + z) pass over the input (z = number of matches), a SIMD memchr inner loop, and zero allocation during search. Word boundaries are enforced after each match, line numbers come from a binary-searched LineIndex, and files are scanned in parallel via Rayon.
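To make the single-pass idea concrete, here is a textbook Aho-Corasick automaton in std-only Rust. It is a sketch, not Kindi's pattern.rs: the AhoCorasick type and find_all method are hypothetical names, transitions live in HashMaps, and there is no SIMD prefilter or word-boundary check. Searching visits each input byte once, with amortized O(1) failure-link steps per byte, plus one step per reported match.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal byte-level Aho-Corasick automaton (std-only teaching sketch).
pub struct AhoCorasick {
    goto: Vec<HashMap<u8, usize>>, // trie edges for each state; state 0 is the root
    fail: Vec<usize>,              // failure link: longest proper suffix that is also a trie prefix
    out: Vec<Vec<usize>>,          // ids of patterns that end in each state
    lens: Vec<usize>,              // byte length of each pattern
}

impl AhoCorasick {
    /// Builds the automaton. Patterns must be non-empty.
    pub fn new(patterns: &[&str]) -> Self {
        let mut goto: Vec<HashMap<u8, usize>> = vec![HashMap::new()];
        let mut out: Vec<Vec<usize>> = vec![Vec::new()];
        let mut lens = Vec::with_capacity(patterns.len());

        // 1. Insert every pattern into one shared trie.
        for (id, pat) in patterns.iter().enumerate() {
            assert!(!pat.is_empty(), "empty patterns are not supported");
            let mut state = 0;
            for &b in pat.as_bytes() {
                let existing = goto[state].get(&b).copied();
                state = match existing {
                    Some(next) => next,
                    None => {
                        goto.push(HashMap::new());
                        out.push(Vec::new());
                        let next = goto.len() - 1;
                        goto[state].insert(b, next);
                        next
                    }
                };
            }
            out[state].push(id);
            lens.push(pat.len());
        }

        // 2. Breadth-first pass computes failure links; each state also inherits
        //    the outputs of its failure state, so suffix matches get reported.
        let mut fail = vec![0; goto.len()];
        let mut queue: VecDeque<usize> = goto[0].values().copied().collect();
        while let Some(state) = queue.pop_front() {
            for (&b, &child) in &goto[state] {
                let mut f = fail[state];
                while f != 0 && !goto[f].contains_key(&b) {
                    f = fail[f];
                }
                fail[child] = goto[f].get(&b).copied().unwrap_or(0);
                let inherited = out[fail[child]].clone();
                out[child].extend(inherited);
                queue.push_back(child);
            }
        }
        AhoCorasick { goto, fail, out, lens }
    }

    /// One left-to-right pass; returns (pattern_id, start, end) byte spans.
    pub fn find_all(&self, haystack: &str) -> Vec<(usize, usize, usize)> {
        let mut hits = Vec::new();
        let mut state = 0;
        for (i, &b) in haystack.as_bytes().iter().enumerate() {
            // Fall back along failure links until a transition on `b` exists.
            while state != 0 && !self.goto[state].contains_key(&b) {
                state = self.fail[state];
            }
            state = self.goto[state].get(&b).copied().unwrap_or(0);
            for &id in &self.out[state] {
                let end = i + 1;
                hits.push((id, end - self.lens[id], end));
            }
        }
        hits
    }
}
```

For example, with the patterns he, she, his and hers, find_all("ushers") returns [(1, 1, 4), (0, 2, 4), (3, 2, 6)]: she, he and hers, all found in a single pass no matter how many patterns were compiled in.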

Quick Start

# Scan firmware source tree
kindi --keywords patterns/keyword_list.txt --api-defs patterns/api_definitions.txt ./firmware

# Binary file analysis
kindi --keywords patterns.txt --scan-binary ./build/output

# AI agent mode (3 KB instead of 100 MB)
kindi --keywords patterns.txt --toon ./target

# Quick scan + CSV export
kindi --keywords patterns.txt --quick --csv -o results/ ./src

# Filter out noise
kindi --keywords patterns.txt --ignore-evidence-types generic,Hash ./src

Field Test

Four legendary C monorepos. Identical pattern databases. Real benchmarks.

Repo	Files	Source size	Hits	Time	Throughput	Toon size
Linux kernel	93,188	1.7 GB	108,010	0.64s	146K files/s	3.5 KB
OpenSSL	6,098	144 MB	272,051	1.03s	5.9K files/s	3.4 KB
Wireshark	7,256	357 MB	48,923	1.40s	5.2K files/s	3.5 KB
FFmpeg	10,260	112 MB	4,281	0.38s	27K files/s	2.7 KB
Total	116,802	2.3 GB	433,265	3.45s		13 KB

All four toon outputs combined: 3,272 tokens. That's 0.3% of a 1M-token context window.

What It Finds

48+ cryptographic algorithms, protocols, and library calls:

Symmetric -- AES, DES, 3DES, Blowfish, Twofish, Camellia, ChaCha20, RC4, CAST5
Asymmetric -- RSA, DSA, ECC, Diffie-Hellman, ElGamal
Hash -- SHA-1, SHA-2, SHA-3, MD5, MD4, BLAKE, RIPEMD, HMAC
Protocols -- TLS, SSL, SSH, Kerberos, PKCS, PKI
Libraries -- OpenSSL, libgcrypt, Crypto++, WinCrypt
Weak crypto flagged -- DES (NIST-withdrawn), MD5 (collision attacks), SHA-1 (NIST phase-out by 2030), RC4 (biased keystream), Blowfish (64-bit block)
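The weak-crypto flags can be pictured as a lookup from evidence type to a status label. The sketch below is hypothetical: classify_weak and weak_line are illustrative names, the DES, MD5, SHA1 and RC4 labels are copied from the --toon sample output in the AI Agent Output section, and the Blowfish label is an assumption because that sample does not show one.

```rust
/// Hypothetical weak-crypto table. Labels for DES, MD5, SHA1 and RC4 match the
/// `#weak` lines in Kindi's toon sample; the Blowfish label is assumed.
pub fn classify_weak(evidence_type: &str) -> Option<&'static str> {
    match evidence_type.to_ascii_uppercase().as_str() {
        "DES" => Some("DEPRECATED:NIST-withdrawn"),
        "MD5" => Some("BROKEN:collision-attacks"),
        "SHA1" | "SHA-1" => Some("DEPRECATED:NIST-2030"),
        "RC4" => Some("BROKEN:biased-keystream"),
        "BLOWFISH" => Some("WEAK:64-bit-block"), // assumed label, not from the README
        _ => None, // AES, SHA2 and the rest are not flagged
    }
}

/// Formats one `#weak` line: type, hit count, file count (suffix `f`), label.
pub fn weak_line(evidence_type: &str, hits: u64, files: u64) -> Option<String> {
    classify_weak(evidence_type)
        .map(|label| format!("{evidence_type} {hits} {files}f {label}"))
}
```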

AI Agent Output

--toon compresses output for LLM consumption. The Linux kernel scan produces 100 MB of JSON (about 26M tokens); toon reduces it to about 3 KB (864 tokens), a 30,228:1 reduction in token count.

@kindi pkg=linux files=93188 hits=108010 matched=4909 t=0.934s
#by_type count=8
AES 10804 908f  drivers/crypto/atmel-aes.c:323
SHA2 3851 449f  include/crypto/sha2.h:203
DES 2948 263f  drivers/gpu/drm/amd/...:225
#weak count=4
DES 2948 263f DEPRECATED:NIST-withdrawn
MD5 1193 219f BROKEN:collision-attacks
SHA1 1364 330f DEPRECATED:NIST-2030
RC4 87 12f BROKEN:biased-keystream
#hot top=3
1010 drivers/crypto/inside-secure/safexcel_cipher.c AES,DES,SHA1
940  drivers/crypto/axis/artpec6_crypto.c AES,SHA2,HMAC
927  drivers/md/dm-crypt.c AES,HMAC,MD5
#methods
keyword 107562
api 448

Design principles:

I (the AI) never need all N matches. I need aggregates and drill-down capability.
Repeated JSON field names are pure token waste. Positional format eliminates them.
Evidence grouped by type, not by file. That's how questions are asked.
Weak crypto pre-classified. Don't make me consult training data for NIST status.
Context lines omitted (70% of JSON volume). Available in drill-down mode.
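The type-first, positional layout can be sketched as a small aggregation over raw hits. Assumptions: the Hit struct and render_by_type are hypothetical, the trailing location is taken to be the first hit seen for each type (the README does not say how Kindi chooses it), and the spacing mirrors the sample above.

```rust
use std::collections::{BTreeMap, HashSet};

/// One raw match as a scanner might report it (hypothetical shape).
pub struct Hit<'a> {
    pub ty: &'a str,
    pub file: &'a str,
    pub line: usize,
}

/// Renders a toon-style `#by_type` section: one positional line per evidence
/// type with total hits, distinct files (suffix `f`) and a sample location.
pub fn render_by_type(hits: &[Hit<'_>]) -> String {
    // type -> (hit count, distinct files, location of the first hit seen)
    let mut agg: BTreeMap<&str, (usize, HashSet<&str>, String)> = BTreeMap::new();
    for h in hits {
        let entry = agg
            .entry(h.ty)
            .or_insert_with(|| (0, HashSet::new(), format!("{}:{}", h.file, h.line)));
        entry.0 += 1;
        entry.1.insert(h.file);
    }
    let mut rows: Vec<_> = agg.into_iter().collect();
    // Most frequent type first; the stable sort keeps ties in alphabetical order.
    rows.sort_by(|a, b| (b.1).0.cmp(&(a.1).0));
    let mut out = format!("#by_type count={}\n", rows.len());
    for (ty, (count, files, loc)) in rows {
        // No field names: position alone says what each value means.
        out.push_str(&format!("{ty} {count} {}f  {loc}\n", files.len()));
    }
    out
}
```

A line such as `AES 3 2f  a.c:3` carries the same facts as a JSON object with four keys, without paying for the key names on every row.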

Architecture

src/
  pattern.rs   -- Aho-Corasick SIMD automaton, per-pattern \b word boundaries
  detect.rs    -- Rayon parallel scanning, encoding fallback, method tracking
  extract.rs   -- Archive extraction, SHA-256 cycle detection, path sanitization
  toon.rs      -- Token-optimized output with weak crypto classification
  language.rs  -- File classification (13 languages + binary)
  output.rs    -- Crypto-spec v3.0 JSON + CSV
  error.rs     -- Typed errors
  main.rs      -- CLI
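The two post-match steps described under How, the word-boundary check and the line-number lookup, are small enough to sketch directly. Both items below are hypothetical stand-ins rather than Kindi's pattern.rs or LineIndex; the boundary rule is an ASCII-only approximation of \b that assumes the pattern itself starts and ends with a word character, as algorithm and API names normally do.

```rust
/// ASCII word byte, like regex `\w` without Unicode classes.
fn is_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Post-match filter for a non-empty span `[start, end)`: accept it only if the
/// bytes just outside the span are not word bytes (or do not exist).
pub fn at_word_boundary(hay: &[u8], start: usize, end: usize) -> bool {
    // `.get()` keeps the edge lookups bounds-checked instead of panicking.
    let before = start.checked_sub(1).and_then(|i| hay.get(i)).copied();
    let after = hay.get(end).copied();
    !before.map_or(false, is_word) && !after.map_or(false, is_word)
}

/// Maps byte offsets to 1-based line numbers by binary search.
pub struct LineIndex {
    starts: Vec<usize>, // byte offset at which each line begins
}

impl LineIndex {
    pub fn new(text: &[u8]) -> Self {
        let mut starts = vec![0];
        // A new line begins right after every '\n'.
        starts.extend(
            text.iter()
                .enumerate()
                .filter(|&(_, &b)| b == b'\n')
                .map(|(i, _)| i + 1),
        );
        LineIndex { starts }
    }

    /// Line containing `offset`: the count of line starts at or before it.
    pub fn line_of(&self, offset: usize) -> usize {
        self.starts.partition_point(|&s| s <= offset)
    }
}
```

Building the index is one pass per file; each lookup is then O(log L) for L lines, so reporting a line number never requires rescanning the file.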

Memory Safety

Zero unsafe. Zero unwrap() in production code. Bounds-checked .get() on every hot-path index.

Protection	Mechanism
Buffer overflow	`PathBuf`, `Vec`, `.get()` bounds checking
Null deref	`Result<T, E>` at every I/O boundary
Use-after-free	RAII via `Drop`
OOM	`MAX_FILE_SIZE` (256 MB) guard before allocation
Path traversal	`sanitize_entry_path()` strips `../` and leading `/`
Recursive archive bombs	SHA-256 cycle detection in extraction graph
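Two of these guards can be sketched with the standard library alone. sanitize_entry_path below reuses the README's name, but its body is an assumption: instead of stripping `../` and `/` as text, it keeps only the normal components of the path, which also drops `.` and Windows drive prefixes. read_bounded is a hypothetical helper, and reading "256 MB" as 256 MiB is likewise an assumption.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::{Component, Path, PathBuf};

/// Assumed to mean 256 MiB; the README only says "256 MB".
pub const MAX_FILE_SIZE: u64 = 256 * 1024 * 1024;

/// Hypothetical body for `sanitize_entry_path()`: keep only normal components,
/// dropping `..`, `.`, a leading `/` and any drive prefix, so the result can be
/// joined under the extraction directory without escaping it.
pub fn sanitize_entry_path(entry: &str) -> PathBuf {
    Path::new(entry)
        .components()
        .filter_map(|c| match c {
            Component::Normal(part) => Some(part),
            _ => None,
        })
        .collect()
}

/// Checks the size before allocating; `Ok(None)` means the file was skipped.
pub fn read_bounded(path: &Path) -> io::Result<Option<Vec<u8>>> {
    let file = File::open(path)?;
    if file.metadata()?.len() > MAX_FILE_SIZE {
        return Ok(None);
    }
    let mut buf = Vec::new();
    // `take` also caps the read if the file grew after the metadata check.
    file.take(MAX_FILE_SIZE + 1).read_to_end(&mut buf)?;
    if buf.len() as u64 > MAX_FILE_SIZE {
        return Ok(None);
    }
    Ok(Some(buf))
}
```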

Fuzzing

8 libFuzzer targets with structure-aware Arbitrary inputs:

Target	Invariants
`fuzz_pattern_parse`	No panics on arbitrary INI+JSON
`fuzz_pattern_search`	`matched_text == content[begin..end]`, boundary enforcement, filtered types absent
`fuzz_extract_zip`	All files under target dir (no traversal)
`fuzz_extract_tar`	No panics on corrupt tar/gz/bz2/xz
`fuzz_encoding`	Output always valid UTF-8
`fuzz_full_pipeline`	detection_method correct, verification code valid, JSON round-trips, CSV well-formed
`fuzz_extract_path_sanitize`	No directory escapes
`fuzz_language_classify`	Source implies text, binary implies not-source

cargo install cargo-fuzz
rustup toolchain install nightly
cargo +nightly fuzz run fuzz_pattern_search
cargo +nightly fuzz run fuzz_full_pipeline -- -max_total_time=300
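The fuzz_pattern_search invariants can be written as a plain checker that a fuzz target, or an ordinary unit test, calls on every result set. This is a sketch under assumptions: the Match struct, its evidence_type field and check_invariants are hypothetical, and the boundary rule assumes patterns start and end with word characters.

```rust
/// A match as a scanner might report it (hypothetical shape).
pub struct Match {
    pub begin: usize,
    pub end: usize,
    pub matched_text: String,
    pub evidence_type: String,
}

fn is_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Returns the first violated invariant, if any, for one search result set.
pub fn check_invariants(content: &str, matches: &[Match], filtered: &[&str]) -> Result<(), String> {
    let bytes = content.as_bytes();
    for m in matches {
        // 1. matched_text == content[begin..end], and the span itself is valid.
        match content.get(m.begin..m.end) {
            Some(s) if s == m.matched_text => {}
            _ => return Err(format!("span {}..{} does not hold {:?}", m.begin, m.end, m.matched_text)),
        }
        // 2. Boundary enforcement: no word byte directly before or after the span.
        let before = m.begin.checked_sub(1).and_then(|i| bytes.get(i)).copied();
        let after = bytes.get(m.end).copied();
        if before.map_or(false, is_word) || after.map_or(false, is_word) {
            return Err(format!("span {}..{} sits inside a word", m.begin, m.end));
        }
        // 3. Filtered evidence types must never appear in the results.
        if filtered.iter().any(|t| t.eq_ignore_ascii_case(&m.evidence_type)) {
            return Err(format!("filtered type {} was reported", m.evidence_type));
        }
    }
    Ok(())
}
```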

Testing

35 unit tests. Zero compiler warnings.

cargo test    # 35 tests
cargo bench   # Criterion benchmarks

CLI Reference

kindi [OPTIONS] <PACKAGES>...

Options:
    --keywords <FILE>            Keyword pattern file [default: keyword_list.txt]
    --api-defs <FILE>            API pattern file (whole-word matching)
-i, --ignore-case                Case-insensitive
-q, --quick                      Quick scan (presence only)
    --source-only                Source files only
    --scan-binary                Include binary files
-s, --stop-after <N>             Stop after N matched files
    --ignore-evidence-types <T>  Filter types (comma-separated)
-o, --output <DIR>               Output directory [default: .]
    --csv                        CSV alongside JSON
    --toon                       AI agent token-optimized output
    --pretty                     Pretty JSON
-v, --verbose                    Verbose

Origin

"One way to solve an encrypted message, if we know its language, is to find a different plaintext of the same language long enough to fill one sheet or so, and then we count the occurrences of each letter."

-- Abu Yusuf Ya'qub ibn Ishaq al-Kindi, Risalah fi Istikhraj al-Mu'amma, Baghdad, c. 850 CE

License

Apache-2.0
