kindi
Cryptographic evidence hunter for embedded firmware and software supply chains.
Named for Al-Kindi (801-873 CE), the Arab polymath who wrote Risalah fi Istikhraj al-Mu'amma -- the first known treatise on breaking ciphers through frequency analysis. 1,200 years later, Kindi does what he did: finds cryptographic patterns hiding in text, but at 200 MB/s with SIMD-accelerated Aho-Corasick automata.
cargo install kindi

| Operation | Regex alternation | Kindi | Speedup |
|---|---|---|---|
| Keyword search (434 patterns) | 10,814 us | 332 us | 33x |
| API search (4,427 patterns) | 55,795 us | 318 us | 175x |
| Throughput | 1.2 - 6.1 MB/s | 199 - 208 MB/s | 33 - 175x |
Traditional scanners build one alternation, (?:kw1|kw2|...|kw4427), and run re.finditer() line by line. That costs O(n * m) per line, where n is the line length and m is the number of patterns.
Kindi compiles all patterns into an Aho-Corasick automaton -- single O(n) pass, SIMD memchr inner loop, zero allocation during search. Word boundaries enforced post-match. Line numbers via binary-searched LineIndex. Files scanned in parallel via Rayon.
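A minimal std-only sketch of two of those steps: the post-match word-boundary check and the binary-searched line lookup. `LineIndex` is named above, but its field and methods here are assumptions, and `has_word_boundaries` is a hypothetical helper. The boundary rule is simplified to ASCII word characters and may differ from what Kindi actually does.

```rust
/// Byte offsets at which each line starts; built once per file.
/// Hypothetical layout, not Kindi's actual struct.
pub struct LineIndex {
    line_starts: Vec<usize>,
}

impl LineIndex {
    pub fn new(content: &[u8]) -> Self {
        let mut line_starts = vec![0];
        for (i, &b) in content.iter().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        LineIndex { line_starts }
    }

    /// 1-based line number containing `offset`, found by binary search.
    pub fn line_of(&self, offset: usize) -> usize {
        // partition_point returns how many line starts are <= offset.
        self.line_starts.partition_point(|&start| start <= offset)
    }
}

fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// True when content[start..end] is not glued to a word character on either side.
pub fn has_word_boundaries(content: &[u8], start: usize, end: usize) -> bool {
    let before_ok = start == 0 || content.get(start - 1).map_or(true, |&b| !is_word_byte(b));
    let after_ok = content.get(end).map_or(true, |&b| !is_word_byte(b));
    before_ok && after_ok
}
```

With this rule, a raw automaton hit on `AES` inside `AES_encrypt` is discarded after matching, while a standalone `AES` is kept and mapped to its line without rescanning the file.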
# Scan firmware source tree
kindi --keywords patterns/keyword_list.txt --api-defs patterns/api_definitions.txt ./firmware
# Binary file analysis
kindi --keywords patterns.txt --scan-binary ./build/output
# AI agent mode (3 KB instead of 100 MB)
kindi --keywords patterns.txt --toon ./target
# Quick scan + CSV export
kindi --keywords patterns.txt --quick --csv -o results/ ./src
# Filter out noise
kindi --keywords patterns.txt --ignore-evidence-types generic,Hash ./src

Four legendary C monorepos. Identical pattern databases. Real benchmarks.
| Repo | Files | Source | Hits | Time | Throughput | Toon |
|---|---|---|---|---|---|---|
| Linux kernel | 93,188 | 1.7 GB | 108,010 | 0.64s | 146K files/s | 3.5 KB |
| OpenSSL | 6,098 | 144 MB | 272,051 | 1.03s | 5.9K files/s | 3.4 KB |
| Wireshark | 7,256 | 357 MB | 48,923 | 1.40s | 5.2K files/s | 3.5 KB |
| FFmpeg | 10,260 | 112 MB | 4,281 | 0.38s | 27K files/s | 2.7 KB |
| Total | 116,802 | 2.3 GB | 433,265 | 3.45s | | 13 KB |
All four toon outputs combined: 3,272 tokens. That's 0.3% of a 1M-token context window.
Kindi detects 48+ cryptographic algorithms, protocols, and library calls:
- Symmetric -- AES, DES, 3DES, Blowfish, Twofish, Camellia, ChaCha20, RC4, CAST5
- Asymmetric -- RSA, DSA, ECC, Diffie-Hellman, ElGamal
- Hash -- SHA-1, SHA-2, SHA-3, MD5, MD4, BLAKE, RIPEMD, HMAC
- Protocols -- TLS, SSL, SSH, Kerberos, PKCS, PKI
- Libraries -- OpenSSL, libgcrypt, Crypto++, WinCrypt
- Weak crypto flagged -- DES (NIST-withdrawn), MD5 (collision attacks), SHA-1 (NIST-2030), RC4 (biased keystream), Blowfish (64-bit block)
--toon compresses output for LLM consumption. The Linux kernel scan produces 100 MB of JSON (26M tokens). Toon gives you 3 KB (864 tokens). 30,228:1 compression.
@kindi pkg=linux files=93188 hits=108010 matched=4909 t=0.934s
#by_type count=8
AES 10804 908f drivers/crypto/atmel-aes.c:323
SHA2 3851 449f include/crypto/sha2.h:203
DES 2948 263f drivers/gpu/drm/amd/...:225
#weak count=4
DES 2948 263f DEPRECATED:NIST-withdrawn
MD5 1193 219f BROKEN:collision-attacks
SHA1 1364 330f DEPRECATED:NIST-2030
RC4 87 12f BROKEN:biased-keystream
#hot top=3
1010 drivers/crypto/inside-secure/safexcel_cipher.c AES,DES,SHA1
940 drivers/crypto/axis/artpec6_crypto.c AES,SHA2,HMAC
927 drivers/md/dm-crypt.c AES,HMAC,MD5
#methods
keyword 107562
api 448
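A consumer of this output might read the header line as follows. This is a sketch that assumes the header is a `@kindi` tag followed by whitespace-separated `key=value` tokens, which is all the sample above shows; `parse_toon_header` is a hypothetical name, not part of Kindi.

```rust
use std::collections::HashMap;

/// Parse a toon header such as `@kindi pkg=linux files=93188 ...` into key/value pairs.
/// Returns None if the line does not start with the `@kindi` tag.
pub fn parse_toon_header(line: &str) -> Option<HashMap<&str, &str>> {
    let mut tokens = line.split_whitespace();
    if tokens.next()? != "@kindi" {
        return None;
    }
    let mut fields = HashMap::new();
    for token in tokens {
        // Tokens without '=' are skipped rather than treated as errors.
        if let Some((key, value)) = token.split_once('=') {
            fields.insert(key, value);
        }
    }
    Some(fields)
}
```

Values stay as string slices so the caller decides which ones to parse as numbers, for example `files` and `hits`.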
Design principles:
- I (the AI) never need all N matches. I need aggregates and drill-down capability.
- Repeated JSON field names are pure token waste. Positional format eliminates them.
- Evidence grouped by type, not by file. That's how questions are asked.
- Weak crypto pre-classified. Don't make me consult training data for NIST status.
- Context lines omitted (70% of JSON volume). Available in drill-down mode.
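The grouping and positional-format principles can be sketched in a few lines. The `Hit` struct and `summarize_by_type` are hypothetical, and the row layout (type, hit count, distinct file count with an `f` suffix, one location) is inferred from the sample output; using the first location seen is an assumption.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One raw match; field names are illustrative, not Kindi's actual types.
pub struct Hit<'a> {
    pub kind: &'a str,
    pub file: &'a str,
    pub line: usize,
}

/// Emit one positional row per evidence type, sorted by descending hit count.
pub fn summarize_by_type(hits: &[Hit]) -> Vec<String> {
    // kind -> (hit count, distinct files, first location seen)
    let mut groups: BTreeMap<&str, (usize, BTreeSet<&str>, String)> = BTreeMap::new();
    for hit in hits {
        let entry = groups
            .entry(hit.kind)
            .or_insert_with(|| (0, BTreeSet::new(), format!("{}:{}", hit.file, hit.line)));
        entry.0 += 1;
        entry.1.insert(hit.file);
    }
    let mut rows: Vec<_> = groups.into_iter().collect();
    // Stable sort: ties keep the alphabetical order of the map.
    rows.sort_by_key(|(_, group)| std::cmp::Reverse(group.0));
    rows.into_iter()
        .map(|(kind, (count, files, first))| {
            format!("{} {} {}f {}", kind, count, files.len(), first)
        })
        .collect()
}
```

Each row carries no field names at all, so the per-hit cost of repeated JSON keys disappears and the output size depends on the number of evidence types rather than the number of matches.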
src/
pattern.rs -- Aho-Corasick SIMD automaton, per-pattern \b word boundaries
detect.rs -- Rayon parallel scanning, encoding fallback, method tracking
extract.rs -- Archive extraction, SHA-256 cycle detection, path sanitization
toon.rs -- Token-optimized output with weak crypto classification
language.rs -- File classification (13 languages + binary)
output.rs -- Crypto-spec v3.0 JSON + CSV
error.rs -- Typed errors
main.rs -- CLI
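The encoding fallback listed for detect.rs could look roughly like this. The policy shown (strict UTF-8 first, then Latin-1) is an assumption, and `decode_with_fallback` is a hypothetical name; the source only says that a fallback exists.

```rust
/// Decode file bytes to a String, which is valid UTF-8 by construction.
/// Assumed policy: strict UTF-8 first, then Latin-1, where each byte maps to one char.
pub fn decode_with_fallback(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(text) => text.to_owned(),
        // Every u8 is a valid Unicode scalar value in U+0000..=U+00FF.
        Err(_) => bytes.iter().map(|&b| b as char).collect(),
    }
}
```

A Latin-1 fallback never fails, so scanning can continue on legacy source files instead of skipping them.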
Zero unsafe. Zero unwrap() in production code. Bounds-checked .get() on every hot-path index.
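As an illustration of that style, a bounds-checked slice lookup can return a typed error instead of panicking. `ScanError` and `match_bytes` are hypothetical names for this sketch, not Kindi's actual error type or API.

```rust
#[derive(Debug, PartialEq)]
pub enum ScanError {
    /// A match span pointed outside the buffer it was reported for.
    SpanOutOfBounds { start: usize, end: usize, len: usize },
}

/// Return the bytes of a match, or a typed error instead of panicking.
pub fn match_bytes(content: &[u8], start: usize, end: usize) -> Result<&[u8], ScanError> {
    // .get() yields None for out-of-range or inverted spans; no indexing, no unwrap().
    content
        .get(start..end)
        .ok_or(ScanError::SpanOutOfBounds { start, end, len: content.len() })
}
```

Indexing with `content[start..end]` would panic on a bad span; `.get()` turns the same condition into a value the caller has to handle.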
| Protection | Mechanism |
|---|---|
| Buffer overflow | PathBuf, Vec, .get() bounds checking |
| Null deref | Result<T, E> at every I/O boundary |
| Use-after-free | RAII via Drop |
| OOM | MAX_FILE_SIZE (256 MB) guard before allocation |
| Path traversal | sanitize_entry_path() strips ../ and leading / |
| Archive bombs | SHA-256 cycle detection in extraction graph |
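The path traversal row can be sketched with the standard library alone. The function name comes from the table, but the signature and body below are guesses at one way to strip `../` and a leading `/`, not the actual implementation.

```rust
use std::path::{Component, Path, PathBuf};

/// Keep only normal path segments so the result cannot escape the extraction root.
/// Illustrative body; the real sanitize_entry_path() may differ.
pub fn sanitize_entry_path(entry: &str) -> PathBuf {
    let mut clean = PathBuf::new();
    for component in Path::new(entry).components() {
        // Drop `..`, `.`, the root `/`, and Windows prefixes; keep plain names.
        if let Component::Normal(part) = component {
            clean.push(part);
        }
    }
    clean
}
```

Note that this sketch drops `..` rather than resolving it, so `a/../b` becomes `a/b`. The result is always relative, which means joining it onto the target directory stays inside that directory.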
8 libFuzzer targets with structure-aware Arbitrary inputs:
| Target | Invariants |
|---|---|
| fuzz_pattern_parse | No panics on arbitrary INI+JSON |
| fuzz_pattern_search | matched_text == content[begin..end], boundary enforcement, filtered types absent |
| fuzz_extract_zip | All files under target dir (no traversal) |
| fuzz_extract_tar | No panics on corrupt tar/gz/bz2/xz |
| fuzz_encoding | Output always valid UTF-8 |
| fuzz_full_pipeline | detection_method correct, verification code valid, JSON round-trips, CSV well-formed |
| fuzz_extract_path_sanitize | No directory escapes |
| fuzz_language_classify | Source implies text, binary implies not-source |
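The first fuzz_pattern_search invariant can be written as a plain check that a fuzz target would call on every reported match. The `Match` struct and `check_match` are hypothetical; only the field names `matched_text`, `begin`, and `end` come from the table, and the libFuzzer harness itself is left out so the sketch stays std-only.

```rust
/// A reported match; field names follow the invariant text, the struct is hypothetical.
pub struct Match {
    pub begin: usize,
    pub end: usize,
    pub matched_text: String,
}

/// Check that a match's text equals the slice of content it claims to cover.
pub fn check_match(content: &str, m: &Match) -> Result<(), String> {
    // str::get returns None for out-of-range or non-char-boundary spans.
    let slice = content
        .get(m.begin..m.end)
        .ok_or_else(|| format!("span {}..{} is invalid", m.begin, m.end))?;
    if slice != m.matched_text {
        return Err(format!("text {:?} != content slice {:?}", m.matched_text, slice));
    }
    Ok(())
}
```

Inside a fuzz target, an `Err` from this check would be turned into a panic so libFuzzer records the input that broke the invariant.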
rustup toolchain install nightly
cargo +nightly fuzz run fuzz_pattern_search
cargo +nightly fuzz run fuzz_full_pipeline -- -max_total_time=300

35 unit tests. Zero compiler warnings.
cargo test # 35 tests
cargo bench # Criterion benchmarks

kindi [OPTIONS] <PACKAGES>...
Options:
--keywords <FILE> Keyword pattern file [default: keyword_list.txt]
--api-defs <FILE> API pattern file (whole-word matching)
-i, --ignore-case Case-insensitive
-q, --quick Quick scan (presence only)
--source-only Source files only
--scan-binary Include binary files
-s, --stop-after <N> Stop after N matched files
--ignore-evidence-types <T> Filter types (comma-separated)
-o, --output <DIR> Output directory [default: .]
--csv CSV alongside JSON
--toon AI agent token-optimized output
--pretty Pretty JSON
-v, --verbose Verbose
"One way to solve an encrypted message, if we know its language, is to find a different plaintext of the same language long enough to fill one sheet or so, and then we count the occurrences of each letter."
-- Abu Yusuf Ya'qub ibn Ishaq al-Kindi, Risalah fi Istikhraj al-Mu'amma, Baghdad, c. 850 CE
Apache-2.0