This document describes all public API entry points with signatures, use cases, side effects, and minimal call examples.
- Fail-closed: error paths return only safe outputs (`Unknown`, `False`, empty list).
- Side effects: filesystem writes or global option changes.
- Flow ID: reference to architecture flows in `docs/120_ARCH_CORE.MD`.
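The fail-closed convention can be illustrated with a small wrapper. This is a minimal sketch, not the library's implementation: `FailClosed` and `Run` are hypothetical names introduced here only to show the idea that any error path collapses to a caller-supplied safe value.

```csharp
using System;

// Hypothetical helper for illustration; not part of the documented API.
public static class FailClosed
{
    // Runs the operation and returns safeDefault if it throws,
    // so callers never observe a partial or unsafe result.
    public static T Run<T>(Func<T> operation, T safeDefault)
    {
        try
        {
            return operation();
        }
        catch (Exception)
        {
            // Fail-closed: swallow the error and hand back the safe output.
            return safeDefault;
        }
    }
}
```

For example, `FailClosed.Run(() => int.Parse("not a number") > 0, false)` yields `false` because the parse error is mapped to the safe default instead of propagating.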
- `TryValidateArchive(...)` is the canonical validation API.
- The method validates all internally supported archive formats fail-closed (e.g., ZIP/TAR/GZIP/7z/RAR).
- Technically, type resolution happens via `ArchiveTypeResolver` + `ArchiveSafetyGate`; OOXML refinement stays container-specific for ZIP-based OOXML files.
- Terminology: `ContainerType` denotes the physical archive format (e.g., ZIP/TAR/GZIP/7z/RAR). Generic archives are returned compatibly as `FileKind.Zip`; ZIP-based OOXML containers can be refined to `Docx`/`Xlsx`/`Pptx`.
| API family | Detail source | Purpose |
|---|---|---|
| `FileTypeDetector` / `ArchiveProcessing` | Detection module | SSOT detection, header magic, alias logic |
| `FileTypeDetector` / `ArchiveProcessing` / `FileMaterializer` | Infrastructure module | archive gate, guards, extraction engine |
| `FileTypeOptions` / `FileTypeProjectBaseline` | Configuration module | global options and baseline |
| Return models (`FileType`, `DetectionDetail`, `ZipExtractedEntry`, `Hash*`) | Abstractions module | public API model contracts |
| Module navigation | FileClassifier module | overview and reader-role entry points |
Return models are split into subfolders purely for organization; API semantics stay unchanged:
- Detection models: Abstractions Detection module
- Archive models: Abstractions Archive module
- Hashing models: Abstractions Hashing module
| Term | Meaning |
|---|---|
| Archive format | Physical container such as ZIP, TAR, TAR.GZ, 7z, or RAR. |
| ContainerType | Internal technical type for the physical archive format. |
| FileKind.Zip | Public, compatible logical return kind for generic archives (e.g., TAR/TAR.GZ/7z/RAR). |
| PhysicalHash | SHA-256 over the unmodified raw bytes of a file or byte array. |
| LogicalHash | SHA-256 over canonicalized content state (entry path + content), independent of container details. |
| FastHash | Optional, non-cryptographic comparison digest (XxHash3) for performance shortcuts. |
| SecureHash | Optional keyed digest (HMAC-SHA256) for authenticity/tamper-evidence (via FILECLASSIFIER_HMAC_KEY_B64). |
| Fail-closed | Unsafe or invalid inputs return only safe outputs (Unknown, False, empty list). |
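The difference between `PhysicalHash` and `LogicalHash` can be sketched as follows. This is an illustration under stated assumptions: `HashSketch` is a hypothetical class, and the canonical form used here (ordinal path order, length-prefixed path and content) is an assumption, not the library's documented canonicalization.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch; the library's real canonical form may differ.
public static class HashSketch
{
    // Physical hash: SHA-256 over the raw bytes exactly as given.
    public static string PhysicalSha256(byte[] rawBytes) =>
        Convert.ToHexString(SHA256.HashData(rawBytes));

    // Logical hash: SHA-256 over (path, content) pairs in ordinal path order,
    // so entry order inside the container does not influence the digest.
    public static string LogicalSha256(IEnumerable<KeyValuePair<string, byte[]>> entries)
    {
        using var buffer = new MemoryStream();
        using (var writer = new BinaryWriter(buffer, Encoding.UTF8, leaveOpen: true))
        {
            foreach (var entry in entries.OrderBy(e => e.Key, StringComparer.Ordinal))
            {
                byte[] path = Encoding.UTF8.GetBytes(entry.Key.Replace('\\', '/'));
                writer.Write(path.Length);            // length prefix avoids ambiguous concatenation
                writer.Write(path);
                writer.Write(entry.Value.LongLength);
                writer.Write(entry.Value);
            }
        }
        return Convert.ToHexString(SHA256.HashData(buffer.ToArray()));
    }
}
```

In this sketch, two containers holding the same entries in a different order produce the same logical digest, while their physical digests would normally differ.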
| Family | Method | Input | Output | Side effects | Primary flow |
|---|---|---|---|---|---|
| `FileTypeDetector` | `ReadFileSafe(path)` | file path | `Byte()` | none (read-only) | F0 |
| `FileTypeDetector` | `Detect(path)` | file path | `FileType` | none | F1 |
| `FileTypeDetector` | `Detect(path, verifyExtension)` | file path + bool | `FileType` | none | F1 |
| `FileTypeDetector` | `DetectDetailed(path)` | file path | `DetectionDetail` | none | F1 |
| `FileTypeDetector` | `DetectDetailed(path, verifyExtension)` | file path + bool | `DetectionDetail` | none | F1 |
| `FileTypeDetector` | `DetectAndVerifyExtension(path)` | file path | `Boolean` | none | F8 |
| `FileTypeDetector` | `TryValidateArchive(path)` | file path | `Boolean` | none | F3 |
| `FileTypeDetector` | `Detect(data)` | `Byte()` | `FileType` | none | F2 |
| `FileTypeDetector` | `IsOfType(data, kind)` | `Byte()` + `FileKind` | `Boolean` | none | F2 |
| `FileTypeDetector` | `ExtractArchiveSafe(path, destination, verifyBeforeExtract)` | path + destination + bool | `Boolean` | writes to disk | F5 |
| `FileTypeDetector` | `ExtractArchiveSafeToMemory(path, verifyBeforeExtract)` | path + bool | `IReadOnlyList(Of ZipExtractedEntry)` | none | F4 |
| `ArchiveProcessing` | `TryValidate(path)` | file path | `Boolean` | none | F3 |
| `ArchiveProcessing` | `TryValidate(data)` | `Byte()` | `Boolean` | none | F3 |
| `ArchiveProcessing` | `ExtractToMemory(path, verifyBeforeExtract)` | path + bool | `IReadOnlyList(Of ZipExtractedEntry)` | none | F4 |
| `ArchiveProcessing` | `TryExtractToMemory(data)` | `Byte()` | `IReadOnlyList(Of ZipExtractedEntry)` | none | F4 |
| `FileMaterializer` | `Persist(data, destinationPath)` | `Byte()` + destination path | `Boolean` | writes to disk | F6 |
| `FileMaterializer` | `Persist(data, destinationPath, overwrite)` | `Byte()` + destination path + bool | `Boolean` | writes to disk | F6 |
| `FileMaterializer` | `Persist(data, destinationPath, overwrite, secureExtract)` | `Byte()` + destination path + 2 bool | `Boolean` | writes to disk | F5/F6 |
| `EvidenceHashing` | `HashFile(path)` | file path | `HashEvidence` | none | F9 |
| `EvidenceHashing` | `HashFile(path, options)` | file path + options | `HashEvidence` | none | F9 |
| `EvidenceHashing` | `HashBytes(data)` | `Byte()` | `HashEvidence` | none | F9 |
| `EvidenceHashing` | `HashBytes(data, label)` | `Byte()` + label | `HashEvidence` | none | F9 |
| `EvidenceHashing` | `HashBytes(data, label, options)` | `Byte()` + label + options | `HashEvidence` | none | F9 |
| `EvidenceHashing` | `HashEntries(entries)` | `IReadOnlyList(Of ZipExtractedEntry)` | `HashEvidence` | none | F9 |
| `EvidenceHashing` | `HashEntries(entries, label)` | entries + label | `HashEvidence` | none | F9 |
| `EvidenceHashing` | `HashEntries(entries, label, options)` | entries + label + options | `HashEvidence` | none | F9 |
| `EvidenceHashing` | `VerifyRoundTrip(path)` | file path | `HashRoundTripReport` | writes temp file internally and cleans up | F9 |
| `EvidenceHashing` | `VerifyRoundTrip(path, options)` | file path + options | `HashRoundTripReport` | writes temp file internally and cleans up | F9 |
| `FileTypeOptions` | `LoadOptions(json)` | JSON | `Boolean` | changes global options | F7 |
| `FileTypeOptions` | `GetOptions()` | - | `String` (JSON) | none | F7 |
| `FileTypeProjectBaseline` | `ApplyDeterministicDefaults()` | - | `Void` | changes global options | F7 |
Details: FileClassifier module, Detection module, Infrastructure module.
Important semantic note: TryValidateArchive(path) validates generic archive containers via the same fail-closed pipeline as extraction and materialization.
flowchart TD
I0["Input: Path or Bytes"]
D1["Detect / DetectDetailed"]
D2["DetectAndVerifyExtension"]
V1["TryValidateArchive"]
X1["ExtractArchiveSafe / ExtractArchiveSafeToMemory"]
O0["Output: Type / Detail / Bool / Entries"]
I0 --> D1 --> O0
I0 --> D2 --> O0
I0 --> V1 --> O0
I0 --> X1 --> O0
using Tomtastisch.FileClassifier;
var detector = new FileTypeDetector();
var t = detector.Detect("/data/invoice.pdf", verifyExtension: true);
var d = detector.DetectDetailed("/data/archive.docx", verifyExtension: true);
bool archiveOk = FileTypeDetector.TryValidateArchive("/data/archive.zip");
var entries = detector.ExtractArchiveSafeToMemory("/data/archive.zip", verifyBeforeExtract: true);
Console.WriteLine($"{t.Kind} / {d.ReasonCode} / {archiveOk} / {entries.Count}");

The API name is kept for compatibility reasons; internally, archive containers are handled uniformly. Details: FileClassifier module, Infrastructure module.
flowchart LR
ZI["Input: Path or Bytes"]
ZV["TryValidate(path|data)"]
ZE["ExtractToMemory / TryExtractToMemory"]
ZO["Output: Bool or Entries"]
ZI --> ZV --> ZO
ZI --> ZE --> ZO
using Tomtastisch.FileClassifier;
bool okPath = ArchiveProcessing.TryValidate("/data/archive.zip");
bool okBytes = ArchiveProcessing.TryValidate(File.ReadAllBytes("/data/archive.zip"));
var entriesPath = ArchiveProcessing.ExtractToMemory("/data/archive.zip", verifyBeforeExtract: true);
var entriesBytes = ArchiveProcessing.TryExtractToMemory(File.ReadAllBytes("/data/archive.zip"));

Details: FileClassifier module, Infrastructure module.
flowchart TD
MI["Input: Byte Payload + destinationPath"]
MP["Persist(...)"]
BR["secureExtract and archive?"]
RW["Raw Byte Write"]
ZE["Validated Archive Extract to Disk"]
MO["Output: Bool"]
MI --> MP --> BR
BR -->|"No"| RW --> MO
BR -->|"Yes"| ZE --> MO
using Tomtastisch.FileClassifier;
byte[] payload = File.ReadAllBytes("/data/input.bin");
byte[] archivePayload = File.ReadAllBytes("/data/archive.zip");
bool rawOk = FileMaterializer.Persist(payload, "/data/out/input.bin", overwrite: false, secureExtract: false);
bool archiveExtractOk = FileMaterializer.Persist(archivePayload, "/data/out/unpacked", overwrite: false, secureExtract: true);

Details: FileClassifier module, Configuration module.
flowchart LR
C0["Config Input: JSON or Baseline Call"]
C1["LoadOptions(json)"]
C2["ApplyDeterministicDefaults()"]
C3["Global Snapshot"]
C4["GetOptions()"]
C0 --> C1 --> C3
C0 --> C2 --> C3
C3 --> C4
using Tomtastisch.FileClassifier;
FileTypeProjectBaseline.ApplyDeterministicDefaults();
bool loaded = FileTypeOptions.LoadOptions("{\"maxBytes\":134217728}");
string snapshot = FileTypeOptions.GetOptions();
Console.WriteLine($"Loaded={loaded}; Snapshot={snapshot}");

Details: FileClassifier module, Abstractions module. Formal API contract: 04 - EvidenceHashing API Contract.
flowchart LR
HI["Input: File/Bytes/Entries"]
H1["Physical SHA-256 (raw bytes)"]
H2["Logical SHA-256 (canonical content)"]
H3["optional Fast XxHash3"]
HR["RoundTrip h1-h4 report"]
HI --> H1
HI --> H2
HI --> H3
HI --> HR
using Tomtastisch.FileClassifier;
var evidence = EvidenceHashing.HashFile("/data/archive.zip");
var report = EvidenceHashing.VerifyRoundTrip("/data/archive.zip");
Console.WriteLine($"{evidence.Digests.PhysicalSha256} / {evidence.Digests.LogicalSha256}");
Console.WriteLine($"LogicalConsistent={report.LogicalConsistent}");

Note:
- `PhysicalSha256` and `LogicalSha256` are Security SSOT.
- `Fast*XxHash3` is an optional performance digest and not a cryptographic integrity proof.
- Overloads without `options` use the global policy `FileTypeOptions.GetSnapshot().DeterministicHash`.
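As a contrast to the non-cryptographic fast digest, the keyed `SecureHash` idea (HMAC-SHA256 with a Base64 key from `FILECLASSIFIER_HMAC_KEY_B64`) might look roughly like the sketch below. Only the environment variable name comes from this document; `SecureHashSketch` and its methods are hypothetical, and the fail-closed handling of a missing or malformed key is an assumption about reasonable behavior, not a statement about the library.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical sketch of a keyed digest; not the library's implementation.
public static class SecureHashSketch
{
    // Returns the HMAC-SHA256 hex digest, or null when no usable key is supplied.
    public static string? TryComputeHmacSha256(byte[] data, string? base64Key)
    {
        if (string.IsNullOrWhiteSpace(base64Key)) return null;   // no key configured

        byte[] key;
        try
        {
            key = Convert.FromBase64String(base64Key);
        }
        catch (FormatException)
        {
            return null;                                           // malformed Base64 key
        }
        if (key.Length == 0) return null;

        using var hmac = new HMACSHA256(key);
        return Convert.ToHexString(hmac.ComputeHash(data));
    }

    // Reads the key from the environment variable named in the terminology table.
    public static string? TryComputeFromEnvironment(byte[] data) =>
        TryComputeHmacSha256(data, Environment.GetEnvironmentVariable("FILECLASSIFIER_HMAC_KEY_B64"));
}
```

Unlike a plain SHA-256 or XxHash3 value, this digest can only be reproduced by a party that holds the key, which is what makes it usable as tamper evidence.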
- No detailed discussion of internal low-level implementation (see `docs/references/101_REFERENCES_CORE.MD`).
- No norm/compliance derivation (see `docs/specs/101_SPEC_DIN.MD`).
The following rules apply to ArchiveSafetyGate + ArchiveExtractor:
| Rule | Default | Contract |
|---|---|---|
| Link entries (symlink/hardlink) | `RejectArchiveLinks = true` | Link targets are rejected fail-closed. Override only by explicitly setting `false`, as the consumer's own risk decision. |
| Unknown entry size | `AllowUnknownArchiveEntrySize = false` | "Unknown" means `Size` is missing or negative. In that case the entry is either blocked fail-closed or measured via bounded streaming; if limits are violated, the result is `False`. |
| Path safety | enabled | Entry name is normalized (\\ -> /), root/traversal/empty are rejected, destination path must pass a prefix check. |
| Limits | enabled | Entry count, per-entry bytes, total bytes, recursion depth, and archive-specific ratio/nested rules remain fail-closed. |
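The path-safety and unknown-size rules in the table can be sketched as two small checks. This is a minimal illustration, assuming nothing about the real `ArchiveSafetyGate`: `ArchiveGuardSketch`, `TryResolveEntryPath`, and `TryMeasureBounded` are hypothetical names, and the exact rejection rules of the library may be stricter.

```csharp
using System;
using System.IO;

// Hypothetical sketch of the path and size rules; not the library's gate.
public static class ArchiveGuardSketch
{
    // Returns the full destination path for an entry, or null when the name is unsafe.
    public static string? TryResolveEntryPath(string destinationRoot, string entryName)
    {
        if (string.IsNullOrWhiteSpace(entryName)) return null;                  // empty name

        string normalized = entryName.Replace('\\', '/');                       // normalize separators
        if (normalized.StartsWith("/", StringComparison.Ordinal)) return null;  // rooted path
        if (normalized.Length >= 2 && char.IsLetter(normalized[0]) && normalized[1] == ':')
            return null;                                                        // drive-rooted path

        foreach (string segment in normalized.Split('/'))
        {
            if (segment == "..") return null;                                   // traversal
        }

        string root = Path.GetFullPath(destinationRoot);
        if (!root.EndsWith(Path.DirectorySeparatorChar)) root += Path.DirectorySeparatorChar;
        string candidate = Path.GetFullPath(Path.Combine(root, normalized));

        // Prefix check: the resolved path must stay inside the destination root.
        return candidate.StartsWith(root, StringComparison.Ordinal) ? candidate : null;
    }

    // Measures a stream of unknown size by bounded streaming.
    // Returns false as soon as more than maxBytes have been read.
    public static bool TryMeasureBounded(Stream source, long maxBytes, out long measured)
    {
        measured = 0;
        var buffer = new byte[8192];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            measured += read;
            if (measured > maxBytes) return false;                              // limit violated
        }
        return true;
    }
}
```

In this sketch an entry named `../evil.txt` or `/etc/passwd` resolves to `null`, and a stream larger than the configured limit stops being read once the limit is crossed.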
"Implemented" means: the archive format is opened, the gate is applied, and extraction is executed through the same pipeline.
| Format | Detection (`Detect`) | Validate (`TryValidateArchive` / `ArchiveProcessing.TryValidate`) | Extract-to-memory (`ArchiveProcessing.TryExtractToMemory`) | Extract-to-disk (`FileMaterializer.Persist(..., secureExtract:=True)`) |
|---|---|---|---|---|
| ZIP | Yes (magic + gate + optional OOXML refinement) | Yes | Yes | Yes |
| TAR | Yes (container detection + gate, logical `FileKind.Zip`) | Yes | Yes | Yes |
| TAR.GZ | Yes (GZIP container, nested archive path) | Yes | Yes | Yes |
| 7z | Yes (container detection + gate, logical `FileKind.Zip`) | Yes | Yes | Yes |
| RAR | Yes (container detection + gate, logical `FileKind.Zip`) | Yes | Yes | Yes |
Note on type output: for generic archives the public return kind stays compatible (FileKind.Zip); ZIP-based OOXML containers can be refined to Docx/Xlsx/Pptx.
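One plausible way to refine a ZIP container to an OOXML kind is to look for well-known package parts. The sketch below is an assumption-laden illustration: `SketchKind` and `OoxmlRefinementSketch` are hypothetical, and the marker paths are the conventional OOXML part names rather than anything this document confirms about the library's detection logic.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Illustrative enum only; it does not mirror the library's FileKind.
public enum SketchKind { Unknown, Zip, Docx, Xlsx, Pptx }

// Hypothetical sketch of OOXML refinement on top of a generic ZIP result.
public static class OoxmlRefinementSketch
{
    public static SketchKind Refine(byte[] zipBytes)
    {
        try
        {
            using var archive = new ZipArchive(new MemoryStream(zipBytes), ZipArchiveMode.Read);
            var names = archive.Entries
                .Select(e => e.FullName)
                .ToHashSet(StringComparer.OrdinalIgnoreCase);

            // Without the content-types part, treat the container as a plain ZIP.
            if (!names.Contains("[Content_Types].xml")) return SketchKind.Zip;

            if (names.Contains("word/document.xml")) return SketchKind.Docx;
            if (names.Contains("xl/workbook.xml")) return SketchKind.Xlsx;
            if (names.Contains("ppt/presentation.xml")) return SketchKind.Pptx;
            return SketchKind.Zip;
        }
        catch (InvalidDataException)
        {
            // Fail-closed: bytes that cannot be opened as ZIP yield Unknown.
            return SketchKind.Unknown;
        }
    }
}
```

A generic archive therefore stays at the compatible ZIP kind, and only containers carrying the expected parts are promoted to a document kind.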
The byte paths (Detect/Validate/Extract/Persist) are wired to the same gate pipeline.
| Format | Detection | Validate | Extract memory | Extract disk | Test evidence |
|---|---|---|---|---|---|
| ZIP | covered | covered | covered | covered | tests/FileTypeDetectionLib.Tests/Unit/ArchiveProcessingFacadeUnitTests.cs, tests/FileTypeDetectionLib.Tests/Unit/ArchiveExtractionUnitTests.cs, tests/FileTypeDetectionLib.Tests/Unit/FileMaterializerUnitTests.cs |
| TAR | covered | indirectly via unified validate | implicitly via backend path | implicitly via backend path | tests/FileTypeDetectionLib.Tests/Unit/UnifiedArchiveBackendUnitTests.cs |
| TAR.GZ | covered | covered | covered | covered | tests/FileTypeDetectionLib.Tests/Unit/UnifiedArchiveBackendUnitTests.cs |
| 7z | covered | covered | covered | covered | tests/FileTypeDetectionLib.Tests/Features/FTD_BDD_040_ARCHIVE_TYPEN_BYTEARRAY_UND_MATERIALISIERUNG.feature |
| RAR | covered | covered | covered | covered | tests/FileTypeDetectionLib.Tests/Features/FTD_BDD_040_ARCHIVE_TYPEN_BYTEARRAY_UND_MATERIALISIERUNG.feature |
Additional security evidence:
- Link entry fail-closed (`RejectArchiveLinks=true`): `tests/FileTypeDetectionLib.Tests/Unit/UnifiedArchiveBackendUnitTests.cs`.
- Archive fail-closed regressions (traversal, malformed header, overwrite/target guards): `tests/FileTypeDetectionLib.Tests/Unit/FileMaterializerUnitTests.cs`.
Portability/structure:
- Public API remains stable; internal responsibilities are separated (`Detection` SSOT, `Infrastructure` gate/backend/extractor, `Configuration` policies, `Abstractions` return models).
- Reuse happens via a neutral entry model and unified gate/extractor paths.
- Content verified against current code state.
- Links and anchors checked with `python3 tools/check-docs.py`.
- Examples/commands verified locally.
- Terminology aligned with `docs/110_API_CORE.MD`.