CollectiFOR's collect tool collects and captures data from a target host. It uses a YAML-based configuration file that lets you adjust what data is collected for different needs.
The collect tool is a Python-based program released as a prebuilt binary.
Tip
If you can't run the binary or Python on a target machine, there is an alternative: generate collection bash scripts from the YAML configuration. See: gen-collect-sh.
The generated bash scripts are quite rudimentary and lack some functionality compared with the actual collect tool, but their output still works with CollectiFOR's analysis tools.
Tip
Check releases for prebuilt collect binary.
- Simple PyInstaller example:

  ```shell
  pip3 install pyinstaller
  pyinstaller --onefile --paths=. collect.py
  ```

- Preferably use the build.sh script, which provides better support for different Linux distributions. It uses https://github.com/pypa/manylinux and requires Docker.

After building, ship ./dist/collect and config.yaml to the target machine and run the collection.
- Download the `collect` binary.
- Configure `config.yaml.sample` to match your needs.
- Run `collect` with the required options based on your configuration.
- Copy the collection directory or tar.gz (output format depends on configuration).
Options:

```
usage: collect.py [-h] -c CONFIG [-if INTERFACES] [-dh DISK_HOST] [-d DISK]

CollectiFOR | quick triage collection

options:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Path to the YAML configuration file
  -if INTERFACES, --interfaces INTERFACES
                        Interfaces for capture module. Multiple interfaces can
                        be separated with comma
  -dh DISK_HOST, --disk-host DISK_HOST
                        Target host for disk capture. <str>@<str> assumes live
                        capture over ssh. "localhost", "127.0.0.1", or ""
                        assumes local disk.
  -d DISK, --disk DISK  Disk to capture. E.g. /dev/sda
```
Example:

```shell
sudo ./collect -c config.yaml.sample -if eth0,eth1
```

Check the collection output path from the last log message. For example:

```
2025-12-07 22:04:07,548 [INFO] Collection finished: /tmp/out/hostname_20251207_220252.tar.gz
```
If `compress_collection` is set to `false` in config.yaml, the collection result path is a directory instead of a tar.gz file.
Copy the collection to the analysis machine and continue with analysis.
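As a sketch, unpacking a compressed collection on the analysis machine looks like this (the archive name below is illustrative, and the tiny archive is fabricated locally just so the extraction step is runnable):

```shell
# Fabricate a tiny stand-in collection archive; in practice you would
# copy the real tar.gz produced by collect instead
mkdir -p hostname_20251207_220252/commands
echo "demo" > hostname_20251207_220252/commands/stdout.ps.txt
tar czf hostname_20251207_220252.tar.gz hostname_20251207_220252
rm -r hostname_20251207_220252

# Extract the collection for analysis and inspect its contents
tar xzf hostname_20251207_220252.tar.gz
ls hostname_20251207_220252/commands/
```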
Example Output Directory Structure:
```
/tmp/out/<timestamp>/
├── capture/
│   ├── eth0.pcap
│   └── eth0.pcap.txt
├── checksums/
│   ├── md5.txt
│   ├── sha1.txt
│   └── sha256.txt
├── commands/
│   ├── stdout.ps.txt
│   └── stdout.ls.txt
├── file_permissions.txt
└── files_and_dirs/
```
Before actual use you need to edit the configuration file.
Check config.yaml.sample for a full example. There are two main module structures, collect and capture. In the configuration file they appear like this:
```yaml
modules:
  capture:
  collect:
```
Under these, each module is enabled or disabled with a key of the form `enable_<module_name>`. For example, `enable_commands: true` enables the collect module commands. In the configuration file this appears like this:

```yaml
modules:
  ...
  collect:
    enable_commands: true
```
In addition, all modules may have further configuration under keys of the form `(collect|capture).<module name>.<configuration>`.
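Putting these conventions together, a minimal configuration enabling one module of each type (values taken from the module examples later on this page) could look like:

```yaml
modules:
  capture:
    # Capture network traffic for 60 seconds
    enable_network: true
    network:
      timeout: 60
  collect:
    # Collect command output (stdout/stderr)
    enable_commands: true
    commands:
      list:
        - ps auxwwwef
```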
# Collect module: commands

The commands module collects output (stdout/stderr) for the commands specified in the configuration file.
```yaml
modules:
  ...
  collect:
    enable_commands: true
    commands:
      list:
        - ps auxwwwef
        - aa-status
```
Set the commands to execute under the `list` key. In the collection, the outputs are stored under the commands directory like this:
```
commands/
total 1288
-rw-r----- 1 root root  3848 Dec 15 13:57 stdout.apt-cache.txt
-rw-r----- 1 root root 18263 Dec 15 13:57 stdout.auditctl.txt
-rw-r----- 1 root root   860 Dec 15 13:57 stdout.aureport.txt
```
Stdout of an executed command is stored in `stdout.<command>.txt` and stderr in `stderr.<command>.txt`. If the same command is executed with different options, all of those outputs are stored in the same file.
Each invocation is separated with a line `#command: <full command>`. Here's an example from stdout.docker.txt:
```
#command:docker ps -a
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
...
#command:docker ps -a | awk '{print $1}'|grep -v CONTAINER |xargs -n1 docker inspect
[
    {
        "Id": "ef6cc07
...
```
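Since one stdout.<command>.txt file can hold several invocations, a single invocation's output can be pulled out by its `#command:` header. A minimal sketch using awk against a fabricated sample file (not real collect output):

```shell
# Fabricated sample mimicking the stdout.<command>.txt layout described above
cat > stdout.docker.txt <<'EOF'
#command:docker ps -a
CONTAINER ID  IMAGE  COMMAND
ef6cc07       nginx  "nginx -g ..."
#command:docker images
REPOSITORY  TAG
nginx       latest
EOF

# Print only the lines belonging to the "docker ps -a" invocation:
# toggle printing on at the matching header, off at any other header
awk '/^#command:/ { p = ($0 == "#command:docker ps -a"); next } p' stdout.docker.txt
```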
# Collect module: luks

The luks module is essentially a helper module that runs `cryptsetup luksDump <luks device>`. It identifies which devices are potential LUKS devices before running cryptsetup. It stores the output following the same logic as the commands module.
```
$ cat commands/stdout.cryptsetup.txt
#command:cryptsetup luksDump /dev/nvme0n1p3
LUKS header information
Version:        2
...
```
# Collect module: checksums

The checksums module collects checksums for the file paths specified in the configuration file.
```yaml
modules:
  ...
  collect:
    enable_checksums: true
    checksums:
      list:
        - /etc/
        - /path/to/some/file.txt
```
Directories are always processed recursively. The MD5, SHA-1, and SHA-256 outputs are stored in separate text files inside the collection:

```
# ls checksums/
md5.txt  sha1.txt  sha256.txt
```
Note that CollectiFOR's analysis tools have an option to populate checksums from the files copied to files_and_dirs (see below). Depending on your needs, it's possible to use both approaches to populate checksums.
For example, you may want checksums for some files without copying the actual files; this module is ideal for that. If you copy all the relevant paths with files_and_dirs, it may be enough to calculate the checksums in the analysis phase and disable this module.
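If the checksum files use the standard coreutils `<hash>  <path>` line format (an assumption; verify against your own collection), they can be fed straight to sha256sum for verification during analysis:

```shell
# Build a stand-in sha256.txt so the verification step is runnable;
# in practice you would use checksums/sha256.txt from the collection
echo "demo" > sample.txt
sha256sum sample.txt > sha256.txt

# Verify files on disk against the recorded checksums
sha256sum -c sha256.txt
```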
# Collect module: files_and_dirs

The files_and_dirs module collects copies of the file structures for the paths specified in the configuration file.
```yaml
modules:
  ...
  collect:
    enable_files_and_dirs: true
    files_and_dirs:
      list:
        - /etc/
```
The original directory tree is preserved and directories are always processed recursively.
This means that when, for example, /etc/passwd is copied, either by specifying it directly or just /etc/, it appears like this inside the collection:

```
files_and_dirs/etc/passwd
```
# Collect module: processes

The processes module collects information about processes, including network listening information. The commands module could be used to collect the same information, but this module quickly gathers the data into a single file.
The collected details are stored in the file processes.json inside the collection.
```
cat processes.json
[
  {
    "pid": 1,
    "ppid": 0,
    "process": "systemd",
    "exec": "/usr/lib/systemd/systemd",
    "cmdline": "/sbin/init splash",
    "systemd": "",
    "related_paths": [
      "/sbin/init",
      "/usr/lib/systemd/systemd"
    ],
    "network": {
      "tcp": [],
      "udp": []
    }
  },
```
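Because processes.json is plain JSON, it can be queried directly during analysis. A sketch using python3 against a fabricated file (the shape of the tcp entries below is illustrative, not necessarily what collect emits):

```shell
# Fabricated processes.json using the fields shown above
cat > processes.json <<'EOF'
[
  {"pid": 1,  "process": "systemd", "network": {"tcp": [],   "udp": []}},
  {"pid": 42, "process": "sshd",    "network": {"tcp": [22], "udp": []}}
]
EOF

# Print processes that hold TCP sockets (jq works equally well if installed)
python3 -c '
import json
for p in json.load(open("processes.json")):
    if p["network"]["tcp"]:
        print(p["pid"], p["process"])
'
```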
# Collect module: file_permissions

The file_permissions module collects file permissions for the paths specified in the configuration file.
```yaml
modules:
  ...
  collect:
    enable_file_permissions: true
    file_permissions:
      list:
        - /etc/
```
Directories are always processed recursively. Results are stored in the file file_permissions.txt inside the collection.

```
head -n 1 file_permissions.txt
/etc/gshadow- 640 -rw-r----- root:shadow 1172 1764100568.0
```
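The per-line format above (path, octal mode, symbolic mode, owner:group, size, mtime) lends itself to quick triage filters. For instance, flagging world-writable entries (the sample data here is fabricated):

```shell
# Fabricated sample in the file_permissions.txt line format shown above
cat > file_permissions.txt <<'EOF'
/etc/gshadow- 640 -rw-r----- root:shadow 1172 1764100568.0
/tmp/dropper 777 -rwxrwxrwx root:root 4096 1764100568.0
EOF

# World-writable: the last octal digit has the write bit set (2, 3, 6, or 7)
awk '$2 ~ /[2367]$/ {print $1, $2}' file_permissions.txt
```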
# Capture module: disk
Captures a disk image from a locally attached disk (DD/E01) or remotely over SSH (DD).
```yaml
capture:
  enable_disk: true
  disk:
    capture_method: dd|e01
    ...
```

- SSH: only the "dd" capture method is supported.

  ```shell
  ./dist/collect -c config.yaml -d "/dev/vda" -dh "user@ip"
  ```

  Note that remote SSH usage requires SSH key authentication and root login or passwordless sudo; host key verification is skipped.

- Local: the "dd" and "e01" capture methods are supported.

  ```shell
  ./dist/collect -c config.yaml -d "/dev/sda" -dh "localhost"
  ```

The module requires pv on the local system and dd on the remote system when the capture method is "dd". With "e01", ewfacquire is required.
# Capture module: network
Captures network traffic for the given time period. Uses the scapy module for the capture; capture interfaces are specified with the command-line argument -if / --interfaces <if1,if2,if3>.
```yaml
capture:
  enable_network: true
  network:
    # seconds
    timeout: 60
    ...
```

The pcap and an extracted text file version are stored under the "capture" directory inside the collection. The simple text extraction is mainly done to enable quicker pattern-based analysis, e.g. against IP-based IoCs.

```
ls capture/eth0.pcap*
capture/eth0.pcap  capture/eth0.pcap.txt
```
# Capture module: memory
Example enabling memory capture with a LiME kernel module found at path memory/lime-6.14.0-36-generic.ko:

```yaml
capture:
  ...
  enable_memory: true
  memory:
    capture_method: lime
    lime:
      # Module not included
      path: memory/lime-6.14.0-36-generic.ko
      # Format: lime/raw
      format: lime
```

The memory capture is stored under the "capture" directory inside the collection.
You can run any module in its own thread by setting `own_thread: true` inside the module's config. Here's an example with the network module:

```yaml
modules:
  capture:
    # Capture network traffic
    enable_network: true
    network:
      own_thread: true
```

The own_thread parameter defaults to false for all modules if not specified. Note that an early KeyboardInterrupt (Ctrl+C) is caught and the tool waits for running threads to finish.
After an early KeyboardInterrupt, the incomplete collection is removed once all threads have finished.
Warning
The collect tool does not track in any way which modules should or should not run in their own threads, or how many threads make a sensible configuration. Consider this when adding "own_thread" configurations.
