CollectiFOR's collect tool collects and captures data from a target host. It uses a YAML-based configuration file that lets you adjust what data is collected for different needs.
The collect tool is a Python-based program released as a prebuilt binary.
Tip
If you can't run the binary or Python on a target machine, there is an alternative: generate collection bash scripts from the YAML configuration. See: gen-collect-sh.
The generated bash scripts are quite rudimentary and lack some functionality compared with the actual collect tool, but their output still works with CollectiFOR's analysis tools.
Tip
Check releases for prebuilt collect binary.
- Simple PyInstaller example:

  ```shell
  pip3 install pyinstaller
  pyinstaller --onefile --paths=. collect.py
  ```

- Preferably use the build.sh script, which provides better support for different Linux distributions. It uses https://github.com/pypa/manylinux and requires Docker.

After building, ship ./dist/collect and config.yaml to the target machine and run the collection.
- Download the `collect` binary.
- Configure `config.yaml.sample` to match your needs.
- Run `collect` with the required options based on your configuration.
- Copy the collection directory or tar.gz (output format depends on configuration).
Options:

```
usage: collect.py [-h] -c CONFIG [-if INTERFACES] [-dh DISK_HOST] [-d DISK]

CollectiFOR | quick triage collection

options:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Path to the YAML configuration file
  -if INTERFACES, --interfaces INTERFACES
                        Interfaces for capture module. Multiple interfaces can
                        be separated with comma
  -dh DISK_HOST, --disk-host DISK_HOST
                        Target host for disk capture. <str>@<str> assumes live
                        capture over ssh. "localhost", "127.0.0.1", or ""
                        assumes local disk.
  -d DISK, --disk DISK  Disk to capture. E.g. /dev/sda
```
Example:

```shell
sudo ./collect -c config.yaml.sample -if eth0,eth1
```

Check the collection output path from the last log message. For example:

```
2025-12-07 22:04:07,548 [INFO] Collection finished: /tmp/out/hostname_20251207_220252.tar.gz
```
If `compress_collection` is set to `false` in config.yaml, the collection result path is a directory instead of a tar.gz file.
Copy the collection to the analysis machine and continue with analysis.
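As a sketch, unpacking a compressed collection on the analysis machine looks like this (the archive name below is illustrative, and the tiny archive is fabricated locally just so the extraction step is runnable):

```shell
# Fabricate a tiny stand-in collection archive; in practice you would
# copy the real tar.gz produced by collect instead
mkdir -p hostname_20251207_220252/commands
echo "demo" > hostname_20251207_220252/commands/stdout.ps.txt
tar czf hostname_20251207_220252.tar.gz hostname_20251207_220252
rm -r hostname_20251207_220252

# Extract the collection for analysis and inspect its contents
tar xzf hostname_20251207_220252.tar.gz
ls hostname_20251207_220252/commands/
```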
Example Output Directory Structure:
```
/tmp/out/<timestamp>/
├── capture/
│   ├── eth0.pcap
│   └── eth0.pcap.txt
├── checksums/
│   ├── md5.txt
│   ├── sha1.txt
│   └── sha256.txt
├── commands/
│   ├── stdout.ps.txt
│   └── stdout.ls.txt
├── file_permissions.txt
└── files_and_dirs/
```
Before actual use you need to edit the configuration file.
Check config.yaml.sample for a full example. There are two main module structures, collect and capture. In the configuration file they appear like this:
```yaml
modules:
  capture:
  collect:
```
Under these, each module is enabled or disabled with a key of the form `enable_<module_name>`. For example, `enable_commands: true` enables the collect module commands. In the configuration file this appears like this:

```yaml
modules:
  ...
  collect:
    enable_commands: true
```
In addition, all modules may have further configuration under keys of the form `(collect|capture).<module name>.<configuration>`.
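Putting these conventions together, a minimal configuration enabling one module of each type (values taken from the module examples later on this page) could look like:

```yaml
modules:
  capture:
    # Capture network traffic for 60 seconds
    enable_network: true
    network:
      timeout: 60
  collect:
    # Collect command output (stdout/stderr)
    enable_commands: true
    commands:
      list:
        - ps auxwwwef
```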
# Collect module: commands

The commands module collects output (stdout/stderr) for the commands specified in the configuration file.
```yaml
modules:
  ...
  collect:
    enable_commands: true
    commands:
      list:
        - ps auxwwwef
        - aa-status
```
Set the commands to execute under the `list` key. In the collection, the outputs are stored under the commands directory like this:
```
commands/
total 1288
-rw-r----- 1 root root  3848 Dec 15 13:57 stdout.apt-cache.txt
-rw-r----- 1 root root 18263 Dec 15 13:57 stdout.auditctl.txt
-rw-r----- 1 root root   860 Dec 15 13:57 stdout.aureport.txt
```
Stdout of an executed command is stored in `stdout.<command>.txt` and stderr in `stderr.<command>.txt`. If the same command is executed with different options, all of those outputs are stored in the same file.
Each invocation is separated with a line `#command: <full command>`. Here's an example from stdout.docker.txt:
```
#command:docker ps -a
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
...
#command:docker ps -a | awk '{print $1}'|grep -v CONTAINER |xargs -n1 docker inspect
[
    {
        "Id": "ef6cc07
...
```
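Since one stdout.<command>.txt file can hold several invocations, a single invocation's output can be pulled out by its `#command:` header. A minimal sketch using awk against a fabricated sample file (not real collect output):

```shell
# Fabricated sample mimicking the stdout.<command>.txt layout described above
cat > stdout.docker.txt <<'EOF'
#command:docker ps -a
CONTAINER ID  IMAGE  COMMAND
ef6cc07       nginx  "nginx -g ..."
#command:docker images
REPOSITORY  TAG
nginx       latest
EOF

# Print only the lines belonging to the "docker ps -a" invocation:
# toggle printing on at the matching header, off at any other header
awk '/^#command:/ { p = ($0 == "#command:docker ps -a"); next } p' stdout.docker.txt
```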
# Collect module: luks

The luks module is essentially a helper module that runs `cryptsetup luksDump <luks device>`. It identifies which devices are potential LUKS devices before running cryptsetup. It stores the output following the same logic as the commands module.
```
$ cat commands/stdout.cryptsetup.txt
#command:cryptsetup luksDump /dev/nvme0n1p3
LUKS header information
Version:        2
...
```
# Collect module: checksums

The checksums module collects checksums for the file paths specified in the configuration file.
```yaml
modules:
  ...
  collect:
    enable_checksums: true
    checksums:
      list:
        - /etc/
        - /path/to/some/file.txt
```
Directories are always processed recursively. The MD5, SHA-1, and SHA-256 outputs are stored in separate text files inside the collection:

```
# ls checksums/
md5.txt  sha1.txt  sha256.txt
```
Note that CollectiFOR's analysis tools have an option to populate checksums from the files copied to files_and_dirs (see below). Depending on your needs, it's possible to use both approaches to populate checksums.
For example, you may want checksums for some files without copying the actual files; this module is ideal for that. If you copy all the relevant paths with files_and_dirs, it may be enough to calculate the checksums in the analysis phase and disable this module.
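If the checksum files use the standard coreutils `<hash>  <path>` line format (an assumption; verify against your own collection), they can be fed straight to sha256sum for verification during analysis:

```shell
# Build a stand-in sha256.txt so the verification step is runnable;
# in practice you would use checksums/sha256.txt from the collection
echo "demo" > sample.txt
sha256sum sample.txt > sha256.txt

# Verify files on disk against the recorded checksums
sha256sum -c sha256.txt
```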
# Collect module: files_and_dirs

The files_and_dirs module collects copies of the file structures for the paths specified in the configuration file.
```yaml
modules:
  ...
  collect:
    enable_files_and_dirs: true
    files_and_dirs:
      list:
        - /etc/
```
The original directory tree is preserved and directories are always processed recursively.
This means that when, for example, /etc/passwd is copied, either by specifying it directly or just /etc/, it appears like this inside the collection:

```
files_and_dirs/etc/passwd
```
# Collect module: processes

The processes module collects information about processes, including network listening information. The commands module could be used to collect the same information, but this module quickly gathers the data into a single file.
The collected details are stored in the file processes.json inside the collection.
```
cat processes.json
[
  {
    "pid": 1,
    "ppid": 0,
    "process": "systemd",
    "exec": "/usr/lib/systemd/systemd",
    "cmdline": "/sbin/init splash",
    "systemd": "",
    "related_paths": [
      "/sbin/init",
      "/usr/lib/systemd/systemd"
    ],
    "network": {
      "tcp": [],
      "udp": []
    }
  },
```
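Because processes.json is plain JSON, it can be queried directly during analysis. A sketch using python3 against a fabricated file (the shape of the tcp entries below is illustrative, not necessarily what collect emits):

```shell
# Fabricated processes.json using the fields shown above
cat > processes.json <<'EOF'
[
  {"pid": 1,  "process": "systemd", "network": {"tcp": [],   "udp": []}},
  {"pid": 42, "process": "sshd",    "network": {"tcp": [22], "udp": []}}
]
EOF

# Print processes that hold TCP sockets (jq works equally well if installed)
python3 -c '
import json
for p in json.load(open("processes.json")):
    if p["network"]["tcp"]:
        print(p["pid"], p["process"])
'
```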
# Collect module: file_permissions

The file_permissions module collects file permissions for the paths specified in the configuration file.
```yaml
modules:
  ...
  collect:
    enable_file_permissions: true
    file_permissions:
      list:
        - /etc/
```
Directories are always processed recursively. Results are stored in the file file_permissions.txt inside the collection.

```
head -n 1 file_permissions.txt
/etc/gshadow- 640 -rw-r----- root:shadow 1172 1764100568.0
```
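The per-line format above (path, octal mode, symbolic mode, owner:group, size, mtime) lends itself to quick triage filters. For instance, flagging world-writable entries (the sample data here is fabricated):

```shell
# Fabricated sample in the file_permissions.txt line format shown above
cat > file_permissions.txt <<'EOF'
/etc/gshadow- 640 -rw-r----- root:shadow 1172 1764100568.0
/tmp/dropper 777 -rwxrwxrwx root:root 4096 1764100568.0
EOF

# World-writable: the last octal digit has the write bit set (2, 3, 6, or 7)
awk '$2 ~ /[2367]$/ {print $1, $2}' file_permissions.txt
```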
# Capture module: disk
Captures a disk image from a locally attached disk (DD/E01) or remotely over SSH (DD).
```yaml
capture:
  enable_disk: true
  disk:
    capture_method: dd|e01
    ...
```

- SSH: only the "dd" capture method is supported.

  ```shell
  ./dist/collect -c config.yaml -d "/dev/vda" -dh "user@ip"
  ```

  Note that remote SSH usage requires SSH key authentication and root login or passwordless sudo; host key verification is skipped.

- Local: the "dd" and "e01" capture methods are supported.

  ```shell
  ./dist/collect -c config.yaml -d "/dev/sda" -dh "localhost"
  ```

The module requires pv on the local system and dd on the remote system when the capture method is "dd". With "e01", ewfacquire is required.
# Capture module: network
Captures network traffic for the given time period. Uses the scapy module for the capture; capture interfaces are specified with the command-line argument -if / --interfaces <if1,if2,if3>.
```yaml
capture:
  enable_network: true
  network:
    # seconds
    timeout: 60
    ...
```

The pcap and an extracted text file version are stored under the "capture" directory inside the collection. The simple text extraction is mainly done to enable quicker pattern-based analysis, e.g. against IP-based IoCs.

```
ls capture/eth0.pcap*
capture/eth0.pcap  capture/eth0.pcap.txt
```
# Capture module: memory
Example enabling memory capture with a LiME kernel module found at path memory/lime-6.14.0-36-generic.ko:

```yaml
capture:
  ...
  enable_memory: true
  memory:
    capture_method: lime
    lime:
      # Module not included
      path: memory/lime-6.14.0-36-generic.ko
      # Format: lime/raw
      format: lime
```

The memory capture is stored under the "capture" directory inside the collection.
You can run any module in its own thread by setting `own_thread: true` inside the module's config. Here's an example with the network module:

```yaml
modules:
  capture:
    # Capture network traffic
    enable_network: true
    network:
      own_thread: true
```

The own_thread parameter defaults to false for all modules if not specified. Note that an early KeyboardInterrupt (Ctrl+C) is caught and the tool waits for running threads to finish.
After an early KeyboardInterrupt, the incomplete collection is removed once all threads have finished.
Warning
The collect tool does not track in any way which modules should or should not run in their own threads, or how many threads make a sensible configuration. Consider this when adding "own_thread" configurations.
