
[AI] AI inference subsystem with ONNX Runtime backend#20322

Open
andriiryzhkov wants to merge 13 commits into darktable-org:master from andriiryzhkov:object_ai_mask

Conversation

@andriiryzhkov
Contributor

@andriiryzhkov andriiryzhkov commented Feb 11, 2026

This PR introduces an AI subsystem into darktable with two features built on top of it:

  1. AI Object Mask — a new mask tool that lets users select objects in the image by clicking on them. It uses the Light HQ-SAM model to segment objects, then automatically vectorizes the result into path masks (using ras2vect) that integrate with darktable's existing mask system.

  2. AI Denoise — a denoising module powered by the NAFNet model. This was initially developed as a simpler test case for the AI subsystem and is included here as a bonus feature.

Both models are converted to ONNX format for inference. Conversion scripts live in a separate repository: https://github.com/andriiryzhkov/darktable-ai. Models are not bundled with darktable — they are downloaded from GitHub Releases after the app is installed, with SHA256 verification. A new dependency on libarchive is added to handle extracting the downloaded model archives.
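To illustrate the kind of extraction hardening described above, here is a minimal sketch (my own, not code from the PR) of a per-entry path check that rejects absolute paths and ".." components before anything is written to disk:

```c
#include <stdbool.h>
#include <string.h>

/* Return true only for relative paths with no ".." component, so an
   archive entry can never escape the destination directory. */
static bool _entry_path_is_safe(const char *path)
{
  if(!path || path[0] == '\0' || path[0] == '/') return false; /* absolute or empty */
  const char *p = path;
  while(*p)
  {
    const char *sep = strchr(p, '/');
    const size_t len = sep ? (size_t)(sep - p) : strlen(p);
    if(len == 2 && p[0] == '.' && p[1] == '.') return false;   /* dotdot component */
    p += len;
    if(*p == '/') p++;
  }
  return true;
}
```

A real extractor would additionally refuse symlink and hardlink entries and re-verify the resolved destination path, as the PR description says libarchive is used to do.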

AI subsystem design

The AI subsystem is currently built on top of ONNX Runtime, though the backend is abstracted to allow adding other inference engines in the future. ONNX Runtime is used from pre-built packages distributed on GitHub. On Windows, ONNX Runtime is built with MSVC, so using pre-built binaries is the natural approach for us — I initially expected this to be a problem, but discovered this is common practice among other open-source projects and works well.

The system is organized in three layers:

  1. Backend (src/ai/): Wraps ONNX Runtime C API behind opaque handles. Handles session creation, tensor I/O, float16 conversion, and hardware acceleration provider selection (CoreML, CUDA, ROCm, DirectML). Providers are enabled via runtime dynamic symbol lookup rather than compile-time linking, so there are no build dependencies on vendor-specific libraries. A separate segmentation.c implements the SAM two-stage encoder/decoder pipeline with embedding caching and iterative mask refinement.

  2. Model management (src/common/ai_models.c): Registry that tracks available models, their download status, and user preferences. Downloads model packages from GitHub Releases with SHA256 verification, path traversal protection, and version-aware tag matching. Uses libarchive for safe extraction with symlink and dotdot protections. Thread-safe — all public getters return struct copies, not pointers into the registry.

  3. UI and modules: The object mask tool (src/develop/masks/object.c) runs SAM encoding in a background thread to keep the UI responsive. The user sees a "working..." overlay during encoding, then clicks to place foreground/background prompts. Right-click finalizes by vectorizing the raster mask into Bézier path forms. AI denoise module (src/libs/denoise_ai.c) and preferences tab (src/gui/preferences_ai.c) provide the remaining user-facing features.
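The runtime provider probing in layer 1 can be pictured roughly as follows. This is an illustrative sketch of the dlopen()/dlsym() approach, not the PR's actual code; the library and symbol names a backend would probe for are hypothetical here.

```c
#include <dlfcn.h>     /* dlopen, dlsym, dlclose */
#include <stdbool.h>
#include <stddef.h>

/* Probe at runtime whether a vendor library is present and exports the
   expected entry point, without any link-time dependency on it. */
static bool _provider_available(const char *libname, const char *symbol)
{
  void *handle = dlopen(libname, RTLD_LAZY | RTLD_LOCAL);
  if(!handle) return false;
  const bool found = dlsym(handle, symbol) != NULL;
  dlclose(handle);
  return found;
}
```

A backend would call something like this once per provider (for example, probing the CUDA runtime before enabling the CUDA execution provider) and silently fall back to CPU when the probe fails.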

Fixes: #12295, #19078, #19310

@andriiryzhkov andriiryzhkov mentioned this pull request Feb 11, 2026
@TurboGit
Member

Models are not bundled with darktable —

Perfect!

they are downloaded from GitHub Releases after the app is installed, with SHA256 verification. A new dependency on libarchive is added to handle extracting the downloaded model archives.

Can this be simplified (no SHA256) for now to allow testers to download the models using current master?

@andriiryzhkov
Contributor Author

andriiryzhkov commented Feb 11, 2026

For testing purposes, you can skip the download mechanism entirely and just manually place model files in ~/.local/share/darktable/models/. Each model is a directory with a config.json and ONNX files — the AI backend scans that path at startup.

If placed manually, there are no SHA256 checks.
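The startup scan can be pictured with a minimal sketch like this (assumed behaviour, not the PR's actual implementation): every subdirectory of the models path that contains a config.json is treated as an installed model.

```c
#include <dirent.h>    /* opendir, readdir, closedir */
#include <stdio.h>     /* snprintf */
#include <sys/stat.h>  /* stat */

/* Count subdirectories of models_dir that contain a config.json. */
static int _scan_models(const char *models_dir)
{
  int count = 0;
  DIR *d = opendir(models_dir);
  if(!d) return 0;
  struct dirent *e;
  while((e = readdir(d)) != NULL)
  {
    if(e->d_name[0] == '.') continue;  /* skip ., .. and hidden entries */
    char path[4096];
    struct stat st;
    snprintf(path, sizeof(path), "%s/%s/config.json", models_dir, e->d_name);
    if(stat(path, &st) == 0) count++;
  }
  closedir(d);
  return count;
}
```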

@victoryforce
Collaborator

@andriiryzhkov Thank you for such a great contribution!

For macOS CI to complete successfully, libarchive should be added to .ci/Brewfile.

@andriiryzhkov
Contributor Author

@victoryforce thank you for the advice! Done.

@zisoft
Collaborator

zisoft commented Feb 11, 2026

For macOS CI to complete successfully, libarchive should be added to .ci/Brewfile.

and please add it to the list in packaging/macosx/1_install_hb_dependencies.sh as well.

@andriiryzhkov
Contributor Author

and please add it to the list in packaging/macosx/1_install_hb_dependencies.sh as well.

Already there.

@zisoft
Collaborator

zisoft commented Feb 11, 2026

it seems that onnxruntime is needed as well

@andriiryzhkov
Contributor Author

it seems that onnxruntime is needed as well

The ONNX Runtime download from GitHub and its installation are already embedded in the CMake build process. Locally it builds and runs successfully. I am investigating the problem with the CI run.

@andriiryzhkov andriiryzhkov mentioned this pull request Feb 11, 2026
@TurboGit
Member

@andriiryzhkov
I just did a quick test and I must say that I find the model quite off in many cases. Maybe because of the use of Light-HQ-SAM instead of HQ-SAM? Is it possible to use HQ-SAM, and what is the trade-off? Size of the model? Speed?

Side question, have you contacted @MikoMikarro who was working on such support too?

@paperdigits
Collaborator

Speaking as a package maintainer for both the flatpak and Nix package, this "download at build time" stuff is not good. It won't work for either package that I maintain, and it'll make the build fail in both cases. I'd suggest we not do that.

@paperdigits
Collaborator

Also does including this align with our morals and ethics?

@TurboGit
Member

Speaking as a package maintainer for both the flatpak and Nix package, this "download at build time" stuff is not good. It won't work for either package that I maintain, and it'll make the build fail in both cases. I'd suggest we not do that.

I don't see a download at build time in this PR. The models are downloaded by users if they want to use the AI features.

@paperdigits
Collaborator

@TurboGit what is this then? https://github.com/darktable-org/darktable/pull/20322/changes#diff-df66e84cd5b7c2f6a163dfb4507e92c7ba5b642aef4cb931bf5226573d77b3efR63

@TurboGit
Member

Indeed, I was speaking of the models, but the runtime is downloaded in CMake. We probably want to integrate this as a submodule in darktable then.

@andriiryzhkov
Contributor Author

andriiryzhkov commented Feb 11, 2026

I just did a quick test and I must say that I find the model quite off in many cases. Maybe because of the use of Light-HQ-SAM instead of HQ-SAM? Is it possible to use HQ-SAM, and what is the trade-off? Size of the model? Speed?

It depends on what you mean by "quite off". Yes, it may not correctly pick up an object from one click. I just make another "+" click on the object I want to select. If there is something I need to remove, I exclude it from the mask with a shift-click "-". So it is not quite a one-click process, but rather an interactive one.

But I agree we can experiment and test different versions of the model. I was also thinking about the HQ-SAM B model, but the speed of image encoding would be significantly reduced. The model file would also be larger, but I wouldn't worry about that. Anyway, it is worth trying. I don't think it will require changes in the AI engine code, just an updated model.

Side question, have you contacted @MikoMikarro who was working on such support too?

We had some conversation last year on early stages.

Speaking as a package maintainer for both the flatpak and Nix package, this "download at build time" stuff is not good. It won't work for either package that I maintain, and it'll make the build fail in both cases. I'd suggest we not do that.

I believe you are talking about ONNX Runtime. On Linux, most distros provide a libonnxruntime-dev package which we can use as a dependency. On macOS we can use Homebrew's onnxruntime, but the brew version does not contain the CoreML execution provider. On Windows we should keep downloading from GitHub, as there is no ONNX Runtime package in MSYS2.

Regarding the models, it is also possible to package them if we want.

@andriiryzhkov
Contributor Author

Integrating ONNX Runtime could be a challenge on Windows, as it requires MSVC to build.

@scorpi11

scorpi11 commented Feb 11, 2026

Can you provide some details about the data model(s) used in this PR and what training data was used for it/them?

From my point of view, darktable as an open source tool for photographers should make sure that its components are from a valid source with a clear legal origin.

@andriiryzhkov
Contributor Author

Can you give some details about the data model(s) used in this PR and what training data was used for it/them?

I did not train these models myself; I just use pre-built model weights. If you are interested in learning about the training data, I advise you to check the original repositories of the models:

@paperdigits
Collaborator

I'd hazard a semi-educated guess that the training data from facebook does not match our moral or ethical standard.

@andriiryzhkov
Contributor Author

From my point of view, darktable as an open source tool for photographers should make sure that its components are from a valid source with a clear legal origin.

Absolutely agree with you. Both models are from well-known authors in the computer vision field and are supported by academic publications. SAM models are widely used in the photography industry these days.
I welcome everybody to review the sources.

@paperdigits
Collaborator

SAM models are widely used in the photography industry these days.

This means absolutely nothing in the context of this conversation.

I welcome everybody to review the sources.

We're asking if you've reviewed them, and if yes, do you think they meet the ethical and moral standards of this project?

@TurboGit
Member

This feature is/will not be enabled by default. Users need to download the models. If you don't agree with the way the models are trained, don't download them and don't use the AI features in darktable. As simple as that.

@andriiryzhkov
Contributor Author

We're asking if you've reviewed them, and if yes, do you think they meet the ethical and moral standards of this project?

Yes, I reviewed both models before selecting them.

Light HQ-SAM (AI mask): Based on Meta's Segment Anything, released under Apache-2.0 by SysCV (ETH Zurich). The HQ-SAM extension was published at NeurIPS 2023. Training uses SA-1B dataset (publicly released by Meta under Apache-2.0) plus standard segmentation benchmarks. No non-commercial restrictions.

NAFNet (AI denoise): Released under MIT license by Megvii Research. Published at ECCV 2022. Trained on standard academic datasets (SIDD, GoPro, REDS) that are publicly available for research. No restrictive terms on the weights.

Both are from peer-reviewed research with clearly documented training data and permissive licenses compatible with GPL-3. I see no ethical concerns — no undisclosed web scraping, no personal data in training sets, no non-commercial clauses.

@paperdigits
Collaborator

It's been made clear to me that my comments are not welcome here and that the project is "bigger than me." I'll show myself out.

@TurboGit
Member

I was also thinking about HQ-SAM B model, but speed of image encoding will be significantly reduced.

If by any means you have such a model, I'll test it to see if objects are better recognized.

@jenshannoschwalm
Collaborator

This feature is/will not be enabled by default. Users need to download the models. If you don't agree with the way the models are trained, don't download them and don't use the AI features in darktable. As simple as that.

A very clear position. For me the important points would be:

  1. The user has to download the models via some intentional trigger/button or whatever that UI action might be
  2. We should assist in doing so, maybe even offer a selection of models
  3. But we should never ship/distribute/keep such models within dt

@wpferguson
Member

On the ethics question I'm going to use Lua as an example and the rules that were given to me when I started maintaining the repo

I wrote a script that exports an image and sends it to GIMP as an external editor and then returns the result.

If someone wrote a similar script that calls photoshop to do the same thing and then wanted it merged into the Lua repository the answer would be no, because photoshop is not FOSS.

However, if someone wrote a script that let you specify whatever external editor you wanted to use (i.e. ext_editors.lua) that would (did) get merged. That way Lua is not promoting the use of commercial software, but it's not prohibiting it either.

So if we build modules (denoise, masking, etc) that use AI models they shouldn't require specific models. If we require specific models then it's like writing a script that requires photoshop. If we have an AI module (denoise) that can use different models (NAFnet, NIND, etc) supplied by the user then we have something similar to the ext_editors script.

@andriiryzhkov
Contributor Author

So if we build modules (denoise, masking, etc) that use AI models they shouldn't require specific models. If we require specific models then it's like writing a script that requires photoshop. If we have an AI module (denoise) that can use different models (NAFnet, NIND, etc) supplied by the user then we have something similar to the ext_editors script.

That's a great analogy and I agree with the principle. This is actually how the implementation works:

The AI backend (src/ai/) is a generic ONNX Runtime wrapper — it can run any ONNX model. The model registry is a JSON-based catalog that describes available models with metadata (task type, model files). Note that ONNX models are pure neural networks with no pre/post-processing baked in — all image preparation (colorspace conversion, normalization, tiling) is handled by darktable's code, not by model metadata.

For model portability, the key constraint is the input/output tensor dimensions that darktable expects for a specific task:

  • Denoise (NAFNet): Input [1, 3, H, W] float32 (sRGB image tile), output [1, 3, H, W] float32 (denoised tile of the same size). Any model that takes a noisy RGB tile and returns a denoised RGB tile of identical dimensions will work as a drop-in replacement.

  • Segmentation (Light HQ-SAM): Two-stage encoder/decoder architecture. Encoder takes [1, 3, 1024, 1024] and produces image embeddings [1, 256, 64, 64] + intermediate features [1, 1, 64, 64, 160]. Decoder takes 7 inputs (embeddings, point coordinates [1, N, 2], point labels [1, N], previous mask [1, 1, 256, 256], etc.) and outputs a mask [1, 1, H, W] + IoU score + low-res mask. Any SAM-compatible encoder/decoder pair with these tensor shapes will work.

As long as a model follows these tensor dimension requirements for a given task, it will work. Users can already install models manually into the models directory (~/.local/share/darktable/models/<model-id>/) and select which model to use for a specific module via darktable configuration. As a next step, we can reduce the requirement to register models in ai_models.json — models placed in the models folder would be auto-discovered at startup, while the registry would only serve as a catalog of models available for download.
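The tensor-shape contracts above could be captured in code roughly like this (an illustrative sketch; the struct and names are hypothetical, with -1 standing in for the variable H/W dimensions):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct shape_t { int64_t dims[4]; } shape_t;

/* A candidate model fits a task if every fixed dimension matches;
   -1 in the expected shape is a wildcard for variable H/W. */
static bool _shape_matches(const shape_t *expected, const shape_t *actual)
{
  for(int i = 0; i < 4; i++)
    if(expected->dims[i] != -1 && expected->dims[i] != actual->dims[i])
      return false;
  return true;
}

/* denoise contract: input [1, 3, H, W], output [1, 3, H, W] */
static const shape_t denoise_io = { { 1, 3, -1, -1 } };
/* SAM encoder input is fixed at [1, 3, 1024, 1024] */
static const shape_t sam_encoder_in = { { 1, 3, 1024, 1024 } };
```

Any drop-in replacement model would then be accepted or rejected by checking its declared I/O shapes against the contract for the task it claims to serve.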

@andriiryzhkov
Contributor Author

I split the PR into three pieces:

@wpferguson
Member

wpferguson commented Feb 22, 2026

It would be great to see the image you were experimenting with

Sunflower - https://discuss.pixls.us/t/darktable-filmic-v5-vs-v6-sunflower-challenge/32503

Monkey - https://discuss.pixls.us/t/a-raw-denoise-cross-comparison/48666

I'm a firm thumb sideways :)

@ralfbrown, me too!

@andriiryzhkov - thanks for all of your efforts and for putting up with us while we try and figure this out.

@dtorop
Contributor

dtorop commented Feb 22, 2026

@andriiryzhkov: Likewise I appreciate your being responsive RE splitting PR and alternative models concerns.

Would it make sense for an initial pull request to be yet more conservative and have no option for either downloading ONNX Runtime or any models? That way users who felt OK with, say, apt-get install libonnxruntime-dev and then downloading a model by hand could try this code. It would be even more conservative if the new AI code were only compiled when the user enables a BUILD_AI compile flag, which defaults to off.

Then cmake downloading at build time (via cmake/modules/FindONNXRuntime.cmake) or prefs downloading model blobs (via src/common/ai_models.c) would be later PRs. It's already the norm for Linux users compiling dt from source to be the one to install optional libraries. I'm not sure about MacOS or Windows.

I know in one sense this breaking up of the PR is conservative, and aids code review. On the other hand, it does put off the moment when there is actually code that does something observable. And it becomes a bit of a boiling-the-frog project, where if initial PRs make their way in, just one or two more might seem fine too...

Here am in the thumbs-sideways place as well...

@piratenpanda
Contributor

piratenpanda commented Feb 22, 2026

For denoising it could also be helpful to involve @rymuelle from https://github.com/rymuelle/RawRefinery as he created some models/datasets by himself if I understood correctly

@andriiryzhkov
Contributor Author

@dtorop Thank you for the thoughtful feedback.

BUILD_AI currently defaults to ON, but if ONNX Runtime is not found at build time, it falls back gracefully — no AI code is compiled. Since ORT is not packaged by most distros yet, this effectively means AI stays off unless the builder explicitly provides it.

On the ONNX Runtime auto-download: I understand the concern. The auto-download in FindONNXRuntime.cmake only triggers when no system package is found — so apt-get install libonnxruntime-dev works naturally as the primary path. There is also an ONNXRUNTIME_OFFLINE flag that disables auto-download entirely. The auto-download exists because on macOS the Homebrew package lacks CoreML acceleration, which makes a significant performance difference, and on Windows there is no system package path at all.

On model downloads: I've already added a BUILD_AI_DOWNLOAD flag (default ON) that removes all network download code when set to OFF. With downloads disabled, users can still install models manually via an "install model" button in preferences that accepts .dtmodel files. This addresses the distro concern — packagers can ship with downloads disabled and let users install models from files.

As an option, we could even start with BUILD_AI_DOWNLOAD=OFF as the default — no repository downloads out of the box, manual model installation only. That would give the most conservative starting point while still allowing people to test and use the features.

@TurboGit
Member

On the ONNX Runtime auto-download: I understand the concern. The auto-download in FindONNXRuntime.cmake only triggers when no system package is found

Because I may have forgotten the reason, can you remind me when this is needed? Currently on darktable we have a simple way of handling optional dependencies:

  1. Check if needed OS package is present.
  2. Activate the corresponding feature if the OS package is found and disable it otherwise.

If the above rule can work for GNU/Linux, Windows & macOS then we can probably remove the auto-download. This optional dependency will be documented in the README.md & in RELEASE_NOTES.md for 5.6.

@andriiryzhkov
Contributor Author

andriiryzhkov commented Feb 22, 2026

@TurboGit Let me recap the logic behind that decision, and maybe we can think about how to improve this part.

If the above rule can work for GNU/Linux, Windows & macOS then we can probably remove the auto-download.

Let me break it down per platform:

  • Linux — This is exactly how it works now. If libonnxruntime-dev (or equivalent) is installed, it's picked up automatically. The auto-download fallback exists but can be disabled with ONNXRUNTIME_OFFLINE=ON.

  • macOS — Ideally I'd rely on the Homebrew onnxruntime package, and the build system does find it. Unfortunately, the Homebrew package does not include the CoreML acceleration provider, which makes a significant performance difference on Apple Silicon. That's why the auto-download exists for macOS — it fetches an ORT build with CoreML support.

  • Windows — MSYS2 does not have an ONNX Runtime package, and cannot have one because ORT cannot be built with GCC — it requires MSVC. The only approach on Windows is to provide a pre-built binary. The standard GitHub release binaries for Windows do not include DirectML acceleration. The alternative is the NuGet package, which does include DirectML support.

So this dependency is a bit harder to manage than a typical optional library — the system packages either don't exist or are missing the acceleration providers that make AI features practical.

@naorunaoru

I'm training a set of models for NN demosaicing for Fuji X-Trans cameras: https://github.com/naorunaoru/x-veon
You can check the comparison here: https://naorunaoru.github.io/x-veon/comparison.html

The model already runs with ONNX Runtime backend.

Question is, does your inference subsystem allow for NN demosaicing?

I'd be obscenely happy to see my thing integrated with darktable.

@andriiryzhkov
Contributor Author

@naorunaoru

Question is, does your inference subsystem allow for NN demosaicing?

The proposed AI subsystem can run inference on any model bundled as a single ONNX file. You just need to take care of the code that consumes this model in a module, library, or script.

@TurboGit
Member

@TurboGit Let me recap the logic behind that decision, and maybe we can think about how to improve this part.

Ok, clear now. I don't have further comment on this as I don't see a better solution than the one you are providing on your PR. Thanks for the quick reply.

@TurboGit TurboGit added this to the 5.6 milestone Feb 22, 2026
@TurboGit TurboGit added the priority: low (core features work as expected, only secondary/optional features don't), feature: new (new features to add), difficulty: hard (big changes across different parts of the code base), scope: codebase (making darktable source code easier to manage) and release notes: pending labels Feb 22, 2026
@naorunaoru

Ok, so I gave the PR a cursory glance and it seems that it's not yet possible to run inference on raw CFA data with this approach.

The proposed denoise module is a lib that operates in sRGB space, sidestepping the full pixel pipe, but demosaicing (and SoTA ML denoising, for that matter) happens on raw linear data. In darktable's pipeline, demosaicing is handled by src/iop/demosaic.c, which runs extremely early in the pixel pipe (as an IOP module, not a lighttable lib).

Making it an IOP that replaces/augments demosaic.c in the develop module would be much more useful for demosaic/denoise but significantly more work since it requires conforming to darktable's IOP API, handling ROI (region of interest) requests, and integrating with the tiling infrastructure that IOPs already have.

@andriiryzhkov
Contributor Author

@naorunaoru You will need your own code to prepare the data for the specific task you are working on. Each module or library will handle it differently.

Maybe at some point we will come up with some reusable data pre- and post-processing. But those are ideas for later.

@MikoMikarro

I know it is quite a tangent, but I don't know if it would have been easier to add a connector to ComfyUI, as they are doing at: https://github.com/CyberTimon/RapidRAW

Just a thought

@hats-np

hats-np commented Feb 23, 2026

ComfyUI support does not fit well in darktable imo. It's software with a ton of updates, many of which are breaking changes. Lately, a lot of it is vibe coded, the plugin support is horrendous and often has severe security vulnerabilities, etc. I say this as a ComfyUI user with a lot of hours playing with it under my belt. The current proposal seems the most sensible in my opinion.

ComfyUI is now also owned by VCs; even if its owners are open supporters of FOSS, we all know what happens when VC money gets into something.

Development

Successfully merging this pull request may close these issues.

AI Masks