[AI] AI inference subsystem with ONNX Runtime backend #20322
andriiryzhkov wants to merge 13 commits into darktable-org:master from
Conversation
Perfect!
Can this be simplified (no SHA256) for now to allow testers to download the models using current master?

For testing purposes, you can skip the download mechanism entirely and just manually place the model files in the models directory. If placed manually, there are no SHA256 checks.
@andriiryzhkov Thank you for such a great contribution! For macOS CI to complete successfully, libarchive should be added to .ci/Brewfile.

@victoryforce thank you for the advice! Done.

and please add it to the list in

Already there.
it seems that

ONNX Runtime download from GitHub and installation are already embedded into the CMake build process. Locally it builds and runs successfully. I am investigating the problem with the CI run.
@andriiryzhkov Side question, have you contacted @MikoMikarro who was working on such support too?
Speaking as a package maintainer for both the flatpak and the Nix package, this "download at build time" approach is not good. It won't work for either package that I maintain, and it'll make the build fail in both cases. I'd suggest we not do that.
Also does including this align with our morals and ethics? |
I don't see a download at build time in this PR. The models are downloaded by users if they want to use the AI features.
Indeed, I was speaking of the models, but the runtime is downloaded by CMake. We probably want to integrate this as a submodule in darktable then.
It depends on what you mean by "quite off". Yes, it may not correctly pick up an object from one click. I just make another "+" click on the object I want to select. If there's something I need to remove, with shift-click "-" I exclude it from the mask. So it is not quite a one-click process, but rather an interactive one. But I agree we can experiment and test different versions of the model. I was also thinking about the HQ-SAM B model, but the speed of image encoding would be significantly reduced. The model file would also be larger, but I wouldn't worry about that. Anyway, it is worth trying. I think it will not require changes in the AI engine code. Just update the model.
We had some conversations last year in the early stages.
I believe you are talking about ONNX Runtime. On Linux, most distros have a package. Regarding the models, it is also possible to package them if we want.
Integrating ONNX Runtime could be a challenge on Windows, as it requires MSVC to build.
Can you provide some details about the model(s) used in this PR and what training data was used for them? From my point of view, darktable as an open-source tool for photographers should make sure that its components come from a valid source with a clear legal origin.

I did not train those models myself; I just use pre-built model weights. If you are interested in learning about the training data, I advise you to check the original repositories of the models:
I'd hazard a semi-educated guess that the training data from Facebook does not match our moral or ethical standards.
Absolutely agree with you. Both models are from well-known authors in the computer vision field and are supported by academic publications. SAM models are widely used in the photography industry these days.
This means absolutely nothing in the context of this conversation.
We're asking if you've reviewed them, and if yes, do you think they meet the ethical and moral standards of this project?
This feature is not (and will not be) enabled by default. Users need to download the models. If you don't agree with the way the models are trained... don't download them, don't use the AI features in darktable. As simple as that.
Yes, I reviewed both models before selecting them.

Light HQ-SAM (AI mask): Based on Meta's Segment Anything, released under Apache-2.0 by SysCV (ETH Zurich). The HQ-SAM extension was published at NeurIPS 2023. Training uses the SA-1B dataset (publicly released by Meta under Apache-2.0) plus standard segmentation benchmarks. No non-commercial restrictions.

NAFNet (AI denoise): Released under the MIT license by Megvii Research. Published at ECCV 2022. Trained on standard academic datasets (SIDD, GoPro, REDS) that are publicly available for research. No restrictive terms on the weights.

Both are from peer-reviewed research with clearly documented training data and permissive licenses compatible with GPL-3. I see no ethical concerns — no undisclosed web scraping, no personal data in training sets, no non-commercial clauses.
It's been made clear to me that my comments are not welcome here and that the project is "bigger than me." I'll show myself out.
If by any means you have such a model, I'll test it to see if the objects are better recognized.
A very clear position. For me the important points would be:
On the ethics question I'm going to use Lua as an example, and the rules that were given to me when I started maintaining the repo.

I wrote a script that exports an image and sends it to GIMP as an external editor and then returns the result. If someone wrote a similar script that calls Photoshop to do the same thing and then wanted it merged into the Lua repository, the answer would be no, because Photoshop is not FOSS. However, if someone wrote a script that let you specify whatever external editor you wanted to use (i.e. ext_editors.lua), that would (and did) get merged. That way Lua is not promoting the use of commercial software, but it's not prohibiting it either.

So if we build modules (denoise, masking, etc.) that use AI models, they shouldn't require specific models. If we require specific models, then it's like writing a script that requires Photoshop. If we have an AI module (denoise) that can use different models (NAFNet, NIND, etc.) supplied by the user, then we have something similar to the ext_editors script.
That's a great analogy and I agree with the principle. This is actually how the implementation works: the AI backend (

For model portability, the key constraint is the input/output tensor dimensions that darktable expects for a specific task:
As long as a model follows these tensor dimension requirements for a given task, it will work. Users can already install models manually into the models directory (
I split the PR into three pieces:
Sunflower - https://discuss.pixls.us/t/darktable-filmic-v5-vs-v6-sunflower-challenge/32503
Monkey - https://discuss.pixls.us/t/a-raw-denoise-cross-comparison/48666
@ralfbrown, me too! @andriiryzhkov - thanks for all of your efforts and for putting up with us while we try and figure this out.
@andriiryzhkov: Likewise, I appreciate your being responsive regarding splitting the PR and the alternative-models concerns. Would it make sense for an initial pull request to be yet more conservative, and have no option for either downloading ONNX Runtime or any models? That way users who felt OK with, say,

Then cmake downloading at build time (via

I know in one sense this breaking up of the PR is conservative, and aids code review. On the other hand, it does put off the moment when there is actually code that does something observable. And it becomes a bit of a boiling-the-frog project, where if initial PRs make their way in, just one or two more might seem fine too... I am in the thumbs-sideways place here as well...
For denoising it could also be helpful to involve @rymuelle from https://github.com/rymuelle/RawRefinery as he created some models/datasets himself, if I understood correctly.
@dtorop Thank you for the thoughtful feedback.
On the ONNX Runtime auto-download: I understand the concern. The auto-download in

On model downloads: I've already added a

As an option, we could even start with
Because I may have forgotten the reason, can you remind me when this is needed? Currently in darktable we have a simple way of handling optional dependencies:
If the above rule can work for GNU/Linux, Windows, and macOS, then we can probably remove the auto-download. This optional dependency will be documented in README.md and in RELEASE_NOTES.md for 5.6.
@TurboGit Let me recall the logic behind that decision, and maybe we can think about how to improve this part.
Let me break it down per platform:
So this dependency is a bit harder to manage than a typical optional library — the system packages either don't exist or are missing the acceleration providers that make AI features practical.
I'm training a set of models for NN demosaicing for Fuji X-Trans cameras: https://github.com/naorunaoru/x-veon The model already runs with the ONNX Runtime backend. The question is: does your inference subsystem allow for NN demosaicing? I'd be obscenely happy to see my thing integrated with darktable.
The proposed AI subsystem can run inference on any model bundled in a single ONNX-format file. You just need to take care of the code that will consume this model in a module, library, or script.
Ok, clear now. I don't have further comments on this, as I don't see a better solution than the one you are providing in your PR. Thanks for the quick reply.
Ok, so I gave the PR a cursory glance, and it seems that it's not yet possible to run inference on raw CFA data with this approach. The proposed denoise module is a lib that operates in sRGB space, sidestepping the full pixel pipe, but demosaicing (and SoTA ML denoising, for that matter) happens on raw linear data. In darktable's pipeline, demosaicing is handled by src/iop/demosaic.c, which runs extremely early in the pixel pipe (as an IOP module, not a lighttable lib). Making it an IOP that replaces/augments demosaic.c in the develop module would be much more useful for demosaic/denoise, but significantly more work, since it requires conforming to darktable's IOP API, handling ROI (region of interest) requests, and integrating with the tiling infrastructure that IOPs already have.
@naorunaoru You will need your own code for data preparation for the specific task you are working on. Each module or library will handle it differently. Maybe at some point we will come up with some reusable data pre- and post-processing, but those are ideas for later.
I know it is quite a tangent, but I don't know if it would have been easier to add a connector to ComfyUI, as they are doing at https://github.com/CyberTimon/RapidRAW. Just a thought.
ComfyUI support does not fit well in darktable, imo. It's software with a ton of updates, many of which are breaking changes. Lately a lot of it is vibe-coded, the plugin support is horrendous and often has severe security vulnerabilities, etc. I say this as a ComfyUI user with a lot of hours playing with it under my belt. The current proposal seems the most sensible in my opinion. ComfyUI is now also owned by VC; even if its owners are open supporters of FOSS, we all know what happens when VC money gets into something.
This PR introduces an AI subsystem into darktable with two features built on top of it:
- AI Object Mask — a new mask tool that lets users select objects in the image by clicking on them. It uses the Light HQ-SAM model to segment objects, then automatically vectorizes the result into path masks (using ras2vect) that integrate with darktable's existing mask system.
- AI Denoise — a denoising module powered by the NAFNet model. This was initially developed as a simpler test case for the AI subsystem and is included here as a bonus feature.

Both models are converted to ONNX format for inference. Conversion scripts live in a separate repository: https://github.com/andriiryzhkov/darktable-ai. Models are not bundled with darktable — they are downloaded from GitHub Releases after the app is installed, with SHA256 verification. A new dependency on libarchive is added to handle extracting the downloaded model archives.

AI subsystem design
The AI subsystem is currently built on top of ONNX Runtime, though the backend is abstracted to allow adding other inference engines in the future. ONNX Runtime is used from pre-built packages distributed on GitHub. On Windows, ONNX Runtime is built with MSVC, so using pre-built binaries is the natural approach for us — I initially expected this to be a problem, but discovered this is common practice among other open-source projects and works well.
The system is organized in three layers:
- Backend (src/ai/): Wraps the ONNX Runtime C API behind opaque handles. Handles session creation, tensor I/O, float16 conversion, and hardware acceleration provider selection (CoreML, CUDA, ROCm, DirectML). Providers are enabled via runtime dynamic symbol lookup rather than compile-time linking, so there are no build dependencies on vendor-specific libraries. A separate segmentation.c implements the SAM two-stage encoder/decoder pipeline with embedding caching and iterative mask refinement.
- Model management (src/common/ai_models.c): Registry that tracks available models, their download status, and user preferences. Downloads model packages from GitHub Releases with SHA256 verification, path traversal protection, and version-aware tag matching. Uses libarchive for safe extraction with symlink and dot-dot protections. Thread-safe — all public getters return struct copies, not pointers into the registry.
- UI and modules: The object mask tool (src/develop/masks/object.c) runs SAM encoding in a background thread to keep the UI responsive. The user sees a "working..." overlay during encoding, then clicks to place foreground/background prompts. Right-click finalizes by vectorizing the raster mask into Bézier path forms. The AI denoise module (src/libs/denoise_ai.c) and preferences tab (src/gui/preferences_ai.c) provide the remaining user-facing features.

Fixes: #12295, #19078, #19310