[AI] AI object mask tool #20378
|
Thanks for this new implementation. I'll test soon. |
|
@andriiryzhkov : I have created the |
|
@TurboGit I will create a PR with the models that are ready, because some of my models are only for upcoming experiments. I will keep those separate. |
|
Sounds good to me. |
|
@TurboGit can you initialize repository |
|
@andriiryzhkov : Done. |
|
Merged. |
|
@andriiryzhkov : How do I test this? Each time I get a message that the model is not available. |
|
@TurboGit: Oh, that's because of the repository reorganization I did yesterday. I renamed my original repo to Regarding the testing, I just want to warn you that SegNext runs somewhat slower (it's just a bigger model) and produces far too many paths at the end. I am testing it too and tweaking the post-processing parameters to get rid of unwanted artefacts. |
|
Is this compatible with more "classic", un-prompted, approaches using CNNs trained to only mask specific classes (person, dog, sky etc.) of objects? |
|
@Donatzsky The AI subsystem is compatible with "classic" segmentation models and can easily handle them. However, the current PR does not include such functionality for masking. That is something for further consideration. |
Adds a new mask tool that lets users select objects in the image by clicking. Built on the AI subsystem from #20322.
How it works
AI object mask is an interactive single-object selection tool. The user activates the object mask tool, waits for the image to be encoded (background thread), then clicks to place foreground/background point prompts. The model segments the object in real time. Right-click finalizes the selection by vectorizing the raster mask into Bézier path forms that integrate with darktable's existing mask system.
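The click workflow above can be illustrated with a small prompt container. This is only a sketch under stated assumptions: the type and function names (`point_prompt_t`, `prompt_set_t`, `prompt_set_add`) and the `MAX_PROMPTS` limit are hypothetical and do not come from the darktable sources. Only the label convention (1 for foreground, 0 for background) is taken from this PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout: none of these identifiers are taken
   from darktable. The capacity limit is an arbitrary assumption. */
#define MAX_PROMPTS 32

typedef struct
{
  float x, y; /* click position in image coordinates */
  int label;  /* 1 = foreground click, 0 = background click */
} point_prompt_t;

typedef struct
{
  point_prompt_t points[MAX_PROMPTS];
  size_t count;
} prompt_set_t;

/* Append one click to the set; returns 1 on success, 0 if the set is full. */
static int prompt_set_add(prompt_set_t *set, float x, float y, int is_foreground)
{
  assert(set != NULL);
  if(set->count >= MAX_PROMPTS) return 0;
  set->points[set->count].x = x;
  set->points[set->count].y = y;
  set->points[set->count].label = is_foreground ? 1 : 0;
  set->count++;
  return 1;
}

/* Count the foreground prompts currently in the set. */
static size_t prompt_set_foreground_count(const prompt_set_t *set)
{
  assert(set != NULL);
  size_t n = 0;
  for(size_t i = 0; i < set->count; i++)
    if(set->points[i].label == 1) n++;
  return n;
}
```

In a design like this, each click would append to the set and the whole set would be passed to the decoder again, while the cached encoder output stays untouched.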
Architecture
Segmentation engine (src/ai/segmentation.c): Implements the two-stage encoder/decoder pipeline. Supports both SAM2.1 (multi-mask + IoU selection + low-res refinement) and SegNext (single mask, full-res refinement). Encoder outputs are cached so multiple clicks don't re-encode.
Object mask tool (src/develop/masks/object.c): Runs image encoding in a background thread to keep the UI responsive. Displays a "working..." overlay during encoding. Supports foreground clicks (label 1), background clicks (label 0), and box prompts (SAM only).
Raster-to-vector (src/common/ras2vect.c): Extended with cleanup (turdsize), smoothing (alphamax), and boundary sign output for hole detection.
Models
The segmentation engine supports both SegNext and SAM model architectures. SegNext is the default — it produces good enough results and is compliant with the Open Source AI Definition. Models are downloaded on demand by the AI subsystem from the model repository: https://github.com/andriiryzhkov/darktable-ai
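The difference between the two architectures at decode time can be sketched as follows. For a multi-mask model, the candidate with the highest predicted IoU score is kept. For a single-mask model, the same function is called with one candidate. The function names, the score array layout, and the threshold at logit 0 are assumptions made for illustration, not the code in src/ai/segmentation.c.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the highest-scoring candidate mask.
   scores[i] is assumed to hold the model's predicted IoU for candidate i.
   With a single-mask model, n_masks is 1 and index 0 is returned. */
static size_t select_best_mask(const float *scores, size_t n_masks)
{
  assert(scores != NULL && n_masks > 0);
  size_t best = 0;
  for(size_t i = 1; i < n_masks; i++)
    if(scores[i] > scores[best]) best = i;
  return best;
}

/* Binarize the chosen mask: values strictly above the threshold become 255,
   everything else becomes 0. A threshold of 0 on logits is an assumption. */
static void binarize_mask(const float *logits, unsigned char *out,
                          size_t n_pixels, float threshold)
{
  assert(logits != NULL && out != NULL);
  for(size_t i = 0; i < n_pixels; i++)
    out[i] = (logits[i] > threshold) ? 255 : 0;
}
```

The binary mask produced this way is the kind of raster input that a raster-to-vector step would then trace into paths.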
Depends on #20322
Fixes #12295