I ran single-concept fine-tuning separately for a cat, a dog, and a wooden pot. Each of them performed very well.



But when I tried to combine two concepts, the results were not ideal.
First, I merged the cat and the dog. With the prompt "the <new1> cat play a ball with a <new2> dog", there is no dog in the picture. Here are the commands I used and the results.
python src/composenW.py --paths logs/2024-01-18T21-51-09_cat-sdv4/checkpoints/delta_epoch\=000002.ckpt+logs/2024-01-18T23-25-52_dog-sdv4/checkpoints/delta_epoch\=000001.ckpt --categories "cat+dog" --ckpt ./models/sd-v1-4.ckpt
##sample
python sample.py --prompt "the <new1> cat play a ball with a <new2> dog" --delta_ckpt optimized_logs/optimized_cat+dog/checkpoints/delta_epoch\=000000.ckpt --ckpt ./models/sd-v1-4.ckpt

Next, I tried to merge the cat and the wooden pot, but with the prompt "the <new2> cat sculpture in the style of a <new1> wooden pot", the results were again not ideal. Here are the commands I used and the results.
python src/composenW.py --paths logs/2024-01-22T15-11-17_wooden_pot-sdv4/checkpoints/delta_epoch=000002.ckpt+logs/2024-01-18T21-51-09_cat-sdv4/checkpoints/delta_epoch\=000000.ckpt --categories "wooden_pot+cat" --ckpt ./models/sd-v1-4.ckpt
##sample
python sample.py --prompt "the <new2> cat sculpture in the style of a <new1> wooden pot" --delta_ckpt optimized_logs/optimized_wooden_pot+cat/checkpoints/delta_epoch=000000.ckpt --ckpt ./models/sd-v1-4.ckpt

Did I make a mistake somewhere? Why are the merged results not correct?