Hi there, thanks for contributing such a useful repo to the community.
When I tried to reproduce the CLAP score with caption_cot on the VGGSound test set (14k samples), I got 33.97 with the GT audio and 30.65 with the ThinkSound-generated audio. It seems that something is wrong with either caption_cot or my CLAP script. Could you please double-check the released AudioCoT, or share the script you used to measure CLAP?
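To make the comparison concrete, here is a minimal sketch of the aggregation step as I understand it. It assumes the CLAP score is the mean cosine similarity between paired text and audio embeddings, scaled by 100. Both the scaling and the function names `cosine_similarity` and `clap_score` are my own assumptions, not taken from this repo, and the embeddings themselves would come from whichever CLAP checkpoint is used, which is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def clap_score(text_embs, audio_embs):
    """Mean cosine similarity over paired embeddings, scaled by 100.

    Assumption: text_embs[i] is the caption embedding and audio_embs[i]
    the audio embedding of the same sample, both produced by a CLAP model
    outside this function. The factor of 100 is also an assumption.
    """
    if len(text_embs) != len(audio_embs) or not text_embs:
        raise ValueError("need the same non-zero number of text and audio embeddings")
    sims = [cosine_similarity(t, a) for t, a in zip(text_embs, audio_embs)]
    return 100.0 * sum(sims) / len(sims)
```

If your evaluation differs from this (for example a different CLAP checkpoint, a different audio length or sample rate, or no scaling), that could explain the gap.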