
Reproduction Results Differ from Reported Results — Seeking Possible Causes #9

Description

@huhotel

First, thank you very much for this excellent work! The project is extremely valuable for image restoration research and provides strong inspiration for our ongoing work.
During reproduction, we did not modify any configurations or default parameters and strictly followed the settings provided in the repository. However, when running the Multiple-Degradation IR task on the MiO100 dataset with the GenMIR-P Profile, our reproduced results show noticeable gaps from the reported scores, especially on the RP (Reference Point) metrics.
In addition, due to limitations of our CUDA version and environment, we had to exclude MAXIM and RIDCP from the available toolset. To check whether this explains the inconsistencies, we compared the per-image plan generated in our run against the plan provided by the authors. Interestingly, beyond the skipped MAXIM and RIDCP steps, we observed that:

- The order of subtasks sometimes differs from the authors' plan.
- Even when the subtask order matches, the tool selected as optimal for a subtask may differ.

These inconsistencies may further contribute to the performance gaps we observed; a sketch of the comparison we ran is below.
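For reference, this is roughly how we diffed the two plans. It is a minimal sketch: the file names `plan_ours.json` / `plan_authors.json` and the schema (a map from image name to an ordered list of `{"subtask": ..., "tool": ...}` entries) are assumptions about our own comparison script, not the repository's actual plan format.

```python
import json

# Hypothetical plan files: our run vs. the authors' released plan.
# Assumed schema: {image_name: [{"subtask": ..., "tool": ...}, ...]}.
with open("plan_ours.json") as f:
    ours = json.load(f)
with open("plan_authors.json") as f:
    theirs = json.load(f)

skipped = {"MAXIM", "RIDCP"}  # tools unavailable in our environment

for image, ref_steps in theirs.items():
    our_steps = ours.get(image, [])
    # Drop steps whose tool we had to skip before comparing.
    ref_steps = [s for s in ref_steps if s["tool"] not in skipped]

    ref_order = [s["subtask"] for s in ref_steps]
    our_order = [s["subtask"] for s in our_steps]
    if ref_order != our_order:
        print(f"{image}: subtask order differs: {our_order} vs {ref_order}")
        continue

    # Orders match, so compare the tool chosen for each subtask.
    for our_s, ref_s in zip(our_steps, ref_steps):
        if our_s["tool"] != ref_s["tool"]:
            print(f"{image}/{our_s['subtask']}: tool differs: "
                  f"{our_s['tool']} vs {ref_s['tool']}")
```

Both kinds of mismatch (ordering and tool selection) show up even on images whose plans involve neither MAXIM nor RIDCP.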
Our reproduced results screenshot: [screenshot attached]
Therefore, we would like to ask: what might be the possible reasons for these discrepancies? For example, could they be related to:

- Environment versions (CUDA / cuDNN / PyTorch)
- Model weight versions
- Dataset preprocessing details
- MiO100 dataset source or unpacking details
- The inference script or internal parameters
- etc.?
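To make it easier to rule out the environment, here is the snippet we used to collect our versions; it relies only on standard PyTorch attributes, and we can post its output if helpful:

```python
import platform
import torch

# Versions relevant to reproducibility, for comparison against
# the authors' environment.
print("Python:", platform.python_version())
print("PyTorch:", torch.__version__)
print("CUDA (compiled against):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```

(`python -m torch.utils.collect_env` prints a fuller report of the same information.)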
Thank you very much for your time, and for providing such impressive work!
