Hi, I find that as training progresses (beyond 20 epochs), the loss gradually becomes negative. Is this harmful to downstream tasks? Thank you!