I've tried to make something out of it, but it takes about 40 epochs on a 1-hour audio dataset before it kicks in. Before that it just generates somewhat structured white noise, and for another 40-50 epochs after that the generated waveform still lacks proper phase.

