isomp4: fix track duration and playback for fragmented files #332
sscobici wants to merge 2 commits into pdeljanov:main
Conversation
Will mark this as draft, waiting for MP4 refactoring.
This doesn't appear to handle edits?
Hi, Symphonia doesn't currently account for edit lists during MP4 playback. With these edits, a file's duration might be reported as 10 s even if it only contains 1 s of actual audio/video frames. I've reviewed your proposed changes and noticed there is also a fix in version 0.5. I'll try to adjust this PR to fix MP4 duration in 0.6, though it won't include support for edits or streaming yet. Unless you already have a PR opened.
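To illustrate why edit lists matter for reported duration, here is a toy sketch. The field name follows the ISO-BMFF `elst` box, but the types are hypothetical and this is not Symphonia code:

```rust
/// Simplified `elst` (edit list) entry: `segment_duration` is how long, in
/// movie-timescale ticks, one edit presents a slice of the track's media.
struct ElstEntry {
    segment_duration: u64,
}

/// Total presentation duration implied by an edit list, in movie-timescale
/// ticks. Edits may trim or repeat media, so this can exceed the media
/// duration, e.g. 1 s of actual frames edited to play for 10 s.
fn edited_duration(entries: &[ElstEntry]) -> u64 {
    entries.iter().map(|e| e.segment_duration).sum()
}
```

A demuxer that ignores `elst` will report the media duration, while the container header advertises the (possibly much longer) edited presentation duration.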
This PR corrects errors introduced in my previous PR #317.

Future Work (outside of this PR)

Related Issues

Credits
pdeljanov left a comment
Thanks for attempting these fixes.
I left some comments for things I'd like to see changed, but also looking for some clarifications to help my understanding.
As an aside, you have 4 PRs open right now, but this is the only non-draft one. Is this the only one you'd like to land in 0.6?
```rust
#[derive(Debug)]
pub struct TimeSpan {
    pub timescale: NonZero<u32>,
    pub duration: u64,
}

impl Default for TimeSpan {
    fn default() -> Self {
        Self { timescale: NonZero::new(1).unwrap(), duration: 0 }
    }
}

impl TimeSpan {
    pub fn new(timescale: NonZero<u32>, duration: u64) -> Self {
        TimeSpan { timescale, duration }
    }
}
```
Let's not add this to core; it's not integrated with anything else in units. If there is a need for it to be in core in the future, it can be moved later.
OK, moved to the demuxer.
```rust
else {
    // For non-fragmented MP4, mdhd.duration must be equal to stts.total_duration.
    for trak in traks.iter_mut() {
        trak.mdia.mdhd.duration = trak.mdia.minf.stbl.stts.total_duration;
```
This is for consistency since we don't support edits? If there aren't any edits, then these values should be roughly equal as far as I understand.
Thanks for challenging this. I removed this change. In #440 the code tries to fill the duration from stts.total_duration when mdhd.duration is 0, but I'm not sure this is a good idea. We should keep mdhd.duration as it is and just not use it during seeking/parsing, as it is often different from stts.total_duration (I don't know the real reason for this; maybe edits). An mdhd.duration of 0 is a valid case for fragmented MP4, because stts.total_duration is also zero in that case.
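For reference, the stts total duration mentioned here is just the sum over the time-to-sample table. A minimal sketch with a hypothetical entry type (not Symphonia's actual `stts` representation):

```rust
/// One entry of an MP4 `stts` (time-to-sample) box: `sample_count`
/// consecutive samples, each lasting `sample_delta` media-timescale ticks.
struct SttsEntry {
    sample_count: u32,
    sample_delta: u32,
}

/// Total media duration in media-timescale ticks, i.e. what this thread
/// calls `stts.total_duration`. Widened to u64 so large tables can't overflow.
fn stts_total_duration(entries: &[SttsEntry]) -> u64 {
    entries
        .iter()
        .map(|e| u64::from(e.sample_count) * u64::from(e.sample_delta))
        .sum()
}
```

In a non-fragmented file this sum covers every sample, while in a fragmented file the `stts` in the `moov` is typically empty, which is why both it and mdhd.duration can legitimately be zero.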
```rust
};

let duration =
    if mdia_atom.mdhd.duration != 0 { mdia_atom.mdhd.duration } else { tkhd_atom.duration };
```
Ah. This would've been wrong. The TKHD duration is in the movie timescale rather than the track's timescale.
Will try to run some local tests to catch these kinds of regressions in the future.
```rust
}
sidx_timespan.duration += new_sidx.total_duration;

// Select the index with the earliest presentation timestamp to be the first.
```
Do we even need to hold onto sidx anymore? It's only used to print an information message if it's Some. Instead, you can just see if the hashmap contains at least 1 entry.
Removed the sidx variable. Note that this PR doesn't store the sidx atoms for further reuse; it just calculates the total duration.
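The accumulation being discussed can be sketched as follows. The reference type is a simplification of the ISO-BMFF `sidx` box (the real box also carries sizes, SAP flags, etc.), not Symphonia's type:

```rust
/// One reference entry from an MP4 `sidx` (segment index) box, reduced to
/// the only field needed for duration accounting.
struct SidxReference {
    subsegment_duration: u32,
}

/// Total duration of one sidx box in sidx-timescale ticks; this is the
/// per-box value that the demuxer then adds into the running track total.
fn sidx_total_duration(refs: &[SidxReference]) -> u64 {
    refs.iter().map(|r| u64::from(r.subsegment_duration)).sum()
}
```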
```rust
let timespan = match (moov.is_fragmented(), sidx_timespans.get(&trak.tkhd.id)) {
    (true, Some(sidx_ts)) => TimeSpan::new(sidx_ts.timescale, sidx_ts.duration),
    _ => TimeSpan::new(trak.mdia.mdhd.timescale, trak.mdia.mdhd.duration),
};
```
Can we not do a crazy match for this? ;)

How about this:

```rust
let timespan = sidx_timespans
    .get(&trak.tkhd.id)
    .filter(|_| moov.is_fragmented())
    .map(|sidx_ts| TimeSpan::new(sidx_ts.timescale, sidx_ts.duration))
    .unwrap_or_else(|| {
        TimeSpan::new(trak.mdia.mdhd.timescale, trak.mdia.mdhd.duration)
    });
```
Yes, adjusted.

It seems the format::Track object needs an enhancement. If we want it to have both num_frames (expressed in the media timebase) and duration (expressed in the movie timebase), then we need two timebases, or two TimeSpan objects: a MediaTimeSpan could be 2 seconds expressed in the media timescale, while a TrackTimeSpan could be 10 seconds expressed in the movie timescale due to edits, which can repeat the media 5 times. If we have a single timescale, we need to transform trak.tkhd.duration into the media timescale.
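The two-timespan situation can be shown with hypothetical numbers. The struct below mirrors the TimeSpan discussed in this PR but is a stand-in (NonZeroU32 is used here for wider toolchain compatibility):

```rust
use std::num::NonZeroU32;

/// A duration paired with the timescale (ticks per second) it is
/// expressed in.
struct TimeSpan {
    timescale: NonZeroU32,
    duration: u64,
}

impl TimeSpan {
    /// Wall-clock length of the span in seconds.
    fn seconds(&self) -> f64 {
        self.duration as f64 / self.timescale.get() as f64
    }
}
```

With example values: a media span of 88200 ticks at a media timescale of 44100 is 2 s, while a track span of 6000 ticks at a movie timescale of 600 is 10 s; the same track, two different answers depending on which timescale the duration is expressed in.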
```diff
-// All tracks ended.
-true
+// cannot determine the total track duration in the fragmented file
+false
```
Are we sure about that? Isn't that why we are summing the duration in the segment index?
Some fragmented MP4 files lack a segment index (sidx). In such cases, if the file is seekable, it is possible to parse all movie fragments (moof) to calculate the total track duration (not sure we need this). However, if a fragmented file is non-seekable and lacks a segment index, the duration must be treated as infinite.

```rust
// If all tracks ended in the last segment, then do not try to read anymore segments.
// Note, there will always be one segment because the moov atom was converted into a segment
// when the reader was instantiated.
if self.segs.last().unwrap().all_tracks_ended() {
    return Ok(false);
}
```

This check is unreliable for fragmented files. Even after reaching the end of a movie segment, the demuxer must attempt to read the next one, particularly when the total stream duration is unknown.
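The cases described above can be summarized in a small decision sketch. This is a hypothetical helper, not the PR's code:

```rust
/// How a fragmented-MP4 demuxer might decide the total duration.
/// Returns Some(ticks) when the duration is determinable, None otherwise.
fn fragmented_duration(sidx_total: Option<u64>, seekable: bool) -> Option<u64> {
    match (sidx_total, seekable) {
        // A segment index gives the total duration directly.
        (Some(d), _) => Some(d),
        // Seekable but no sidx: all moof fragments could be scanned and
        // their durations summed; treated as unknown here for simplicity.
        (None, true) => None,
        // Non-seekable and no sidx: the duration is effectively infinite.
        (None, false) => None,
    }
}
```

When the result is None, the only safe behavior is the one argued for above: keep attempting to read the next segment rather than declaring end of stream.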
#493 has some minor changes; it would be good if it is merged. I have a local draft for video PTS calculations, but I need more time to get it equivalent to ffmpeg (seems I forgot edits). I added setting the duration for format::Track in the media timescale. Many files have it populated considering edits.
I'll take a look. Seems pretty minimal and straightforward, so let's see if we can get that one for 0.6.
This one is based off of the old

I'd like to align on some things regarding timing. In 0.6, we now make a distinction between the number of frames and the duration.

The number of frames is the number of decoded audio or video frames, excluding any delay or padding frames (audio only). It is a count, and there is no timebase for it. For users who want to know the total number of decoded frames including discarded ones, they can add in the delay and padding fields of

The duration is the running time of the track, as stated by the container, in timebase units, excluding any delay or padding frames. Likewise, DTS and PTS are also in timebase units and represent instants along the duration of the track.

Since Symphonia was historically audio-focused, duration and number of frames were often used interchangeably. So, all this to say, for containers that don't specify a timebase, populating the duration of the track with the number of frames is completely valid wherever we set the timebase to be

What may be missing is the concept of an overall media (all-track) duration, timebase, etc. I'm not sure how to fit this in with the per-track duration, timebase, etc.

Does this track with your understanding?
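The timebase relationship described above can be sketched as follows. TimeBase here is a hypothetical stand-in, not Symphonia's actual type:

```rust
/// A rational timebase: one tick lasts `num` / `den` seconds. For an audio
/// track with num = 1 and den = sample_rate, one tick is one audio frame,
/// so a duration in ticks equals the decoded frame count.
struct TimeBase {
    num: u32,
    den: u32,
}

/// Convert a tick count (a duration, DTS, or PTS) into seconds.
fn ticks_to_seconds(tb: &TimeBase, ticks: u64) -> f64 {
    ticks as f64 * tb.num as f64 / tb.den as f64
}
```

For example, 96000 ticks at a 1/48000 timebase is 2 s, and for a raw audio container that same tick count is exactly the number of frames, which is why the two values coincide in that case.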
I agree. I've added a commit to set the duration as num_frames for other containers; for MKV, there is a better implementation in #494.
I believe general file/stream information should live at the Format level (Readers). We need to define a struct and specify which fields are relevant, and each demuxer will populate them. For now, this PR assumes the track timebase is the media timebase (non general) and handles the conversion for MP4 track durations, which are expressed in the container's general timebase.
The approach taken in PR #317 doesn't allow adjusting the duration to the mdhd timescale, so I reverted the changes and modified only the moov atom.
Please advise what code I can reuse to avoid/reduce the risk of multiplication overflow during the timescale conversion. I found some logic in core/units.rs but didn't find it easy to reuse.
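One common way to do this (a sketch; core/units.rs may take a different approach) is to widen the intermediate product to u128, which cannot overflow since u64::MAX * u32::MAX fits in 96 bits:

```rust
/// Rescale `value` from `from_scale` ticks/second to `to_scale`
/// ticks/second, rounding half-up; similar in spirit to ffmpeg's
/// av_rescale_rnd, but simplified to the u64-by-u32 case needed here.
fn rescale(value: u64, from_scale: u32, to_scale: u32) -> u64 {
    assert!(from_scale != 0, "timescale must be non-zero");
    let num = value as u128 * to_scale as u128 + u128::from(from_scale / 2);
    (num / u128::from(from_scale)) as u64
}
```

Converting a tkhd duration from the movie timescale into the track's mdhd timescale would then be `rescale(tkhd_duration, movie_timescale, media_timescale)`, with no overflow even for pathological durations.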