Add per-frame timestamp embedding to the VLM video path #128

Open
amazloumi wants to merge 3 commits into main from video/per-frame-timestamps
amazloumi (Member) commented Jun 26, 2026

Summary

  • Add FrameTimeEmbedding (kempnerforge/model/frame_time.py): sinusoidal features of each frame's timestamp (in seconds) at log-spaced periods, fed through a zero-initialized projection, so the module contributes nothing at step 0 (identity at initialization).
  • decode_video_frames now returns (frames, times); WebVidVideoDataset emits frame_times with shape (F,), and VideoCollator stacks them to (B, F).
  • The embedding is applied per frame in _project_visual_features and lives on VLMWrapper as a sibling submodule (video only; it is None on the image path). It is built, FSDP-sharded, and meta-materialized at both build sites.
  • scripts/train.py threads frame_times through to the model forward; docs and CHANGELOG are updated.
  • The time embedding is registry-driven: [time_embedding] selects the implementation (type = "sinusoidal" is the default, "none" disables it) via @registry.register_time_embedding, so new techniques can be added as small registrations. Sequence-modifying encodings (Molmo2-style text time-tokens) are flagged as a separate future hook because they need interleaved-sequence support.
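
For reviewers who want the arithmetic in front of them, here is a minimal, framework-free sketch of the idea in the first bullet: sin/cos features at log-spaced periods followed by a zero-initialized linear projection. It is not the code in kempnerforge/model/frame_time.py. The helper names, the period range, and the tiny sizes are all assumptions chosen for illustration, and the real module is presumably a PyTorch nn.Module.

```python
import math


def log_spaced_periods(num_periods: int, min_period: float, max_period: float) -> list[float]:
    """Return periods (seconds) spaced geometrically from min_period to max_period."""
    if num_periods == 1:
        return [min_period]
    ratio = (max_period / min_period) ** (1.0 / (num_periods - 1))
    return [min_period * ratio**i for i in range(num_periods)]


def sinusoidal_time_features(t_seconds: float, periods: list[float]) -> list[float]:
    """One (sin, cos) pair per period, so the feature width is 2 * len(periods)."""
    feats: list[float] = []
    for period in periods:
        angle = 2.0 * math.pi * t_seconds / period
        feats.extend((math.sin(angle), math.cos(angle)))
    return feats


def project(features: list[float], weight: list[list[float]], bias: list[float]) -> list[float]:
    """Plain linear map: out[j] = sum_i weight[j][i] * features[i] + bias[j]."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weight, bias)]


# Hypothetical sizes and period range, chosen only for illustration.
periods = log_spaced_periods(num_periods=4, min_period=0.1, max_period=100.0)
features = sinusoidal_time_features(t_seconds=2.5, periods=periods)

model_dim = 3
zero_weight = [[0.0] * len(features) for _ in range(model_dim)]  # zero-initialized projection
zero_bias = [0.0] * model_dim

visual_token = [0.5, -1.0, 2.0]
time_embed = project(features, zero_weight, zero_bias)
conditioned = [v + e for v, e in zip(visual_token, time_embed)]  # additive conditioning (assumed)
```

With the projection zeroed, `conditioned` equals `visual_token`, which is the "identity at step 0" property: the features are non-trivial, but they only start to influence the tokens once the projection weights receive gradient updates.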

Testing

  • uv run ruff check kempnerforge/ tests/ passes
  • uv run ruff format --check kempnerforge/ tests/ scripts/ passes
  • uv run pyright kempnerforge/ passes (0 errors)
  • uv run pytest tests/unit/ -v --timeout=60 passes (1527 passed, 2 skipped)
  • Distributed (parallel.py changed): uv run torchrun --nproc_per_node=4 -m pytest tests/distributed/ -v (run in progress)
  • 2-GPU FSDP smoke test on vlm_video_webvid.toml (random encoder): training runs, and the +33,792 parameter delta confirms the module is sharded and trainable
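
As a sanity check on the +33,792 figure, the sketch below counts the parameters of a single linear projection with bias. The PR does not state the feature width or the model dimension, so every shape here is a hypothetical candidate rather than the actual configuration. The point is only that the delta is consistent with one small projection layer.

```python
def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameters in a dense layer: a weight matrix plus an optional bias vector."""
    return in_features * out_features + (out_features if bias else 0)


reported_delta = 33_792

# Hypothetical shape: 16 periods x (sin, cos) = 32 features, projected to width 1024.
candidate = linear_param_count(in_features=32, out_features=1024)

# Other (in_features, out_features) pairs that give the same count, to show that
# the delta alone does not pin down the real shape.
matches = [
    (i, o)
    for o in (256, 512, 768, 1024, 2048)
    for i in range(1, 257)
    if linear_param_count(i, o) == reported_delta
]
```

Several shapes reproduce the count, so this is a plausibility check and not a way to recover the configuration.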

Closes #127

@amazloumi amazloumi marked this pull request as draft June 26, 2026 14:46
@amazloumi amazloumi marked this pull request as ready for review June 26, 2026 18:29
Base automatically changed from worktree-video-pipeline to main June 26, 2026 21:05
codecov Bot commented Jun 26, 2026


Codecov Report

❌ Patch coverage is 95.16129% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| kempnerforge/distributed/parallel.py | 58.33% | 5 Missing ⚠️ |
| kempnerforge/data/video_io.py | 88.88% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| kempnerforge/config/job.py | 88.79% <100.00%> (+0.19%) ⬆️ |
| kempnerforge/config/registry.py | 100.00% <100.00%> (ø) |
| kempnerforge/config/schema.py | 100.00% <100.00%> (ø) |
| kempnerforge/config/time_embedding.py | 100.00% <100.00%> (ø) |
| kempnerforge/data/video_dataset.py | 93.16% <100.00%> (+0.17%) ⬆️ |
| kempnerforge/model/frame_time.py | 100.00% <100.00%> (ø) |
| kempnerforge/model/vlm.py | 99.15% <100.00%> (+0.08%) ⬆️ |
| kempnerforge/data/video_io.py | 81.96% <88.88%> (+1.96%) ⬆️ |
| kempnerforge/distributed/parallel.py | 58.79% <58.33%> (-0.28%) ⬇️ |

Copilot AI left a comment


Pull request overview

This PR adds absolute per-frame timestamp conditioning to the VLM video pathway by propagating decoded frame presentation times through the data pipeline and injecting a registry-configurable, zero-initialized time embedding into each frame’s visual tokens.

Changes:

  • decode_video_frames now returns (frames, times); datasets/collator propagate frame_times as (F,) / (B, F) and training threads it into the model forward.
  • Introduces a registry-driven time-embedding module (FrameTimeEmbedding default; "none" disables) and applies it per-frame in the VLM visual-token projection (video-only).
  • Ensures distributed/FSDP build paths materialize and shard the new submodule; adds unit coverage for config, embedding behavior, and build wiring.
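
The data-path change in the first bullet can be pictured with the sketch below: a decoder stand-in returns frames together with matching timestamps in seconds, and a collator stacks per-sample (F,) lists into a (B, F) batch. `decode_stub`, `pick_uniform_indices`, and `collate_frame_times` are hypothetical stand-ins, not the functions in video_io.py or video_dataset.py. Uniform frame sampling and the PTS-times-time-base conversion are assumptions about how the times are derived.

```python
from fractions import Fraction


def pick_uniform_indices(num_available: int, num_frames: int) -> list[int]:
    """Evenly spaced frame indices, first and last frame included."""
    if num_frames == 1:
        return [0]
    step = (num_available - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]


def decode_stub(pts_ticks: list[int], time_base: Fraction, num_frames: int):
    """Hypothetical decoder stand-in: returns (frames, times) of equal length."""
    indices = pick_uniform_indices(len(pts_ticks), num_frames)
    frames = [f"frame-{i}" for i in indices]  # placeholder for pixel data
    times = [float(pts_ticks[i] * time_base) for i in indices]  # PTS ticks -> seconds
    return frames, times


def collate_frame_times(samples: list[dict]) -> list[list[float]]:
    """Stack per-sample frame_times of shape (F,) into a (B, F) nested list."""
    num_frames = len(samples[0]["frame_times"])
    for sample in samples:
        if len(sample["frame_times"]) != num_frames:
            raise ValueError("all samples must carry the same number of frame times")
    return [list(sample["frame_times"]) for sample in samples]


# Two made-up clips with millisecond time bases: 25 fps and 10 fps.
time_base = Fraction(1, 1000)
_, times_a = decode_stub([40 * i for i in range(10)], time_base, num_frames=4)
_, times_b = decode_stub([0, 100, 200, 300], time_base, num_frames=4)

batch = collate_frame_times([{"frame_times": times_a}, {"frame_times": times_b}])
```

Here `batch` has B = 2 rows of F = 4 timestamps each. The times are absolute seconds, so the two clips produce different values even though both contribute four frames.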

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| tests/unit/test_vlm.py | Adds unit tests ensuring video wrappers attach the module, image wrappers do not, and forward wiring/shape checks behave as expected. |
| tests/unit/test_video_io.py | Updates tests for (frames, times) return and basic timestamp properties. |
| tests/unit/test_video_dataset.py | Updates dataset/collator tests to validate frame_times padding/stacking behavior. |
| tests/unit/test_time_embedding_config.py | Adds coverage for TimeEmbeddingConfig defaults, validation, and kwargs. |
| tests/unit/test_frame_time.py | Adds coverage for embedding shape, zero-init behavior, gradient flow, dtype behavior, and registry builder. |
| tests/unit/test_distributed.py | Verifies distributed build attaches/casts frame_time_embed for video and omits it for images. |
| scripts/train.py | Threads time_embedding_config into model build and passes frame_times into VLM forward. |
| kempnerforge/model/vlm.py | Adds frame_times plumbing and applies per-frame timestamp embeddings in _project_visual_features. |
| kempnerforge/model/frame_time.py | Introduces TimeEmbedding interface, FrameTimeEmbedding, and registry-driven build_time_embedding. |
| kempnerforge/distributed/parallel.py | Builds/materializes/casts frame_time_embed in meta/CPU paths and FSDP-shards it when present. |
| kempnerforge/data/video_io.py | Changes decode_video_frames to also return matched presentation timestamps. |
| kempnerforge/data/video_dataset.py | Emits frame_times per sample and stacks it in VideoCollator. |
| kempnerforge/config/time_embedding.py | Adds [time_embedding] config with validation and builder kwargs. |
| kempnerforge/config/schema.py | Exposes TimeEmbeddingConfig in the config schema surface. |
| kempnerforge/config/registry.py | Adds time-embedding registry hooks (register/get/list_time_embedding). |
| kempnerforge/config/job.py | Adds optional time_embedding field to JobConfig. |
| docs/how-to/train-on-video.md | Documents per-frame timestamp embedding and registry semantics. |
| CHANGELOG.md | Records the new per-frame timestamp feature and affected components. |
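
To make the registry rows concrete, here is a minimal sketch of a decorator-based registry where a `type` key selects the builder and "none" disables the embedding. The names register_time_embedding, get_time_embedding, list_time_embedding, and build_time_embedding come from the PR, but the signatures, the dict-based config, and the placeholder return values are assumptions. The real hooks live in kempnerforge/config/registry.py and may differ.

```python
from typing import Any, Callable, Optional

# Hypothetical module-level store; the real registry object may be structured differently.
_TIME_EMBEDDINGS: dict[str, Callable[..., Optional[Any]]] = {}


def register_time_embedding(name: str):
    """Decorator that registers a builder under `name`."""
    def decorator(builder: Callable[..., Optional[Any]]):
        if name in _TIME_EMBEDDINGS:
            raise KeyError(f"time embedding {name!r} is already registered")
        _TIME_EMBEDDINGS[name] = builder
        return builder
    return decorator


def get_time_embedding(name: str) -> Callable[..., Optional[Any]]:
    if name not in _TIME_EMBEDDINGS:
        raise KeyError(f"unknown time embedding {name!r}; known: {list_time_embedding()}")
    return _TIME_EMBEDDINGS[name]


def list_time_embedding() -> list[str]:
    return sorted(_TIME_EMBEDDINGS)


@register_time_embedding("sinusoidal")
def build_sinusoidal(dim: int, **kwargs: Any) -> dict:
    # Placeholder for constructing a real embedding module.
    return {"kind": "sinusoidal", "dim": dim, **kwargs}


@register_time_embedding("none")
def build_none(dim: int, **kwargs: Any) -> None:
    # "none" disables the embedding: the caller attaches no submodule.
    return None


def build_time_embedding(config: dict, dim: int) -> Optional[Any]:
    """Select a builder from a [time_embedding]-style mapping; default is sinusoidal."""
    name = config.get("type", "sinusoidal")
    kwargs = {k: v for k, v in config.items() if k != "type"}
    return get_time_embedding(name)(dim=dim, **kwargs)


default_module = build_time_embedding({}, dim=8)
disabled_module = build_time_embedding({"type": "none"}, dim=8)
```

Under this layout a new technique is one decorated builder function, which matches the "small additions" goal in the PR description. An encoding that changes the sequence length would not fit this additive hook.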


Comment thread kempnerforge/model/frame_time.py Outdated
Linked issue: Encode per-frame timestamps for the VLM video path

2 participants