VisionDepth3D (VD3D) is a high-performance 2D-to-3D conversion suite built for real-time previewing, cinematic stereo rendering, and advanced depth-based video processing.
It integrates AI depth estimation, pixel-accurate stereo warping, live 3D visualization, FPS interpolation, and AI upscaling into a unified, GPU-accelerated workflow.
VD3D is designed to scale from fast scene testing to full-length feature conversions, giving creators precise control over depth, comfort, and visual quality.
This user guide walks you through the complete VisionDepth3D workflow, including:
- Generating high-quality depth maps from images and video
- Blending multiple depth sources for cleaner results
- Converting 2D footage into cinematic stereoscopic 3D
- Enhancing FPS and resolution using AI tools
- Restoring and syncing audio after processing
- Using the real-time VD3D Live system for live 3D preview and external output
By the end of this guide, you’ll be able to confidently create smooth, comfortable, and high-quality 3D content using VD3D from start to finish.
The FPS / Upscale Enhancer tab allows you to:
- Increase video smoothness using AI frame interpolation (RIFE)
- Enhance resolution using AI upscaling (Real-ESRGAN)
- Automatically split long videos into manageable scenes using PySceneDetect
- Rebuild high-quality output videos with hardware-accelerated encoding
This system is ideal for improving older or low-resolution content, and for creating ultra-smooth playback on VR and high refresh rate displays.
- Click Extract Frames from Video and select your source video
- Click Select Output Folder to choose where frames will be saved
- Choose an image format:
- JPG for lower memory usage and faster processing
- PNG for maximum quality
- Once extraction completes, the Input Frames Folder will automatically populate with the extracted frames
- Select Output Video File and choose a format (MP4, MKV, AVI, etc.)
- Enable processing options:
- RIFE Interpolation for FPS enhancement
- ESRGAN Upscaling for resolution improvement
- Enable both if desired
Enter your target output resolution (Width × Height).
Example:
Original: 720 × 480
Upscaled Output: 2880 × 2160
(4× upscaling in both dimensions)
Enter the original frame rate of the source video.
Example:
If the original clip is 29.97 FPS, enter 29.97
This ensures proper interpolation timing and smooth output.
If RIFE is enabled, select the FPS multiplier:
- ×2 (30 → 60 FPS)
- ×4 (30 → 120 FPS)
- ×8 (30 → 240 FPS)
Higher values create ultra-smooth motion but require more processing time and may introduce more artifacts.
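The multiplier simply scales the source frame rate, which is why entering the correct original FPS matters. A minimal sketch:

```python
def interpolated_fps(source_fps, multiplier):
    """Target frame rate after RIFE frame interpolation."""
    return source_fps * multiplier

print(interpolated_fps(30, 2))  # 60
print(interpolated_fps(30, 8))  # 240
```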
Select your preferred encoder:
- H.264 / H.265 CPU encoding (universal compatibility)
- NVENC GPU encoding (recommended for NVIDIA GPUs for speed)
AI Blending Strength
Controls how much of the AI-enhanced detail is blended with the original frame:
- Lower values = stronger AI sharpening
- Higher values = more original texture preserved
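One plausible reading of this slider is a simple linear mix between the AI result and the original frame. This is a sketch of the idea, not VD3D's exact implementation; `strength` is the hypothetical slider value in 0-1:

```python
import numpy as np

def blend_ai(original, ai_enhanced, strength):
    """Linear blend: strength=0 keeps the full AI-sharpened result,
    strength=1 keeps only the original texture."""
    return (1.0 - strength) * ai_enhanced + strength * original
```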
Input Resolution Scaling
Downscales the input frame before AI upscaling to:
- Reduce memory usage
- Increase processing speed
- Still achieve high-quality results
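As a rough sketch of the trade-off (hypothetical scale factor; the actual behavior may differ), a 0.5 input scale quarters the pixel count the model must process:

```python
def working_resolution(width, height, input_scale):
    """Downscale the frame before AI upscaling to save memory and time."""
    return max(1, round(width * input_scale)), max(1, round(height * input_scale))

print(working_resolution(1920, 1080, 0.5))  # (960, 540)
```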
VisionDepth3D provides two different processing modes for FPS interpolation and upscaling.
Both produce the same visual results but differ in how they use system resources and schedule the work.
The Merged Pipeline runs interpolation and upscaling in a single sequential workflow:
- A frame pair is interpolated using RIFE
- The interpolated frames are immediately passed through ESRGAN (if enabled)
- Frames are written directly to the output video before moving to the next pair
Advantages:
- Simpler processing flow
- Very stable and predictable
- Uses less system memory

Ideal for:
- Lower-end systems
- Long videos
- Maximum reliability

Choose the Merged Pipeline if:
- You experience stuttering or memory limits
- You want guaranteed smooth processing
- You are running very high resolutions
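The merged flow can be sketched as a single loop. The `interpolate`, `upscale`, and `write` callables below are hypothetical stand-ins for RIFE, ESRGAN, and the encoder:

```python
def merged_pipeline(frames, interpolate, upscale, write):
    """Sequential sketch: interpolate a frame pair, upscale, write, repeat."""
    for a, b in zip(frames, frames[1:]):
        write(upscale(a))                  # original frame
        write(upscale(interpolate(a, b)))  # in-between frame (x2 example)
    write(upscale(frames[-1]))             # flush the final frame
```

Because each pair is fully processed and written before the next begins, only a couple of frames are ever held in memory at once.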
The Threaded Pipeline runs interpolation, upscaling, and video writing in parallel using multiple worker threads:
- One thread generates interpolated frames (RIFE)
- One thread upscales frames (ESRGAN)
- One thread writes frames to the output video
Frames are buffered and synchronized to maintain correct ordering.
Advantages:
- Much higher throughput
- Better GPU utilization
- Faster overall render times
- Slightly higher memory usage

Choose the Threaded Pipeline if:
- You have a strong GPU
- You want maximum performance
- You are processing shorter clips or high FPS output
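A minimal sketch of the three-stage threaded layout, using bounded queues to buffer frames and keep them ordered (illustrative only; VD3D's actual worker code differs):

```python
import queue
import threading

def threaded_pipeline(frames, interpolate, upscale, write):
    """One thread per stage; bounded queues buffer and order frames."""
    q_interp = queue.Queue(maxsize=8)
    q_upscale = queue.Queue(maxsize=8)

    def rife_stage():
        for a, b in zip(frames, frames[1:]):
            q_interp.put(a)
            q_interp.put(interpolate(a, b))  # in-between frame (x2 example)
        q_interp.put(frames[-1])
        q_interp.put(None)  # end-of-stream sentinel

    def esrgan_stage():
        while (frame := q_interp.get()) is not None:
            q_upscale.put(upscale(frame))
        q_upscale.put(None)

    def writer_stage():
        while (frame := q_upscale.get()) is not None:
            write(frame)

    stages = [threading.Thread(target=s)
              for s in (rife_stage, esrgan_stage, writer_stage)]
    for s in stages:
        s.start()
    for s in stages:
        s.join()
```

The bounded queue sizes are what cap the extra memory use: each stage can run ahead of the next by only a few frames.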
| Pipeline | Stability | Speed | Memory Use | Best For |
|---|---|---|---|---|
| Merged Pipeline | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | Low | Long renders, reliability |
| Threaded Pipeline | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Medium | Fast high-performance jobs |
Both pipelines produce identical final video quality.
The difference is strictly in processing speed and system resource usage.
Choose based on your hardware and workload.
The Depth Estimation tab generates depth maps from images or videos using AI models.
If this is your first time using VisionDepth3D, follow the steps below to render your first depth map.
- Open the Depth Estimation tab.
- Select a Model
  - Choose a recommended model such as Depth Anything V2.
  - The first time you load a model, it may take a moment to initialize.
- Choose an Output Directory
  - Click Choose Directory.
  - Select a folder where your depth map will be saved.
- Leave Settings at Default
  For your first test, keep:
  - Colormap: Default
  - Invert Depth: Off
  - Batch Size: Default value
  - Inference Resolution: A preset like 512×288 or 704×384
- Click Process Image
- Select your image file.
VD3D will:
- Generate a depth map
- Display the result in the preview window
- Save the output as:
yourfilename_depth.png
You have successfully created your first depth map.
- Select your model.
- Choose an Output Directory.
- Keep default settings for your first run.
- Click Process Video.
- Select your video file.
VD3D will:
- Process each frame
- Generate a depth video
- Save it as:
yourvideo_depth.mkv
If you only need depth for 3D conversion, you typically do not need to change colormap or other advanced settings.
After your first successful render, you can begin adjusting:
- Increase Inference Resolution for more detailed depth.
- Increase Batch Size if your GPU has available VRAM.
- Enable Invert Depth if near/far values appear reversed.
- Enable Save Frames if you need individual depth PNG files.
For most users, default settings work very well.
After your first successful render, you can fine-tune performance and detail.
Controls internal processing resolution.
- Lower resolution = faster processing
- Higher resolution = more detailed depth
For full movies, many users start at 512×288 and increase if needed.
Controls how many frames are processed at once.
- Higher values = faster on strong GPUs
- Lower values = safer if you hit VRAM limits
If you run out of memory, reduce this first.
Flips near and far values.
Enable this if foreground objects appear darker when they should be closer.
When enabled, VD3D saves individual depth PNG frames in addition to the depth video.
Useful for:
- Manual inspection
- Custom 3D workflows
Reduces VRAM usage by moving parts of the model to CPU.
- None = fastest, highest VRAM usage
- Sequential = balanced
- Full = lowest VRAM usage, slowest
Only adjust this if you encounter memory limits.
Reduces VRAM usage and can increase speed on supported GPUs.
Recommended for CUDA GPUs.
Select one of the following:
- Process Image – Single image input
- Process Image Folder – Batch image processing
- Process Video – Generate depth video
- Process Video Folder – Batch video processing
For your first test, use Process Image.
Image depth outputs:
filename_depth.png
Video depth outputs:
filename_depth.mkv
Depth maps are saved in grayscale format and are ready for use in the 3D Generator tab.
- Pause temporarily halts processing
- Resume continues where it left off
- Cancel safely stops processing
Note:
For best results in the Depth Blender tab, render two separate depth maps:
- One using a Depth Anything V1 Base model (white-balanced source)
- One using a Depth Anything V2 Large model (Base Source)
Blending these two depth sources improves edge stability, subject separation, and overall depth consistency.
The Depth Blender tool lets you merge two different depth sources into one cleaner depth result.
It is designed for cases where:
- One model produces strong subject separation but noisy backgrounds
- Another model produces stable backgrounds but weaker subject edges
- You want to blend both into a single depth map or depth video that behaves better in 3D conversion
You can run it on:
- Folders of PNG depth frames
- Two depth videos
A live preview panel lets you scrub frames and see adjustments instantly before running a full batch.
In the Mode section select one:
- Folders (frames) for depth frame sequences (.png)
- Videos for depth videos (.mp4, .mkv, .avi, .mov)
Under Inputs:
- Set V1 path
- Set V2 path (this is the “base” depth map)
Notes:
- V1 is used to contribute extra detail or stronger whites where needed
- V2 is treated as the main reference depth that the output is normalized to
If you are using Folders (frames), you have two options:
- Overwrite V2
  The blended frames replace the original PNGs inside the V2 folder.
- Output Folder
  Turn off overwrite and select an output directory to save blended frames separately.
If you are using Videos, select an output file location such as:
blended_depth.mp4
Under Final Size (optional):
- Leave Width and Height blank to keep the original resolution
- Enter values to force the output size for every frame
Example:
- Width: 1920
- Height: 1080
Use the live preview tools to verify your blend:
- Click Preview Now
- Use the Preview Frame slider to scrub
- Use the arrow keys:
- Left Arrow goes to the previous frame
- Right Arrow goes to the next frame
The preview shows:
- V2 Base on the left
- Blended Output on the right
These sliders update the preview live.
Controls how strongly V1 can contribute its high depth whites into V2.
- Lower values keep output closer to V2
- Higher values inject more of V1’s bright depth regions
Controls the softness of the blending transition.
- Low values create sharper merges
- Higher values create smoother, more gradual blending
Boosts local contrast in the blended result.
- Higher values can increase depth “punch”
- Too high can increase noise
Controls how localized the CLAHE contrast enhancement is.
- Lower tile size can increase detail but may look harsher
- Higher tile size is smoother and more global
Strength of edge-preserving smoothing.
- Higher values smooth more while keeping edges
- Too high can soften fine detail
How much intensity difference is allowed during smoothing.
- Higher values smooth more aggressively
- Lower values protect contrast
How far smoothing spreads spatially.
- Higher values affect larger areas
- Lower values keep smoothing tighter
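As a hypothetical illustration of how Blend Strength and Feather might interact (a sketch of the idea, not VD3D's actual math): V1's brighter, nearer regions are ramped into the V2 base through a soft mask.

```python
import numpy as np

def blend_depth(v1, v2, strength=0.5, feather=0.1):
    """Inject V1's brighter (nearer) depth into the V2 base.
    A soft mask ramps in where V1 exceeds V2; `strength` scales the
    injection and `feather` widens the transition."""
    diff = np.clip(v1 - v2, 0.0, None)            # only where V1 is brighter
    mask = np.clip(diff / max(feather, 1e-6), 0.0, 1.0) * strength
    return v2 * (1.0 - mask) + v1 * mask
```

Where V1 is darker than V2, the mask is zero and the V2 base passes through unchanged, which matches the "V2 as main reference" behavior described above.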
When your preview looks correct:
- Click Start Batch
- Watch the progress bar and log window
- Click Stop if you need to cancel safely
Frames mode output:
filename.png (blended depth frames saved as PNG)
Video mode output:
blended_depth.mp4 (grayscale depth video output)
The blended results are grayscale depth and are ready to use in the 3D Generator tab.
The 3D Generator tab converts a 2D source video and its matching depth map into a stereoscopic 3D video.
This is the final stage of the VisionDepth3D workflow. It takes:
- the original 2D video
- the generated or blended depth map video
- your stereo/parallax settings
- your output and encoding options
and renders a final 3D video using the current VisionDepth3D Method.
The current method uses subject-aware depth normalization, pop-control depth shaping, structured near / mid / far disparity weighting, GPU stereo warping, edge-aware repair, dynamic convergence, and floating-window protection to create a controllable stereo result.
VisionDepth3D now uses the updated VisionDepth3D Method for stereo generation.
This method uses a different stereo shift convention than older versions of VisionDepth3D.
In the current pipeline:
- Foreground Shift is usually negative
- Midground Shift is usually slightly negative or near zero
- Background Shift is usually positive
This may feel opposite from older VisionDepth3D presets.
Older presets that used positive foreground values may now produce a very different stereo result. If you are updating from an older version, it is recommended to start from the new default presets instead of copying older shift values directly.
| Control | Recommended Range |
|---|---|
| Foreground Shift | -5.0 to -10.0 |
| Midground Shift | -0.5 to -2.0 |
| Background Shift | +2.0 to +5.0 |
Foreground Shift: -6.0
Midground Shift: -0.8
Background Shift: +2.2
Foreground Shift: -8.5
Midground Shift: -1.2
Background Shift: +3.5
Foreground Shift: -10.0 to -12.0
Midground Shift: -2.0
Background Shift: +4.0 to +5.0
The 3D effect comes from separation between near, mid, and far depth regions. In the current renderer, negative foreground shift pulls near objects toward the viewer, while positive background shift pushes distant areas deeper behind the screen plane.
A good basic relationship is:
Foreground Shift < Midground Shift < Background Shift
Example:
FG -6.0 / MG -0.8 / BG +2.2
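That ordering is easy to sanity-check in code:

```python
def shifts_ordered(fg, mg, bg):
    """Basic comfort relationship: Foreground < Midground < Background."""
    return fg < mg < bg

print(shifts_ordered(-6.0, -0.8, 2.2))  # True
print(shifts_ordered(5.0, 0.0, -3.0))   # False (old-style preset)
```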
Presets created for older VisionDepth3D versions may not transfer directly to the new method.
If an older preset used positive foreground values, it may now:
- push depth in the wrong direction
- reduce pop-out
- make the scene feel inverted
- create uncomfortable or flat stereo separation
- produce a very different output than expected
Old convention:
- Foreground: positive
- Background: negative

Current convention:
- Foreground: negative
- Midground: slightly negative or near zero
- Background: positive
When converting older presets, do not simply copy the same numbers. Start with one of the new default presets, then tune using Preview Mode, Shift Heatmap, and Anaglyph preview.
You must provide:
- Input Video
  The original 2D source video.
- Depth Map Video
  The matching depth map video generated from the Depth Estimation tab or Depth Blender.
- Output Path
  The location where the final 3D video will be saved.
The source video and depth map video should match in:
- resolution
- frame count
- frame rate
- clip length
If they do not match, the stereo render may drift, desync, or produce incorrect depth alignment.
The 3D Generator displays the original source video size in the preview metadata bar when available.
Example:
Original: 1920×1080 (1.78:1)
Use this information to choose the correct output size and aspect ratio.
Common source sizes:
| Source Size | Aspect Ratio | Notes |
|---|---|---|
| 1920×1080 | 1.78:1 | Standard 16:9 |
| 3840×2160 | 1.78:1 | 4K 16:9 |
| 1920×800 | 2.40:1 | Cinematic widescreen |
| 1440×1080 | 1.33:1 | 4:3 style |
| 1080×1920 | 0.56:1 | Vertical 9:16 |
This helps you avoid accidentally stretching or cropping the video into the wrong output shape.
Use Output & Encoding to configure how the final 3D video is packaged.
Here you can set:
- Output Format
  - Full-SBS
  - Half-SBS
  - VR
  - VR180 Equirect Top-Bottom
  - VR180 Equirect Side-by-Side
  - Red-Cyan Anaglyph
  - Passive Interlaced
- Stereo Output
  - SBS
  - Left eye only
  - Right eye only
  - Both eyes separately
- Aspect Ratio
  - Default 16:9
  - Classic 4:3
  - Square 1:1
  - Vertical 9:16
  - CinemaScope / Anamorphic / UltraWide formats
- Codec
  - H.264 / H.265 CPU encoding
  - NVENC for NVIDIA GPUs
  - AMF for AMD GPUs
  - QSV for Intel GPUs
  - AV1 options where supported
- Audio Handling
  - Keep original audio when available
  - Export video-only if audio is not needed
- HDR10 Preservation
  - Use when working with compatible HDR source material
NVENC H.264 or NVENC H.265 is recommended for NVIDIA users who want faster encoding.
Use Processing Options to control stereo stability, edge behavior, and render safety.
Common recommended options:
- Preserve Original Aspect Ratio
  Keeps the source framing from being stretched.
- Auto Crop Black Bars
  Detects and removes letterbox bars before stereo generation when appropriate.
- Stabilize Zero-Parallax
  Helps keep the subject or dominant depth region closer to the screen plane.
- Skip Blank / White Frames
  Avoids rendering empty frames that can appear in some sources.
- Enable Edge Masking
  Reduces harsh stereo artifacts around strong depth edges.
- Enable Feathering
  Softens transitions between shifted regions.
- Enable Dynamic Convergence
  Smooths convergence changes across scenes.
- Enable Floating Window
  Adds cinematic edge protection when strong pop-out approaches frame borders.
- Clip Range
  Allows short test renders before committing to a full video.
These settings directly affect visual comfort, stereo stability, and artifact control.
Click Load Preview Sources to open the source video and depth map for preview.
The preview system lets you:
- scrub through frames
- test different preview modes
- inspect stereo direction
- check depth alignment
- tune shift values before rendering
- save preview images for comparison
Testing preview frames is strongly recommended before starting a full render.
Use this for a quick stereo check.
It helps you inspect:
- stereo direction
- subject placement
- edge ghosting
- convergence comfort
- whether the scene feels pushed forward or backward
If the image feels inverted, uncomfortable, or backwards, check depth inversion, shift direction, and eye order.
Useful for displays that support interlaced stereo or for checking alternating-line stereo separation.
Shows a half side-by-side stereo preview.
Use this when checking headset-style or SBS-based output.
The Shift Heatmap visualizes stereo displacement.
Use it to check:
- whether foreground, midground, and background are separating correctly
- whether foreground is receiving enough negative shift
- whether background is receiving positive push
- whether extreme shift is being clamped
This is one of the best modes for tuning the new method.
Shows the strength of displacement without focusing on direction.
Use this to see where the strongest stereo stress exists.
Shows a clipped range of shift values to make smaller displacement differences easier to inspect.
Useful when tuning subtle scenes.
Displays shift direction visually.
Use this to confirm whether near and far regions are moving in the expected directions.
Shows differences between the two generated eye views.
Useful for spotting:
- excessive disparity
- edge tearing
- ghosting
- overly aggressive stereo separation
Shows the feathering mask used to soften depth transitions.
Useful when diagnosing harsh cutout edges.
Shows the blended feathering result.
Useful when checking whether stereo transitions are too sharp or too soft.
Controls how strongly near objects are pulled toward the viewer.
In the current VisionDepth3D Method, foreground pop is usually created with negative values.
More negative values:
- increase foreground pop-out
- pull close subjects and objects forward
- create stronger stereo separation
Less negative values:
- create a more subtle 3D effect
- reduce eye strain
- keep subjects closer to the screen plane
Recommended range:
| Style | Range |
|---|---|
| Natural | -5.0 to -7.0 |
| Strong | -8.0 to -10.0 |
| Aggressive | -10.0 to -12.0 |
If the foreground looks too flat, make the value more negative.
If the foreground feels uncomfortable, stretched, or too separated, move it closer to zero.
Controls the depth position of objects between foreground and background.
In the current method, midground shift is usually slightly negative or near zero.
Typical values:
| Style | Value |
|---|---|
| Subtle mid-depth | -0.5 |
| Natural layering | -0.8 to -1.2 |
| Strong layering | -1.5 to -2.0 |
| Neutral screen-plane feel | 0.0 |
Midground shift helps connect the foreground and background so the scene does not feel like only two flat layers.
If the image looks like cardboard cutouts, reduce the gap between foreground and midground values.
Example:
Too separated:
FG -12.0 / MG 0.0 / BG +5.0
More natural:
FG -6.0 / MG -0.8 / BG +2.2
Controls how far distant scene elements are pushed behind the screen plane.
In the current VisionDepth3D Method, background depth is usually created with positive values.
Higher positive values:
- push backgrounds deeper
- increase cinematic depth scale
- make environments feel larger
Lower positive values:
- keep backgrounds closer
- reduce eye strain
- create a more natural stereo effect
Recommended range:
| Style | Range |
|---|---|
| Subtle | +1.0 to +2.0 |
| Natural | +2.0 to +3.0 |
| Strong | +3.5 to +5.0 |
Avoid pushing the background too far if the foreground is already very negative, because the scene can start to look stretched, separated, or uncomfortable.
Controls how strongly the convergence plane is adjusted.
Higher values:
- move the perceived focus plane more aggressively
- increase perceived depth movement between shots
- can make scene transitions more dramatic
Lower values:
- produce more stable convergence
- reduce eye strain
- keep long-form content more comfortable
Use smaller values for full-length videos.
For aggressive pop-out testing, reduce convergence strength or disable dynamic convergence temporarily so the pipeline does not pull the foreground back toward the screen plane.
Fine-tunes the depth level that sits at the screen surface.
Use this when:
- the scene feels too far forward
- the scene feels pushed too far backward
- subjects are not sitting where expected
- the stereo field feels offset
Zero parallax is a precision control. Small changes can have a noticeable effect.
Controls the overall stereo balance between foreground and background.
Higher values:
- increase overall stereo strength
- make foreground and background separation stronger
- may increase eye strain
Lower values:
- create a gentler stereo effect
- improve comfort
- reduce extreme parallax
Recommended starting range:
0.70 to 1.00
For comfort, start around 0.70.
For stronger showcase depth, try 0.90 to 1.05.
Limits the maximum allowed parallax displacement.
This is a safety clamp.
Lower values:
- reduce extreme stereo separation
- improve comfort
- help prevent eye strain
- may reduce pop-out
Higher values:
- allow stronger depth
- allow more foreground pop
- can increase artifacts or discomfort
Recommended range:
0.020 to 0.050
For subtle or VR-friendly output, use lower values.
For stronger pop tests, temporarily try higher values such as 0.050, then reduce if the image becomes uncomfortable.
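Interpreting Max Pixel Shift as a fraction of frame width, the clamp itself is simple (a sketch; the renderer's internals may differ):

```python
def clamp_disparity(shift_px, frame_width, max_shift=0.035):
    """Clamp a per-pixel disparity to +/- (max_shift * frame_width)."""
    limit = max_shift * frame_width
    return max(-limit, min(limit, shift_px))

# At 1920 px wide with max_shift=0.035, disparity is capped near 67 px:
print(clamp_disparity(100.0, 1920))
```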
Controls overall stereo separation intensity, similar to virtual eye distance.
Higher values:
- stronger 3D effect
- larger parallax
- more separation between eyes
Lower values:
- more comfortable viewing
- softer depth
- less eye strain
Use this as a global depth strength control after your FG / MG / BG balance feels correct.
Enhances perceived edge clarity in the stereo output.
Higher values:
- make depth edges look crisper
- emphasize fine detail
- may also emphasize halos or edge artifacts
Lower values:
- produce softer transitions
- reduce harsh edge behavior
Use moderately.
Applies optional focus blur based on depth.
This can add cinematic realism, but should be used lightly.
Too much DOF can make the stereo output feel artificial or reduce depth readability.
The current VisionDepth3D Method does not rely only on direct shift amounts.
It uses a separate shaped depth representation for stereo design. This lets the renderer tune how near, mid, and far regions are emphasized without corrupting the underlying subject-tracking depth.
Depth shaping controls affect how the depth map is redistributed before the near / mid / far weighting system builds the final stereo shift field.
Controls how aggressively depth values are reshaped around the stereo midpoint.
Lower values:
- increase near/mid separation
- create stronger perceived pop
- can make scenes more dramatic
Higher values:
- soften the depth curve
- reduce cutout-like separation
- can make live or difficult scenes more natural
Recommended range:
0.75 to 1.15
For stronger pop, try 0.75 to 0.90.
For a more natural or less cardboard look, try 1.05 to 1.20.
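One way to picture this control is a gamma curve applied around the stereo midpoint of the normalized depth. This is a hypothetical sketch of the shaping idea, not VD3D's exact formula:

```python
import numpy as np

def pop_gamma(depth, gamma=1.0, mid=0.5):
    """Reshape normalized depth around a midpoint with a gamma curve.
    gamma=1.0 leaves depth unchanged; other values redistribute
    separation around `mid` while keeping the 0..1 range."""
    d = np.clip(np.asarray(depth, dtype=float), 0.0, 1.0)
    lower = mid * (d / mid) ** gamma
    upper = 1.0 - (1.0 - mid) * ((1.0 - d) / (1.0 - mid)) ** gamma
    return np.where(d < mid, lower, upper)
```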
Controls where the shaping curve focuses its strongest separation.
Lower values:
- emphasize foreground depth
- help subjects stand forward
- increase near-object separation
Higher values:
- shift emphasis toward midground and background
- help environments feel deeper
- reduce aggressive foreground pop
Recommended starting value:
0.45 to 0.50
Controls how the near-depth range is stretched.
Lower values:
- keep foreground tighter
- reduce over-expansion of near subjects
Higher values:
- expand foreground separation
- can make near objects feel stronger
Recommended starting range:
0.02 to 0.06
Controls how the far-depth range is stretched.
Lower values:
- keep background closer
- reduce background exaggeration
Higher values:
- push distant elements farther back
- increase scene scale
Recommended starting range:
0.94 to 0.98
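One way to picture Stretch Lo and Stretch Hi together is as the endpoints of a depth remap (an illustrative sketch, not the exact implementation):

```python
import numpy as np

def stretch_depth(depth, lo=0.04, hi=0.96):
    """Remap the [lo, hi] band of normalized depth to [0, 1],
    clipping values outside the band."""
    d = np.asarray(depth, dtype=float)
    return np.clip((d - lo) / (hi - lo), 0.0, 1.0)
```

Narrowing the band (raising `lo`, lowering `hi`) spreads the remaining depth range further apart, which matches the "stretch" behavior described above.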
Multiplies foreground depth strength after curve shaping.
Higher values:
- exaggerate subject separation
- increase near-object presence
- can create stronger pop-out
Lower values:
- keep subjects more natural
- reduce sticker-like foreground separation
Recommended range:
1.00 to 1.45
For aggressive testing, values around 1.50 to 1.60 may be useful, but should be reduced for final comfort.
Multiplies background depth recession after curve shaping.
Higher values:
- push environments deeper
- increase cinematic scale
Lower values:
- keep backgrounds closer
- reduce excessive depth spread
- help prevent cardboard separation
Recommended range:
0.85 to 1.15
Controls how strongly the pipeline anchors the detected subject depth.
Higher values:
- keep subjects stable
- reduce subject drift
- improve comfort
- may reduce pop-out if too strong
Lower values:
- allow more foreground movement
- can help pop-out testing
- may increase depth instability
Recommended range:
0.00 to 0.25
For strong pop-out testing, keep Subject Lock very low.
For long-form viewing, use light subject locking for stability.
Strong 3D does not come only from increasing shift values.
VisionDepth3D separates stereo design into multiple stages:
- normalized depth
- tracked subject depth
- shaped disparity depth
- near / mid / far weighting
- subject-aware zero parallax
- dynamic convergence
- edge-aware repair
- floating-window safety
Because of this, pop-out is controlled by more than Foreground Shift alone.
If foreground objects do not pop forward enough, check:
- Foreground Shift is negative enough
- Max Pixel Shift is not too low
- Parallax Balance is not too low
- Subject Lock is not anchoring the subject too strongly
- Dynamic Convergence is not pulling the scene back to the screen plane
- Floating Window is not limiting aggressive pop near frame edges
- Edge Masking is not suppressing too much shift around the foreground
- The depth map is not inverted
- The depth map has enough near-depth contrast
Use this only for testing, not as a final comfort preset:
Foreground Shift: -10.0
Midground Shift: -1.0
Background Shift: +2.5
Max Pixel Shift: 0.050
Parallax Balance: 1.00
Subject Lock: 0.00 to 0.05
Floating Window: Off for testing
Dynamic Convergence: Off for testing
Edge Masking: Off for testing
Feathering: Off for testing
Once the pop direction is confirmed, re-enable comfort and repair settings for final renders.
Adds cinematic edge protection to prevent objects from breaking the screen border.
This is useful when strong foreground pop approaches the left or right frame edge.
Benefits:
- reduces window violations
- improves viewing comfort
- protects aggressive stereo shots
- makes the render feel more professionally composed
For strong pop-out testing, disable Floating Window temporarily. For final renders, re-enable it if edge violations appear.
Automatically adjusts convergence based on the tracked subject path and smooths transitions over time.
When enabled:
- tracks subject depth more naturally
- smooths frame-to-frame convergence
- reduces sudden depth jumps
- improves comfort for full-length content
Recommended for long renders.
For aggressive pop-out testing, disable Dynamic Convergence temporarily to make sure it is not pulling the foreground back toward the screen plane.
Keeps the zero-parallax plane aligned with the dominant or tracked depth range.
When enabled:
- prevents depth drift
- keeps subjects more stable
- reduces eye strain during scene changes
This can improve comfort, but high subject locking can reduce strong pop-out.
Suppresses unstable stereo shift near hard depth edges such as:
- hair
- fingers
- shoulders
- thin foreground objects
- high-contrast silhouettes
Benefits:
- reduces halos
- reduces edge tearing
- improves contour cleanliness
If the scene lacks pop, test with Edge Masking off temporarily to see whether it is suppressing too much foreground shift. Re-enable it for final renders if edge artifacts appear.
Softens transitions between shifted regions.
Benefits:
- smoother depth blending
- fewer harsh stereo edges
- less cutout-like transitions
Too much feathering may make the scene feel softer or reduce perceived sharpness.
Disables temporal shift smoothing for debugging.
Use this only for testing.
When enabled:
- raw shift changes are easier to inspect
- pop direction can be tested more directly
- motion may look less stable
For final renders, shift smoothing is usually recommended.
Applies subtle focus blur based on depth.
Use sparingly.
This can help cinematic presentation, but too much blur can make depth harder to read.
Set start and end timecodes to render only a portion of the video.
Useful for:
- testing settings quickly
- tuning difficult scenes
- checking pop-out behavior
- checking edge artifacts
- avoiding long re-renders
Recommended before full-length renders.
Example:
Start: 00:01:20
End: 00:01:35
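Internally, a clip range boils down to frame indices. A quick sketch of the timecode math, assuming HH:MM:SS input:

```python
def timecode_to_frame(timecode, fps):
    """Convert an HH:MM:SS timecode to a frame index."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return round((hours * 3600 + minutes * 60 + seconds) * fps)

print(timecode_to_frame("00:01:20", 24))  # 1920
print(timecode_to_frame("00:01:35", 24))  # 2280
```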
- Load source video and depth map video.
- Confirm the original resolution and aspect ratio.
- Choose output format and codec.
- Open preview sources.
- Test a few frames in Anaglyph and Shift Heatmap mode.
- Start with the new negative-foreground shift convention.
- Tune FG / MG / BG shift.
- Tune Max Pixel Shift and Parallax Balance.
- Adjust Depth Pop Gamma and FG Pop × if the scene feels flat.
- Render a short clip range.
- Re-enable comfort tools such as Dynamic Convergence, Edge Masking, Feathering, and Floating Window.
- Render the full video.
Foreground Shift: -6.0
Midground Shift: -0.8
Background Shift: +2.2
Max Pixel Shift: 0.022
Parallax Balance: 0.70
Depth Pop Gamma: 1.05
FG Pop ×: 1.00
BG Push ×: 0.95
Subject Lock: 0.15
Dynamic Convergence: On
Edge Masking: On
Feathering: On
Floating Window: On if needed
Best for:
- full-length movies
- dialogue scenes
- comfortable VR viewing
- natural depth layering
Foreground Shift: -8.5
Midground Shift: -1.2
Background Shift: +3.5
Max Pixel Shift: 0.035
Parallax Balance: 0.90
Depth Pop Gamma: 0.85
FG Pop ×: 1.20
BG Push ×: 1.05
Subject Lock: 0.10
Dynamic Convergence: On
Edge Masking: On
Feathering: On
Floating Window: On if needed
Best for:
- demo clips
- trailers
- scenes with clear subjects
- stronger depth presentation
Foreground Shift: -10.0 to -12.0
Midground Shift: -2.0
Background Shift: +4.0 to +5.0
Max Pixel Shift: 0.050
Parallax Balance: 1.00
Depth Pop Gamma: 0.75
FG Pop ×: 1.45
BG Push ×: 0.85
Subject Lock: 0.00 to 0.05
Dynamic Convergence: Off for testing
Edge Masking: Off for testing
Feathering: Off for testing
Floating Window: Off for testing
Best for:
- confirming pop direction
- diagnosing whether the foreground can move forward
- testing depth map strength
- checking if stabilization is suppressing pop
Not recommended as a final full-length preset without comfort adjustments.
Try checking:
- depth inversion
- eye order
- preview mode
- whether the depth map uses white-near or black-near convention
- whether old presets are being reused incorrectly
Check:
- Foreground Shift is negative enough
- Max Pixel Shift is high enough
- Parallax Balance is not too low
- Subject Lock is not too strong
- Dynamic Convergence is not over-stabilizing the subject
- Floating Window is not suppressing aggressive foreground depth
- the depth map has enough near-depth contrast
Try the Aggressive Pop-Out Test preset to confirm whether the renderer can produce forward disparity.
Try:
- reducing Foreground Shift strength
- moving Midground Shift closer to Foreground Shift
- increasing Depth Pop Gamma above 1.00
- lowering FG Pop ×
- lowering Subject Lock
- enabling Feathering
- using a smoother depth map
- blending depth maps in Depth Blender
Example adjustment:
From:
FG -12.0 / MG 0.0 / BG +5.0
To:
FG -6.0 / MG -0.8 / BG +2.2
Try:
- enabling Edge Masking
- enabling Feathering
- lowering Max Pixel Shift
- reducing Foreground Shift strength
- checking depth map edge quality
- using Depth Blender to smooth or refine the depth map
Try:
- reducing Max Pixel Shift
- reducing Parallax Balance
- moving Foreground Shift closer to zero
- lowering Background Shift
- enabling Dynamic Convergence
- enabling Floating Window
- using a shorter clip range for testing
Try:
- increasing Background Shift
- increasing BG Push ×
- lowering Pop Mid slightly
- increasing Stretch Hi
- increasing Parallax Balance carefully
Try:
- making Foreground Shift more negative
- increasing FG Pop ×
- lowering Depth Pop Gamma
- lowering Pop Mid slightly
- increasing Max Pixel Shift
- lowering Subject Lock
The current VisionDepth3D Method is designed around a full stereo pipeline rather than simple positive/negative pixel shifting.
Foreground, midground, and background controls now work together with:
- depth normalization
- pop-control depth shaping
- structured near / mid / far weighting
- subject-aware zero parallax
- dynamic convergence
- edge-aware shift limiting
- contour-safe repair
- floating-window control
- temporal stabilization
For the best results, start from the new presets, preview several frames, render short clip ranges, and tune gradually.
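To make the pipeline above concrete, here is a minimal NumPy sketch of depth shaping and FG/MG/BG blending. The function names, the piecewise-linear blend, and the near/far weighting are illustrative assumptions, not VD3D's actual implementation; it only shows how normalization, a pop-control gamma, and the three shift sliders can combine into a per-pixel disparity.

```python
import numpy as np

def shape_depth(depth, pop_gamma=1.05, fg_pop=1.00, bg_push=0.95):
    """Illustrative depth shaping (NOT the real VD3D pipeline).

    depth: float array, 0 = far, 1 = near (white-near convention).
    """
    # Normalize to [0, 1] so shift sliders behave consistently across scenes.
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-6)
    # Pop-control gamma: > 1.0 compresses near depth (comfort),
    # < 1.0 expands it (stronger pop).
    d = d ** pop_gamma
    # Weight the near and far halves separately (FG Pop x / BG Push x analogues).
    near = np.clip(d * fg_pop, 0.0, 1.0)
    far = np.clip(d * bg_push, 0.0, 1.0)
    return np.where(d > 0.5, near, far)

def per_pixel_shift(d, fg=-6.0, mg=-0.8, bg=2.2):
    """Blend FG/MG/BG shift values by depth (near -> fg, far -> bg)."""
    mid = 0.5
    lo = bg + (mg - bg) * (d / mid)           # far..mid maps bg..mg
    hi = mg + (fg - mg) * ((d - mid) / mid)   # mid..near maps mg..fg
    return np.where(d <= mid, lo, hi)

depth = np.linspace(0.0, 1.0, 5)  # far -> near ramp
shifts = per_pixel_shift(shape_depth(depth))
print(shifts.round(2))
```

With the balanced preset values, far pixels get a small positive shift and the nearest pixels approach the full negative foreground shift, which is exactly the sign convention the presets above use.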
VD3D Live is a real-time 2D-to-3D pipeline designed for live sources like:
- Screen capture (desktop / games / video players)
- Cameras and capture cards
It captures frames, runs a Depth Anything model, then generates a stereoscopic SBS output using the Pixel Shift CUDA pipeline.
You can use it for:
- Live 3D preview while watching content
- Real-time depth tuning
- External output to other apps (HTTP stream or Virtual Camera)
Launch VD3D Live – GUI from inside VD3D (or run the live script if you use it standalone).
In the Capture section:
- Source:
  - `screen:1` (primary monitor)
  - `screen:2` for a second monitor
  - `screen:0` captures the full bounding box across all monitors (not recommended unless you need it)
- Capture FPS:
  - `30` is a good default for stability
  - Raise it if you want smoother motion and your GPU can keep up
Tip: If you are screen capturing the same monitor the preview is on, you can create a feedback loop. Use one of these:
- Put preview on a different monitor
- Enable Mask preview region in screen capture
- Or disable preview and use external output instead
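The `screen:N` source strings follow the mss monitor convention (`0` = bounding box of all monitors, `1` = primary). A small parser like the hypothetical helper below shows how such a string might map to a monitor index; VD3D Live's actual parsing may differ.

```python
def parse_screen_source(source: str) -> int:
    """Parse a 'screen:N' capture source string (hypothetical helper).

    N follows the mss convention: 0 = bounding box of all monitors,
    1 = primary monitor, 2+ = additional monitors.
    """
    prefix, _, index = source.partition(":")
    if prefix != "screen" or not index.isdigit():
        raise ValueError(f"not a screen source: {source!r}")
    return int(index)

print(parse_screen_source("screen:1"))
```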
In Depth / Model:
- Model ID: choose a depth model from Hugging Face, or use the one already in the input field.
  Example: `depth-anything/Depth-Anything-V2-Large-hf`
- Use FP16 (if CUDA): enable this on NVIDIA GPUs
  - Reduces VRAM usage and improves speed
- Infer W / Infer H: depth inference resolution
  Example: `320 × 180` for speed
  - Higher values = better depth detail, slower performance
- Depth FPS: how often depth is updated
  Example: `5.0`
  - Lower = faster overall performance
  - Higher = more responsive depth changes
Optional:
- Smooth (EMA + median): reduces depth jitter and flicker
- EMA α: smoothing strength (higher = smoother but more lag)
In 3D / Pixel Shift:
- Turn on Enable SBS 3D
- Set your shifts:
- FG shift (foreground pop)
- MG shift (mid depth layering)
- BG shift (background push)
Typical starter values (v4.0 convention):
- FG shift: -5 to -10
- MG shift: -0.5 to -2
- BG shift: +2 to +5
These are live controls, so you can tune while watching.
In Preview / Output:
- Enable Show preview window if you want an on-screen preview
- If your source is screen capture:
- Leave Force preview (screen src) OFF unless you know what you are doing
- Use Mask preview region in screen capture if the preview is on the same monitor you’re capturing
Then press Start.
When the preview window is visible:
- `m` cycles view mode: Passthrough → Depth → 3D-SBS
- `f` toggles fullscreen
- `q` or `ESC` quits
Controls the OpenCV capture backend:
- `msmf` is usually best on Windows
- `dshow` can work better for some capture cards
- `ffmpeg` can help with certain device formats

If a device won’t open or drops frames, try changing this first.
Selects which camera or capture device you are using.
If you have multiple devices, try index 0, then 1, then 2.
Requests a specific camera/capture format (example: YUY2).
Only use this if you know your device needs it.
Some capture devices output color channels differently.
- Force BGR swap manually flips channels (fixes weird colors)
- Disable auto swap prevents automatic guessing

If your colors look wrong, toggle these.
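A manual BGR swap is just a reversal of the last channel axis. This sketch (helper name assumed) shows the operation the Force BGR swap toggle performs conceptually:

```python
import numpy as np

def swap_rb(frame: np.ndarray) -> np.ndarray:
    """Flip the red/blue channels of an HxWx3 frame (BGR <-> RGB).

    Conceptually what a 'Force BGR swap' toggle does: useful when a
    capture device reports one channel order but delivers the other.
    """
    return frame[..., ::-1].copy()

# A single "red" RGB pixel reads as "blue" once the channels are swapped.
pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)
swapped = swap_rb(pixel)
print(swapped[0, 0].tolist())
```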
Lets you view the live output in another app over your network.
- Set HTTP stream (host:port)
  Example: `127.0.0.1:8080`
- Click Start

Then open:
`http://127.0.0.1:8080/video.mjpg`
Use this when:
- You want external viewing without a local preview window
- You want to capture the stream in another tool
Outputs the live SBS feed as a virtual webcam device (requires pyvirtualcam).
- Enable Virtual camera
- Set VCam FPS (example: `30`)
- Click Start
Use this when:
- You want to feed live SBS output into OBS, VR tools, or other software that accepts webcams
- You want an output pipeline without relying on the preview window
Note: The virtual camera resolution matches the current output frame size.
The Audio device field can start an audio monitor using ffplay.
- Audio device: your system audio capture name (Windows uses DirectShow naming)
- Audio delay ms: applies a delay if your video processing introduces lag
Use this when:
- You need audio while viewing live output
- You need to compensate for processing latency
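For reference, the kind of ffplay command such a monitor might run can be built like this. The exact flags VD3D Live uses are not documented here, so this is an assumption-laden sketch: the device name is a hypothetical example, and the delay uses ffmpeg's real `adelay` audio filter (per-channel delays in milliseconds).

```python
def ffplay_monitor_cmd(device: str, delay_ms: int = 0):
    """Build an ffplay DirectShow audio-monitor command (sketch only).

    'device' is your Windows audio capture device name; the actual
    VD3D Live invocation may differ from this.
    """
    cmd = ["ffplay", "-nodisp", "-f", "dshow", "-i", f"audio={device}"]
    if delay_ms > 0:
        # adelay takes per-channel delays in milliseconds.
        cmd += ["-af", f"adelay={delay_ms}|{delay_ms}"]
    return cmd

cmd = ffplay_monitor_cmd("Stereo Mix (Realtek)", delay_ms=120)
print(" ".join(cmd))
```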
Comfort + stability preset:
- Capture FPS: `30`
- Infer: `320 × 180`
- Depth FPS: `5`
- Smooth: ON
- EMA α: `0.35`
- FG/MG/BG: `-6 / -0.8 / +2.2`
If you need more depth detail:
- Raise Infer size first (example: `512 × 288`)
- Keep Depth FPS modest to avoid GPU overload
If you see repeated “screen within screen” or performance tanks:
- Disable preview and use HTTP stream / Virtual camera
- Or enable Mask preview region in screen capture
- Or move preview to a different monitor than the one being captured
- For screen capture: make sure `mss` is installed
- For device capture: try a different backend (`msmf` ↔ `dshow`) and check the device index
- Lower Infer resolution
- Lower Depth FPS
- Turn Smooth OFF
- Reduce shift strength slightly
- Make sure FP16 is enabled on CUDA
For best quality and efficiency, follow this proven VD3D workflow:
- Generate depth maps in the Depth Estimation Tab
- (Optional) Blend two depth sources in the Depth Blender Tab
- Load source + depth video in the 3D Generator Tab
- Configure Encoder Settings and Processing Options
- Open Live Preview and tune depth using Shift Heatmap + Anaglyph
- Test short Clip Range (optional)
- Render final full-length 3D video
This approach prevents wasted long renders and ensures optimal depth quality.
- Start with built-in presets and refine from there
- Use Shift Heatmap view to keep parallax within comfortable ranges
- Increase depth gradually rather than maxing sliders
- Enable Dynamic Convergence for long content
- Use Edge Masking + Feathering for clean depth edges
- Test short clip ranges before full renders
- Avoid extreme pixel shift values (eye strain risk)
Balanced depth always looks more cinematic than aggressive depth.
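A quick way to sanity-check parallax comfort is to translate Max Pixel Shift into actual on-screen pixels. The assumption here (suggested by slider values like 0.022) is that the setting is a fraction of frame width; verify against the Shift Heatmap in your own setup.

```python
def max_disparity_px(max_pixel_shift: float, frame_width: int) -> float:
    """Convert a Max Pixel Shift fraction into on-screen pixels.

    Assumes the slider is a fraction of frame width; treat the
    interpretation as a rule of thumb, not a guarantee.
    """
    return max_pixel_shift * frame_width

# Balanced preset (0.022) vs aggressive test (0.050) on a 1920-wide frame:
print(max_disparity_px(0.022, 1920))
print(max_disparity_px(0.050, 1920))
```

Values in the low tens of pixels at 1080p generally stay comfortable; the aggressive test preset roughly doubles that, which is why it is not recommended for full-length renders.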
VisionDepth3D is designed for GPU acceleration, but supported features depend on your hardware and installed backend.
NVIDIA CUDA is the recommended setup for VisionDepth3D.
Best for:
- Depth estimation
- 3D stereo rendering
- Live 3D preview
- RIFE interpolation
- Real-ESRGAN upscaling
- NVENC video encoding
Recommended install path:
`pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128`

Use the official PyTorch install selector if your system needs a different CUDA version:
https://pytorch.org/get-started/locally/
AMD and Intel GPU users on Windows can use DirectML through:
`pip install torch-directml`

DirectML can provide GPU acceleration on supported AMD Radeon, Intel Arc, and some integrated GPUs.
Notes:
- DirectML is usually slower than NVIDIA CUDA.
- Some models or operations may fall back to CPU.
- If DirectML causes issues, use CPU mode as a fallback.
- Do not install CUDA PyTorch for AMD GPUs on Windows.
AMD users on Linux may be able to use ROCm if their GPU and driver stack are supported.
ROCm support depends heavily on:
- GPU model
- Linux distribution
- installed ROCm version
- PyTorch ROCm compatibility
Use the official PyTorch install selector for ROCm setup:
https://pytorch.org/get-started/locally/
VisionDepth3D can fall back to CPU when no supported GPU backend is available.
CPU mode works, but it is much slower for:
- Depth map generation
- Video processing
- Live 3D
- Upscaling
- Frame interpolation
- Full 3D renders
CPU mode is best for testing, small images, or fallback compatibility.
VisionDepth3D can use different FFmpeg encoders depending on your GPU.
| GPU / Backend | Encoder Options |
|---|---|
| NVIDIA | h264_nvenc, hevc_nvenc, av1_nvenc |
| AMD | h264_amf, hevc_amf, av1_amf |
| Intel | h264_qsv, hevc_qsv, av1_qsv |
| CPU | libx264, libx265, libaom-av1, libsvtav1 |
If a hardware encoder fails, try a CPU encoder for compatibility.
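The "fall back to a CPU encoder" advice can be automated. This sketch maps each hardware encoder from the table above to a CPU equivalent and builds minimal ffmpeg commands; the option set is illustrative (VD3D's real encode commands include quality and pixel-format settings not shown here).

```python
# Hardware encoder -> CPU fallback, following the table above.
HW_TO_CPU_FALLBACK = {
    "h264_nvenc": "libx264", "hevc_nvenc": "libx265", "av1_nvenc": "libsvtav1",
    "h264_amf": "libx264",  "hevc_amf": "libx265",  "av1_amf": "libsvtav1",
    "h264_qsv": "libx264",  "hevc_qsv": "libx265",  "av1_qsv": "libsvtav1",
}

def encode_cmd(src, dst, encoder):
    """Build a minimal ffmpeg encode command (illustrative options only)."""
    return ["ffmpeg", "-y", "-i", src, "-c:v", encoder, "-c:a", "copy", dst]

def with_fallback(src, dst, encoder):
    """Return the hardware command plus the CPU command to try if it fails."""
    cpu = HW_TO_CPU_FALLBACK.get(encoder, "libx264")
    return encode_cmd(src, dst, encoder), encode_cmd(src, dst, cpu)

hw, cpu = with_fallback("in.mp4", "out.mkv", "hevc_nvenc")
print(cpu[cpu.index("-c:v") + 1])
```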
| User Type | Recommended Backend |
|---|---|
| NVIDIA GPU user | CUDA PyTorch + NVENC |
| AMD GPU on Windows | DirectML + AMF encoder |
| Intel GPU on Windows | DirectML + QSV encoder |
| AMD GPU on Linux | ROCm if supported |
| No supported GPU | CPU PyTorch |
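You can check which backend your own install will use with a few lines of Python. This mirrors the table's priority order (CUDA, then DirectML, then CPU); the function name is just for illustration.

```python
def detect_backend():
    """Report which PyTorch backend this machine can use (best effort).

    Priority follows the table above: CUDA first, then DirectML, then CPU.
    """
    try:
        import torch
    except ImportError:
        return "none (PyTorch not installed)"
    if torch.cuda.is_available():
        return "cuda"
    try:
        import torch_directml  # Windows AMD / Intel path
        return "directml"
    except ImportError:
        return "cpu"

print(detect_backend())
```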
Try:
- Check NVIDIA driver installation.
- Run `nvidia-smi`.
- Reinstall CUDA PyTorch using the official PyTorch selector.
- Make sure `torch`, `torchvision`, and `torchaudio` use matching CUDA builds.
- Restart VisionDepth3D after reinstalling PyTorch.
Try:
- Install `torch-directml`.
- Update AMD / Intel GPU drivers.
- Confirm you are on Windows.
- Restart VisionDepth3D after installation.
- Use CPU mode if DirectML is unstable.
Try:
- Confirm your AMD GPU supports ROCm.
- Confirm your Linux distribution is supported by ROCm.
- Install the correct PyTorch ROCm build.
- Check that your ROCm driver/runtime is installed correctly.
Try:
- Switch from NVENC / AMF / QSV to CPU encoding.
- Update GPU drivers.
- Use H.264 before trying H.265 or AV1.
- Confirm your GPU supports the selected encoder.
- Try another container such as `.mkv` if `.mp4` fails.
VD3D presets should follow the new v4.0 shift convention.
Older 3D values may have used:
FG shift: 6 to 10
MG shift: 1 to 3
BG shift: -3 to -6
For VisionDepth3D v4.0, use the new convention:
FG shift: -5 to -10
MG shift: -0.5 to -2
BG shift: +2 to +5
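Comparing the two lists above, the v4.0 convention essentially flips the sign of each shift (with slightly retuned magnitudes). A rough converter for old presets could look like this; treat the result as a starting point to fine-tune in the preview, not an exact mapping.

```python
def convert_legacy_shifts(fg, mg, bg):
    """Approximate a pre-4.0 shift triple in the v4.0 sign convention.

    Assumption: v4.0 flips the sign of each shift. Magnitudes were also
    retuned slightly between versions, so re-check the result visually.
    """
    return -fg, -mg, -bg

print(convert_legacy_shifts(6, 1, -3))
```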
Recommended Live 3D starter preset:
FG/MG/BG: -6 / -0.8 / +2.2
Stronger Live 3D test preset:
FG/MG/BG: -8.5 / -1.2 / +3.5
If 3D looks inverted, check:
- depth inversion
- eye order
- foreground shift direction
- whether an older preset was loaded
- whether the depth model uses the opposite near/far convention
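If the depth model turns out to use the opposite near/far convention, inverting the depth map fixes it. Reflecting values within their own range (a standard trick, sketched below with an assumed helper name) swaps near and far without changing the overall value range:

```python
import numpy as np

def invert_depth(depth: np.ndarray) -> np.ndarray:
    """Flip a depth map between white-near and black-near conventions.

    Reflects each value within the map's own [min, max] range, so near
    and far swap places while the range itself is preserved.
    """
    d = depth.astype(np.float32)
    return d.max() + d.min() - d

ramp = np.array([0.0, 0.25, 1.0], dtype=np.float32)
inverted = invert_depth(ramp)
print(inverted.tolist())
```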
For comfortable realtime Live 3D, start with softer values and increase strength slowly.
Recommended comfort settings:
Capture FPS: 30
Inference: 384x384 or 518x518
Depth FPS: 4 to 6
Smooth Depth: On
Foreground Shift: -6.0
Midground Shift: -0.8
Background Shift: +2.2
Max Pixel Shift: 0.020 to 0.030
Parallax Balance: 0.70
Depth Pop Gamma: 1.05 to 1.15
Subject Tracking: Off for testing, On for stability
Dynamic Convergence: On
Edge Masking: On
Feathering: Off for speed, On for cleaner edges
Floating Window: Off for testing, On if edge violations appear
- Start at 512×288 or 704×384 for movies
- Increase only if depth lacks detail
- Raise batch size until VRAM limit is reached
- Use Threaded Pipeline on strong GPUs
- Use Merged Pipeline for long videos or lower-end systems
- NVENC encoding is much faster on NVIDIA GPUs
- Moderate Max Pixel Shift improves comfort and speed
- Avoid excessive feather + masking strength
- Increase Depth Pop Gamma
- Raise Foreground Shift slightly
- Adjust Pop Mid toward subject depth
- Enable Edge-Aware Masking
- Increase or Decrease MG Shift to eliminate Edge Tearing
- Reduce Sharpness Factor
- Enable Dynamic Convergence
- Enable Stabilize Zero-Parallax
- Reduce Convergence Strength
- Lower Max Pixel Shift
- Reduce Foreground Shift
- Decrease Stereo Scaling (IPD)
- Lower inference resolution
- Reduce batch size
- Use NVENC encoder
- Disable unnecessary preview modes
Use the Depth Blender when:
- Subject edges shimmer or break
- Background depth is noisy
- One model looks strong in subjects but weak in environment
Blending V1 + V2 depth sources often produces the cleanest results.
For updates, documentation, and new releases:
- GitHub repository (VisionDepth3D)
- Community feedback and issues welcome
Regular updates continue improving depth quality, speed, and stability.
End of User Manual