torchcodec-xpu: add XPU encoding support #58

Open
eromomon wants to merge 7 commits into intel:main from eromomon:eromomon/encoding

@eromomon

Contributor

Extends VideoEncoder to support Intel XPU devices.

Encoder.cpp: extended kStableCUDA device-type checks to also match kStableXPU in both VideoEncoder and MultiStreamEncoder, enabling the hardware encoding path (hw frame context setup, pixel format selection, device registration).
XpuDeviceInterface: implemented setupHardwareFrameContextForEncoding and convertTensorToAVFrameForEncoding. RGB→NV12 conversion is done via a SYCL kernel, or via libswscale as CPU fallback.

@eromomon eromomon requested review from dvrogozh and removed request for dvrogozh May 25, 2026 23:07
@dvrogozh dvrogozh changed the title Add XPU encoding support to Encoder torchcodec-xpu: add XPU encoding support May 26, 2026
Comment thread packages/torchcodec-xpu/src/torchcodec_xpu/XpuDeviceInterface.cpp
// Layout A: 1 layer, 2 planes — layers[0].planes[0]=Y, layers[0].planes[1]=UV
// Layout B: 2 layers, 1 plane each — layers[0].planes[0]=Y, layers[1].planes[0]=UV
const bool layoutA = (desc.num_layers == 1 && desc.layers[0].num_planes == 2);
const bool layoutB = (desc.num_layers == 2 && desc.layers[0].num_planes == 1

Contributor

Well, yes, except that we don't have any other driver with a different layout... I am not sure we should implement something we have never tested.

Contributor Author

Empirically the iHD driver on Battlemage (BMG) returns Layout B (num_layers=2, one plane per layer, DRM_FORMAT_R8 + DRM_FORMAT_GR88); removing that branch makes encoding fail at runtime with Unsupported NV12 export layout: num_layers=2 layers[0].num_planes=1.
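
For readers following this thread, the two export layouts can be handled by one small lookup. The following is only a sketch: `ExportDesc`, `Layer`, `PlaneRef`, and `locateNv12Planes` are hypothetical, simplified stand-ins rather than the types used in the PR, and the `layers[1].numPlanes == 1` condition for Layout B is an assumption, since the diff line above is cut off.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical, simplified model of a surface export descriptor.
// Only the fields discussed in this thread are represented.
struct PlaneRef {
  uint32_t objectIndex;
  uint32_t offset;
  uint32_t pitch;
};
struct Layer {
  uint32_t numPlanes;
  PlaneRef planes[4];
};
struct ExportDesc {
  uint32_t numLayers;
  Layer layers[4];
};

struct Nv12Planes {
  PlaneRef y;
  PlaneRef uv;
};

// Returns the Y and UV plane references for either layout,
// or std::nullopt when the layout is not recognized.
std::optional<Nv12Planes> locateNv12Planes(const ExportDesc& d) {
  // Layout A: 1 layer holding both planes.
  if (d.numLayers == 1 && d.layers[0].numPlanes == 2) {
    return Nv12Planes{d.layers[0].planes[0], d.layers[0].planes[1]};
  }
  // Layout B: 2 layers, 1 plane each (second condition is assumed).
  if (d.numLayers == 2 && d.layers[0].numPlanes == 1 &&
      d.layers[1].numPlanes == 1) {
    return Nv12Planes{d.layers[0].planes[0], d.layers[1].planes[0]};
  }
  return std::nullopt;
}
```

A caller would treat `std::nullopt` as the "Unsupported NV12 export layout" error path, so both layouts share the same downstream code.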

Comment thread packages/torchcodec-xpu/src/torchcodec_xpu/XpuDeviceInterface.h

@dvrogozh dvrogozh left a comment

Contributor

Rebase, please.

++j) {
if (config->device_type == AV_HWDEVICE_TYPE_VAAPI) {
return codec;
// 2) Encoder-only fallback: if no VAAPI encoder exists for codecId

Contributor

@eromomon : please, reply.

Contributor Author

The `if (!isDecoder)` block is an XPU/VAAPI-specific encoder-codec fallback. FFmpeg does not register a VAAPI encoder for every `AVCodecID`.

CUDA does not need this because the calling layer already resolves the codec id to one with an `AV_HWDEVICE_TYPE_CUDA` hw_config (NVENC ids) before `findCodec` runs. Removing this block makes default-codec mp4 encoding on XPU fall back to software MPEG-4 or fail at `avcodec_open2`.

When a user calls `encode_video("out.mp4", ...)` without specifying `codec`, `findCodec` is invoked with `AV_CODEC_ID_MPEG4`. No VAAPI codec advertises an `AV_HWDEVICE_TYPE_VAAPI` hw_config for it, so the block substitutes the first HW-capable alternative (h264_vaapi / hevc_vaapi / av1_vaapi), and the mp4 container accepts the substitution.
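
The fallback described in this reply can be sketched roughly as follows. This does not use FFmpeg at all: `CodecId`, `EncoderInfo`, and `pickVaapiEncoder` are hypothetical names, and the fallback order is an assumption taken from the order in which the alternatives are listed above.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for FFmpeg codec ids and registered encoders.
enum class CodecId { Mpeg4, H264, Hevc, Av1 };

struct EncoderInfo {
  std::string name;
  CodecId id;
  bool supportsVaapi;  // models "advertises a VAAPI hw_config"
};

// 1) Prefer a VAAPI-capable encoder for the requested codec id.
// 2) Otherwise substitute the first VAAPI-capable encoder found when
//    walking fallbackOrder (assumed order: H264, HEVC, AV1).
std::optional<EncoderInfo> pickVaapiEncoder(
    const std::vector<EncoderInfo>& encoders,
    CodecId requested,
    const std::vector<CodecId>& fallbackOrder) {
  for (const auto& e : encoders) {
    if (e.id == requested && e.supportsVaapi) {
      return e;
    }
  }
  for (CodecId candidate : fallbackOrder) {
    for (const auto& e : encoders) {
      if (e.id == candidate && e.supportsVaapi) {
        return e;
      }
    }
  }
  return std::nullopt;  // no hardware encoder available at all
}
```

With a registry that contains a software mpeg4 encoder plus h264_vaapi and hevc_vaapi, a request for `CodecId::Mpeg4` resolves to h264_vaapi, which is the substitution the reply describes for default-codec mp4 output.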

Comment thread packages/torchcodec-xpu/src/torchcodec_xpu/XpuDeviceInterface.h Outdated
eromomon and others added 6 commits June 15, 2026 14:49
Signed-off-by: Edgar Romo Montiel <edgar.romo.montiel@intel.com>
Signed-off-by: Edgar Romo Montiel <edgar.romo.montiel@intel.com>
How to reproduce FFmpeg RGB->YUV matrix values

1. Expose ff_fill_rgb2yuv_table in libavfilter/libavfilter.v:
   add "ff_fill_rgb2yuv_table;" under the global section.
   Example:
	libavfilter/libavfilter.v
	 LIBAVFILTER_MAJOR {
	     global:
	         avfilter_*;
	         av_*;
	+        ff_fill_rgb2yuv_table;
	     local:
	         *;
	 };

2. Rebuild FFmpeg as shared libraries and confirm the symbol is exported:
   cd ffmpeg && ./configure --enable-shared --prefix=<prefix> && make -j$(nproc) && make install
   nm -D <prefix>/lib/libavfilter.so | grep ff_fill_rgb2yuv_table

3. Create rgb2yuv_test.c calling
   ff_fill_rgb2yuv_table(av_csp_luma_coeffs_from_avcsp(cs), m)
   for cs = AVCOL_SPC_BT709 and AVCOL_SPC_BT470BG.

4. Build:
   gcc rgb2yuv_test.c -o rgb2yuv_test \
       -I<prefix>/include -L<prefix>/lib \
       -lavfilter -lavutil -Wl,-rpath,<prefix>/lib

Signed-off-by: Edgar Romo Montiel <edgar.romo.montiel@intel.com>
Signed-off-by: Edgar Romo Montiel <edgar.romo.montiel@intel.com>
Signed-off-by: Edgar Romo Montiel <edgar.romo.montiel@intel.com>
Co-authored By: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@eromomon eromomon force-pushed the eromomon/encoding branch from 3824cf5 to bec35e2 Compare June 15, 2026 21:59
Signed-off-by: Edgar Romo Montiel <edgar.romo.montiel@intel.com>
@dvrogozh

Contributor

@eromomon, please share the test script you are using to verify encoding support here in a comment.

}

- if (deviceType == kStableCUDA) {
+ if (deviceType == kStableCUDA || deviceType == kStableXPU) {

Contributor

Similar to decoding [1], these calls should not be guarded by any deviceType checks. For the 2 calls below:

  • registerHardwareDeviceWithCodec() can already be called without a guard, since it is an empty call by default [2]
  • setupHardwareFrameContextForEncoding() needs to be patched so that the error condition is simply dropped [3]

[1] https://github.com/meta-pytorch/torchcodec/blob/8bbce656797c4f2b00feb2784ffe76e408be1e4c/src/torchcodec/_core/SingleStreamDecoder.cpp#L517
[2] https://github.com/meta-pytorch/torchcodec/blob/8bbce656797c4f2b00feb2784ffe76e408be1e4c/src/torchcodec/_core/DeviceInterface.h#L88
[3] https://github.com/meta-pytorch/torchcodec/blob/8bbce656797c4f2b00feb2784ffe76e408be1e4c/src/torchcodec/_core/DeviceInterface.h#L173
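
To illustrate the idea behind this comment: if the base interface provides no-op defaults, the encoder can call both hooks unconditionally and only hardware backends override them. This is a minimal sketch with simplified, hypothetical signatures; `CodecContext`, `CpuInterface`, `XpuInterface`, and `configureEncoder` are stand-ins, not the torchcodec classes.

```cpp
#include <string>

// Hypothetical stand-in for AVCodecContext; records who configured it.
struct CodecContext {
  std::string hwFramesOwner;
};

class DeviceInterface {
 public:
  virtual ~DeviceInterface() = default;
  // Defaults do nothing, so callers need no device-type check.
  virtual void registerHardwareDeviceWithCodec(CodecContext&) {}
  virtual void setupHardwareFrameContextForEncoding(CodecContext&) {}
};

// A CPU backend simply inherits the no-op defaults.
class CpuInterface : public DeviceInterface {};

// A hardware backend overrides only what it needs.
class XpuInterface : public DeviceInterface {
 public:
  void setupHardwareFrameContextForEncoding(CodecContext& ctx) override {
    ctx.hwFramesOwner = "xpu";
  }
};

// Encoder-side code: no `if (deviceType == ...)` branch required.
void configureEncoder(DeviceInterface& device, CodecContext& ctx) {
  device.registerHardwareDeviceWithCodec(ctx);
  device.setupHardwareFrameContextForEncoding(ctx);
}
```

The design choice is that adding another device type then requires no change in the encoder itself, only a new subclass.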


if (videoStream.options.pixelFormat.has_value()) {
- if (deviceType == kStableCUDA) {
+ if (deviceType == kStableCUDA || deviceType == kStableXPU) {

Contributor

This check does not actually make sense here. What's going on is that Encoder.cpp sets the output pixel format and passes it to the FFmpeg context:

videoStream.avCodecContext->pix_fmt = outPixelFormat;

Then, a few lines below, setupHardwareFrameContextForEncoding() is called, which overrides whatever Encoder.cpp has just set [1]. That's wrong. I think the setup...() call should instead verify what the user has passed and error out if the encoder can't work with the given output pixel format, i.e. the check needs to be moved into that call. Something like:

void CudaDeviceInterface::setupHardwareFrameContextForEncoding(
    AVCodecContext* codecContext) {
...
      STD_TORCH_CHECK(
          codecContext->pix_fmt == DeviceInterface::CUDA_ENCODING_PIXEL_FORMAT,
          "Video encoding on GPU currently only supports the nv12 pixel format. "
          "Do not set pixel_format to use nv12 by default.");
...
}

[1] https://github.com/meta-pytorch/torchcodec/blob/8bbce656797c4f2b00feb2784ffe76e408be1e4c/src/torchcodec/_core/CudaDeviceInterface.cpp#L482

validatePixelFormat(*avCodec, videoStream.options.pixelFormat.value());
} else {
- if (deviceType == kStableCUDA) {
+ if (deviceType == kStableCUDA || deviceType == kStableXPU) {

Contributor

Well, we either need a new method in the device interface to get the default encoding pixel format, or we use a device-agnostic variant. To be honest, I would introduce a new API to the device interface:

class DeviceInterface {
 public:
    AVPixelFormat getDefaultPixelFormat() { return AV_PIX_FMT_NV12; }
};
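
A slightly fuller sketch of how such an API could combine with the validation idea from the earlier comment is shown below. Everything here is hypothetical: `PixelFormat` stands in for AVPixelFormat, `validateEncodingPixelFormat` and `resolvePixelFormat` are invented names, and the YUV420P default for the base class is an assumption made only so the two backends differ.

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for AVPixelFormat.
enum class PixelFormat { NV12, YUV420P, RGB24 };

class DeviceInterface {
 public:
  virtual ~DeviceInterface() = default;
  // Assumed default for a software path; hardware backends override it.
  virtual PixelFormat getDefaultPixelFormat() const {
    return PixelFormat::YUV420P;
  }
  // By default every format is accepted.
  virtual void validateEncodingPixelFormat(PixelFormat) const {}
};

class GpuInterface : public DeviceInterface {
 public:
  PixelFormat getDefaultPixelFormat() const override {
    return PixelFormat::NV12;
  }
  // Reject unsupported formats instead of silently overriding them.
  void validateEncodingPixelFormat(PixelFormat fmt) const override {
    if (fmt != PixelFormat::NV12) {
      throw std::invalid_argument(
          "Video encoding on GPU currently only supports nv12");
    }
  }
};

// Encoder-side logic with no device-type branches: use the requested
// format if given, else the device default, then let the device validate.
PixelFormat resolvePixelFormat(
    const DeviceInterface& device,
    std::optional<PixelFormat> requested) {
  const PixelFormat fmt = requested.value_or(device.getDefaultPixelFormat());
  device.validateEncodingPixelFormat(fmt);
  return fmt;
}
```

With this shape, a user-supplied format that the device cannot encode produces an error at setup time rather than being replaced without notice.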
