torchcodec-xpu: add XPU encoding support #58
// Layout A: 1 layer, 2 planes (layers[0].planes[0]=Y, layers[0].planes[1]=UV)
// Layout B: 2 layers, 1 plane each (layers[0].planes[0]=Y, layers[1].planes[0]=UV)
const bool layoutA = (desc.num_layers == 1 && desc.layers[0].num_planes == 2);
const bool layoutB = (desc.num_layers == 2 && desc.layers[0].num_planes == 1
Well, yes, except that we don't have any other driver that uses a different layout. I am not sure we should implement something we have never tested.
Empirically, the iHD driver on Battlemage (BMG) returns Layout B (num_layers=2, one plane per layer, DRM_FORMAT_R8 + DRM_FORMAT_GR88). Removing that branch makes encoding fail at runtime with "Unsupported NV12 export layout: num_layers=2 layers[0].num_planes=1".
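For reference, the two-layout check discussed here can be sketched roughly as follows. This is not the PR's code: `Plane`, `Layer`, `SurfaceDesc`, `Nv12Planes` and `resolveNv12Planes` are hypothetical, simplified stand-ins for the libva descriptor, and the `layers[1].num_planes == 1` condition for Layout B is an assumption, since the quoted hunk is cut off before the end of that expression.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical, simplified stand-ins for the descriptor fields the check reads.
struct Plane {
  std::uint32_t offset;
  std::uint32_t pitch;
};
struct Layer {
  std::uint32_t num_planes;
  Plane planes[4];
};
struct SurfaceDesc {
  std::uint32_t num_layers;
  Layer layers[4];
};

// Where the Y and interleaved UV planes of an NV12 surface live.
struct Nv12Planes {
  Plane y;
  Plane uv;
};

// Returns the Y/UV planes for either supported layout, or nullopt so the
// caller can report "Unsupported NV12 export layout".
std::optional<Nv12Planes> resolveNv12Planes(const SurfaceDesc& desc) {
  // Layout A: 1 layer, 2 planes.
  const bool layoutA =
      desc.num_layers == 1 && desc.layers[0].num_planes == 2;
  // Layout B: 2 layers, 1 plane each (second condition is an assumption).
  const bool layoutB = desc.num_layers == 2 &&
      desc.layers[0].num_planes == 1 && desc.layers[1].num_planes == 1;

  if (layoutA) {
    return Nv12Planes{desc.layers[0].planes[0], desc.layers[0].planes[1]};
  }
  if (layoutB) {
    return Nv12Planes{desc.layers[0].planes[0], desc.layers[1].planes[0]};
  }
  return std::nullopt;
}
```

Keeping both branches behind a single resolver means the rest of the import path only sees a Y plane and a UV plane, whichever layout the driver reports.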
++j) {
  if (config->device_type == AV_HWDEVICE_TYPE_VAAPI) {
    return codec;
// 2) Encoder-only fallback: if no VAAPI encoder exists for codecId
The NVIDIA counterpart does not have that. Why?
The if (!isDecoder) block is an XPU/VAAPI-specific encoder-codec fallback. FFmpeg does not register a VAAPI encoder for every AVCodecID. When a user calls encode_video("out.mp4", ...) without specifying codec, findCodec is invoked with AV_CODEC_ID_MPEG4. No VAAPI codec advertises an AV_HWDEVICE_TYPE_VAAPI hw_config for that id, so findCodec substitutes the first HW-capable alternative (h264_vaapi / hevc_vaapi / av1_vaapi), and the mp4 container accepts the substitution.
CUDA does not need this because the calling layer already resolves the codec id to one with an AV_HWDEVICE_TYPE_CUDA hw_config (NVENC ids) before findCodec runs. Removing this block makes default-codec mp4 encoding on XPU fall back to software MPEG-4 or fail at avcodec_open2.
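The lookup order described above can be illustrated with a small model. This is only a sketch: `CodecEntry`, `CodecId`, `HwType` and the registry vector are hypothetical stand-ins for FFmpeg's codec list and hw_config entries, and the real findCodec in the PR may differ in detail.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for AVCodecID / AVHWDeviceType / AVCodec.
enum class CodecId { MPEG4, H264, HEVC, AV1 };
enum class HwType { None, Vaapi };

struct CodecEntry {
  std::string name;
  CodecId id;
  bool isEncoder;
  HwType hw;  // hardware device type the codec advertises, if any
};

// Returns a pointer into `registry`, or nullptr when nothing matches.
const CodecEntry* findCodec(
    const std::vector<CodecEntry>& registry, CodecId id, bool isDecoder) {
  // 1) Exact match: same codec id, right direction, VAAPI-capable.
  for (const CodecEntry& c : registry) {
    if (c.id == id && c.isEncoder == !isDecoder && c.hw == HwType::Vaapi) {
      return &c;
    }
  }
  // 2) Encoder-only fallback: no VAAPI encoder exists for `id`, so take the
  //    first VAAPI-capable encoder in registry order.
  if (!isDecoder) {
    for (const CodecEntry& c : registry) {
      if (c.isEncoder && c.hw == HwType::Vaapi) {
        return &c;
      }
    }
  }
  return nullptr;
}
```

With a registry holding a software mpeg4 encoder followed by h264_vaapi and hevc_vaapi, a request for MPEG4 encoding resolves to h264_vaapi, while the same request for decoding returns nullptr because the fallback is encoder-only.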
Signed-off-by: Edgar Romo Montiel <edgar.romo.montiel@intel.com>
How to reproduce FFmpeg RGB->YUV matrix values

1. Expose ff_fill_rgb2yuv_table in libavfilter/libavfilter.v by adding
   "ff_fill_rgb2yuv_table;" under the global section. Example:

   libavfilter/libavfilter.v
     LIBAVFILTER_MAJOR {
         global:
             avfilter_*;
             av_*;
   +         ff_fill_rgb2yuv_table;
         local:
             *;
     };

2. Rebuild FFmpeg and check that the symbol is exported:
     cd ffmpeg && ./configure && make -j$(nproc) && make install
     nm -D <prefix>/lib/libavfilter.so | grep ff_fill_rgb2yuv_table

3. Create rgb2yuv_test.c that calls
     ff_fill_rgb2yuv_table(av_csp_luma_coeffs_from_avcsp(cs), m)
   for cs = AVCOL_SPC_BT709 and cs = AVCOL_SPC_BT470BG.

4. Build:
     gcc rgb2yuv_test.c -o rgb2yuv_test \
         -I<prefix>/include -L<prefix>/lib \
         -lavfilter -lavutil -Wl,-rpath,<prefix>/lib
Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
3824cf5 to bec35e2
@eromomon, please share the test script you are using to verify encoding support here in a comment.
  }
- if (deviceType == kStableCUDA) {
+ if (deviceType == kStableCUDA || deviceType == kStableXPU) {
Similar to decoding [1], these calls should not be guarded by any deviceType checks. Of the two calls below:
- registerHardwareDeviceWithCodec() can already be called without a guard, since it is an empty call by default [2].
- setupHardwareFrameContextForEncoding() needs to be patched so that the error condition in its default implementation is simply dropped [3].
[1] https://github.com/meta-pytorch/torchcodec/blob/8bbce656797c4f2b00feb2784ffe76e408be1e4c/src/torchcodec/_core/SingleStreamDecoder.cpp#L517
[2] https://github.com/meta-pytorch/torchcodec/blob/8bbce656797c4f2b00feb2784ffe76e408be1e4c/src/torchcodec/_core/DeviceInterface.h#L88
[3] https://github.com/meta-pytorch/torchcodec/blob/8bbce656797c4f2b00feb2784ffe76e408be1e4c/src/torchcodec/_core/DeviceInterface.h#L173
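To make the suggestion concrete, here is a minimal sketch of the pattern, assuming both hooks become no-ops in the base class. `CodecContext`, `CpuLikeInterface`, `AcceleratedInterface` and `configureEncoder` are hypothetical stand-ins; the real methods take an AVCodecContext and live in DeviceInterface.h.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for AVCodecContext: records which hooks did work.
struct CodecContext {
  std::vector<std::string> log;
};

class DeviceInterface {
 public:
  virtual ~DeviceInterface() = default;
  // Both hooks default to no-ops, so callers need no device-type guard.
  virtual void registerHardwareDeviceWithCodec(CodecContext&) {}
  virtual void setupHardwareFrameContextForEncoding(CodecContext&) {}
};

// A device with no hardware path simply inherits the no-ops.
class CpuLikeInterface : public DeviceInterface {};

// A hardware device overrides the hooks it needs.
class AcceleratedInterface : public DeviceInterface {
 public:
  void registerHardwareDeviceWithCodec(CodecContext& ctx) override {
    ctx.log.push_back("register");
  }
  void setupHardwareFrameContextForEncoding(CodecContext& ctx) override {
    ctx.log.push_back("hw_frames");
  }
};

// Encoder-side code calls both hooks unconditionally.
void configureEncoder(DeviceInterface& device, CodecContext& ctx) {
  device.registerHardwareDeviceWithCodec(ctx);
  device.setupHardwareFrameContextForEncoding(ctx);
}
```

With this shape, adding a new device type does not require touching the encoder: only the device's own interface class changes.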
  if (videoStream.options.pixelFormat.has_value()) {
-   if (deviceType == kStableCUDA) {
+   if (deviceType == kStableCUDA || deviceType == kStableXPU) {
This check does not actually make sense here. What happens is that Encoder.cpp sets the output pixel format and passes it to the FFmpeg context:
videoStream.avCodecContext->pix_fmt = outPixelFormat;
Then, a few lines below, setupHardwareFrameContextForEncoding() is called, which overrides whatever Encoder.cpp has just set [1]. That's wrong. I think the setup...() call should instead verify what the user has passed and error out if the encoder can't work with the given output pixel format. In other words, the check needs to be moved into that call, something like:
void CudaDeviceInterface::setupHardwareFrameContextForEncoding(
    AVCodecContext* codecContext) {
  ...
  STD_TORCH_CHECK(
      codecContext->pix_fmt == DeviceInterface::CUDA_ENCODING_PIXEL_FORMAT,
      "Video encoding on GPU currently only supports the nv12 pixel format. "
      "Do not set pixel_format to use nv12 by default.");
  ...
}
    validatePixelFormat(*avCodec, videoStream.options.pixelFormat.value());
  } else {
-   if (deviceType == kStableCUDA) {
+   if (deviceType == kStableCUDA || deviceType == kStableXPU) {
Well, we either need a new method in the device interface to get the default encoding pixel format, or we use a device-agnostic variant. To be honest, I would introduce a new API in the device interface:
class DeviceInterface {
 public:
  AVPixelFormat getDefaultPixelFormat() { return AV_PIX_FMT_NV12; }
};
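One way the encoder side could use such an API is sketched below, under stated assumptions: `PixelFormat` is a stand-in for AVPixelFormat, the base-class default of YUV420P and the `supportsPixelFormat` hook are hypothetical, and `choosePixelFormat` is an illustrative helper rather than anything in the PR.

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for AVPixelFormat.
enum class PixelFormat { YUV420P, NV12 };

class DeviceInterface {
 public:
  virtual ~DeviceInterface() = default;
  // Assumed software default; hardware devices override it.
  virtual PixelFormat getDefaultPixelFormat() const {
    return PixelFormat::YUV420P;
  }
  // Hypothetical hook: can this device encode from the given format?
  virtual bool supportsPixelFormat(PixelFormat) const { return true; }
};

// A device whose hardware encoder only accepts NV12 input.
class Nv12OnlyInterface : public DeviceInterface {
 public:
  PixelFormat getDefaultPixelFormat() const override {
    return PixelFormat::NV12;
  }
  bool supportsPixelFormat(PixelFormat f) const override {
    return f == PixelFormat::NV12;
  }
};

// The user's choice wins if the device accepts it; otherwise the device
// default is used. No device-type comparison is needed in the encoder.
PixelFormat choosePixelFormat(
    const DeviceInterface& device, std::optional<PixelFormat> requested) {
  if (!requested.has_value()) {
    return device.getDefaultPixelFormat();
  }
  if (!device.supportsPixelFormat(*requested)) {
    throw std::invalid_argument("pixel format not supported by this device");
  }
  return *requested;
}
```

This keeps the validation with the device, as suggested earlier in the review, and leaves the encoder with a single device-agnostic code path.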
Extends VideoEncoder to support Intel XPU devices.
Encoder.cpp: extended kStableCUDA device-type checks to also match kStableXPU in both VideoEncoder and MultiStreamEncoder, enabling the hardware encoding path (hw frame context setup, pixel format selection, device registration).
XpuDeviceInterface: implemented setupHardwareFrameContextForEncoding and convertTensorToAVFrameForEncoding. RGB→NV12 conversion is done via a SYCL kernel, or via libswscale as a CPU fallback.
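For readers unfamiliar with the conversion, a plain CPU reference of RGB→NV12 is sketched below. It is not the PR's SYCL kernel or the libswscale path: it assumes BT.601 limited-range coefficients, packed 8-bit RGB input, even width and height, and 2x2 box averaging for chroma. `Nv12Image` and `rgbToNv12` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: BT.601 luma weights; the PR may use a different matrix.
constexpr double kKr = 0.299;
constexpr double kKb = 0.114;
constexpr double kKg = 1.0 - kKr - kKb;

struct Nv12Image {
  std::vector<std::uint8_t> y;   // width * height luma samples
  std::vector<std::uint8_t> uv;  // interleaved U,V pairs, one per 2x2 block
};

static std::uint8_t toByte(double v) {
  const long r = std::lround(v);
  return static_cast<std::uint8_t>(r < 0 ? 0 : (r > 255 ? 255 : r));
}

static double luma(double r, double g, double b) {
  return kKr * r + kKg * g + kKb * b;
}

// rgb: packed RGB, row-major, 3 bytes per pixel; width and height even.
Nv12Image rgbToNv12(
    const std::vector<std::uint8_t>& rgb, int width, int height) {
  const std::size_t w = static_cast<std::size_t>(width);
  const std::size_t h = static_cast<std::size_t>(height);
  Nv12Image out{std::vector<std::uint8_t>(w * h),
                std::vector<std::uint8_t>(w * h / 2)};

  // Luma plane: one limited-range (16..235) sample per pixel.
  for (std::size_t row = 0; row < h; ++row) {
    for (std::size_t col = 0; col < w; ++col) {
      const std::uint8_t* p = &rgb[(row * w + col) * 3];
      const double yn = luma(p[0] / 255.0, p[1] / 255.0, p[2] / 255.0);
      out.y[row * w + col] = toByte(16.0 + 219.0 * yn);
    }
  }

  // Chroma plane: average each 2x2 block, then store U and V interleaved.
  for (std::size_t by = 0; by < h / 2; ++by) {
    for (std::size_t bx = 0; bx < w / 2; ++bx) {
      double r = 0.0, g = 0.0, b = 0.0;
      for (std::size_t dy = 0; dy < 2; ++dy) {
        for (std::size_t dx = 0; dx < 2; ++dx) {
          const std::uint8_t* p =
              &rgb[((2 * by + dy) * w + (2 * bx + dx)) * 3];
          r += p[0];
          g += p[1];
          b += p[2];
        }
      }
      r /= 4.0 * 255.0;
      g /= 4.0 * 255.0;
      b /= 4.0 * 255.0;
      const double yn = luma(r, g, b);
      const std::size_t i = (by * (w / 2) + bx) * 2;
      out.uv[i] = toByte(128.0 + 224.0 * 0.5 * (b - yn) / (1.0 - kKb));
      out.uv[i + 1] = toByte(128.0 + 224.0 * 0.5 * (r - yn) / (1.0 - kKr));
    }
  }
  return out;
}
```

A reference like this can serve as a test oracle for a GPU kernel, with a tolerance of a code value or so to absorb rounding and chroma-siting differences.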