Hardware Acceleration#
Intel® VPL provides a new model for working with hardware acceleration while continuing to support hardware acceleration in legacy mode.
New Model to Work with Hardware Acceleration#
Intel® VPL API version 2.0 introduces a new memory model: internal allocation where
Intel® VPL is responsible for video memory allocation. In this mode,
an application is not dependent on a low-level video framework API, such as
DirectX* or the VA-API, and does not need to create and set corresponding
low-level Intel® VPL primitives such as ID3D11Device or VADisplay. Instead,
Intel® VPL creates all required objects to work with hardware acceleration and video
surfaces internally. An application can get access to these objects using
MFXVideoCORE_GetHandle() or with the help of the mfxFrameSurfaceInterface interface.
This approach simplifies the Intel® VPL initialization, making calls to the
MFXVideoENCODE_QueryIOSurf(), MFXVideoDECODE_QueryIOSurf(), or
MFXVideoVPP_QueryIOSurf() functions optional. See Internal Memory Management.
Note
Applications can set the device handle before session creation through
MFXSetConfigFilterProperty(), as shown in the code below:
mfxLoader loader = MFXLoad();
mfxConfig config1 = MFXCreateConfig(loader);
mfxConfig config2 = MFXCreateConfig(loader);
mfxSession session;

// First filter property: the type of the handle being passed
mfxVariant HandleType;
HandleType.Type = MFX_VARIANT_TYPE_U32;
HandleType.Data.U32 = MFX_HANDLE_VA_DISPLAY;
MFXSetConfigFilterProperty(config1, (mfxU8*)"mfxHandleType", HandleType);

// Second filter property: the handle itself
// (vaDisplay is a VADisplay already obtained by the application)
mfxVariant DisplayHandle;
DisplayHandle.Type = MFX_VARIANT_TYPE_PTR;
DisplayHandle.Data.Ptr = vaDisplay;
MFXSetConfigFilterProperty(config2, (mfxU8*)"mfxHDL", DisplayHandle);

MFXCreateSession(loader, 0, &session);
Work with Hardware Acceleration in Legacy Mode#
Work with Multiple Media Devices#
If your system has multiple graphics adapters, you may need hints on which adapter is better suited to process a particular workload. The legacy mode of Intel® VPL provides a helper API to select the most suitable adapter for your workload based on the provided workload description.
Important
MFXQueryAdapters(), MFXQueryAdaptersDecode(), and MFXQueryAdaptersNumber() are
deprecated starting from API 2.9. Applications should use MFXEnumImplementations()
and MFXSetConfigFilterProperty() to query adapter capabilities and to select a
suitable adapter for the input workload.
The following example shows workload initialization on a discrete adapter in legacy mode:
mfxU32 num_adapters_available;
mfxIMPL impl;

// Query number of graphics adapters available on system
mfxStatus sts = MFXQueryAdaptersNumber(&num_adapters_available);
MSDK_CHECK_STATUS(sts, "MFXQueryAdaptersNumber failed");

// Allocate memory for response
std::vector<mfxAdapterInfo> displays_data(num_adapters_available);
mfxAdaptersInfo adapters = { displays_data.data(), mfxU32(displays_data.size()), 0u, {0} };

// Query information about all adapters (note that the first parameter is NULL)
sts = MFXQueryAdapters(nullptr, &adapters);
MSDK_CHECK_STATUS(sts, "MFXQueryAdapters failed");

// Find dGfx adapter in list of adapters
auto idx_d = std::find_if(adapters.Adapters, adapters.Adapters + adapters.NumActual,
    [](const mfxAdapterInfo &info)
{
    return info.Platform.MediaAdapterType == mfxMediaAdapterType::MFX_MEDIA_DISCRETE;
});

// No dGfx in list
if (idx_d == adapters.Adapters + adapters.NumActual)
{
    printf("Warning: No dGfx detected on machine\n");
    return -1;
}

mfxU32 idx = static_cast<mfxU32>(std::distance(adapters.Adapters, idx_d));

// Choose correct implementation for discrete adapter
switch (adapters.Adapters[idx].Number)
{
case 0:
    impl = MFX_IMPL_HARDWARE;
    break;
case 1:
    impl = MFX_IMPL_HARDWARE2;
    break;
case 2:
    impl = MFX_IMPL_HARDWARE3;
    break;
case 3:
    impl = MFX_IMPL_HARDWARE4;
    break;

default:
    // Try searching on all display adapters
    impl = MFX_IMPL_HARDWARE_ANY;
    break;
}
printf("Chosen implementation: %d\n", impl);
// Initialize mfxSession in regular way with obtained implementation.
The example shows that after obtaining the adapter list with MFXQueryAdapters(),
further initialization of mfxSession is performed in the regular way. The specific
adapter is selected using the MFX_IMPL_HARDWARE, MFX_IMPL_HARDWARE2,
MFX_IMPL_HARDWARE3, or MFX_IMPL_HARDWARE4 values of mfxIMPL.
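Both legacy examples in this section repeat the same mapping from an adapter number to an mfxIMPL value, so it can be factored into one helper. The following is a minimal sketch, not part of the Intel® VPL API: `ImplForAdapterNumber` and the `kImpl...` constants are hypothetical names, and the constants are local stand-ins assumed to mirror the mfxIMPL values in the Intel® VPL headers. A real application would include the Intel® VPL header and use `MFX_IMPL_HARDWARE` and its siblings directly.

```cpp
#include <cstdint>

// Local stand-ins assumed to mirror the mfxIMPL values in the Intel VPL headers.
// Hypothetical names; real code should use the MFX_IMPL_* definitions instead.
using ImplValue = std::int32_t;
constexpr ImplValue kImplHardware    = 0x0002; // MFX_IMPL_HARDWARE
constexpr ImplValue kImplHardwareAny = 0x0004; // MFX_IMPL_HARDWARE_ANY
constexpr ImplValue kImplHardware2   = 0x0005; // MFX_IMPL_HARDWARE2
constexpr ImplValue kImplHardware3   = 0x0006; // MFX_IMPL_HARDWARE3
constexpr ImplValue kImplHardware4   = 0x0007; // MFX_IMPL_HARDWARE4

// Hypothetical helper: map an adapter number (mfxAdapterInfo::Number) to the
// implementation value that targets that adapter.
inline ImplValue ImplForAdapterNumber(std::uint32_t adapter_number) {
    switch (adapter_number) {
        case 0:  return kImplHardware;
        case 1:  return kImplHardware2;
        case 2:  return kImplHardware3;
        case 3:  return kImplHardware4;
        default: return kImplHardwareAny; // fall back to searching all adapters
    }
}
```

With such a helper, each `switch` statement in the examples reduces to a single call that takes the `Number` field of the selected adapter.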
The following example shows the use of MFXQueryAdapters() for querying
the most suitable adapter for a particular encode workload:
mfxU32 num_adapters_available;
mfxIMPL impl;
// Zero-initialize, then fill with the parameters of the encode workload
mfxVideoParam Encode_mfxVideoParam = {};

// Query number of graphics adapters available on system
mfxStatus sts = MFXQueryAdaptersNumber(&num_adapters_available);
MSDK_CHECK_STATUS(sts, "MFXQueryAdaptersNumber failed");

// Allocate memory for response
std::vector<mfxAdapterInfo> displays_data(num_adapters_available);
mfxAdaptersInfo adapters = { displays_data.data(), mfxU32(displays_data.size()), 0u, {0} };

// Fill description of Encode workload
mfxComponentInfo interface_request = { MFX_COMPONENT_ENCODE, Encode_mfxVideoParam, {0} };

// Query information about suitable adapters for Encode workload described by Encode_mfxVideoParam
sts = MFXQueryAdapters(&interface_request, &adapters);

if (sts == MFX_ERR_NOT_FOUND)
{
    printf("Error: No adapters on machine capable to process desired workload\n");
    return -1;
}

MSDK_CHECK_STATUS(sts, "MFXQueryAdapters failed");

// Choose implementation for the most suitable adapter. Note the use of index 0:
// the first entry is the best match from the library's perspective
switch (adapters.Adapters[0].Number)
{
case 0:
    impl = MFX_IMPL_HARDWARE;
    break;
case 1:
    impl = MFX_IMPL_HARDWARE2;
    break;
case 2:
    impl = MFX_IMPL_HARDWARE3;
    break;
case 3:
    impl = MFX_IMPL_HARDWARE4;
    break;

default:
    // Try searching on all display adapters
    impl = MFX_IMPL_HARDWARE_ANY;
    break;
}

printf("Chosen implementation: %d\n", impl);

// Initialize mfxSession in regular way with obtained implementation
See the MFXQueryAdapters() description for adapter priority rules.
Work with Video Memory#
To fully utilize the Intel® VPL acceleration capability, the application should support the OS-specific infrastructure. On Microsoft* Windows*, the application should support Microsoft DirectX*. On Linux*, the application should support the VA-API.
The hardware acceleration support in an application consists of video memory support and acceleration device support.
Depending on the usage model, the application can use video memory at different stages in the pipeline. Three major scenarios are shown in the following diagrams:
The application must use the mfxVideoParam::IOPattern field to indicate the I/O
access pattern during initialization. Subsequent function calls must follow this
access pattern. For example, if a function operates on video memory surfaces at
both input and output, the application must set IOPattern at initialization to
the combination of MFX_IOPATTERN_IN_VIDEO_MEMORY for input and
MFX_IOPATTERN_OUT_VIDEO_MEMORY for output. The I/O access pattern must not
change within the Init - Close sequence.
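The input and output parts of the access pattern are separate bit flags that are combined with a bitwise OR. The sketch below illustrates this for a component that has both an input and an output side, such as VPP. `MakeIOPattern` and the `k...` constants are hypothetical names; the constants are local stand-ins assumed to mirror the `MFX_IOPATTERN_*` values in the Intel® VPL headers, which real code should use instead.

```cpp
#include <cstdint>

// Local stand-ins assumed to mirror the IOPattern flags in the Intel VPL headers.
constexpr std::uint16_t kInVideoMemory   = 0x01; // MFX_IOPATTERN_IN_VIDEO_MEMORY
constexpr std::uint16_t kInSystemMemory  = 0x02; // MFX_IOPATTERN_IN_SYSTEM_MEMORY
constexpr std::uint16_t kOutVideoMemory  = 0x10; // MFX_IOPATTERN_OUT_VIDEO_MEMORY
constexpr std::uint16_t kOutSystemMemory = 0x20; // MFX_IOPATTERN_OUT_SYSTEM_MEMORY

// Hypothetical helper: build the value for mfxVideoParam::IOPattern by picking
// exactly one input flag and exactly one output flag.
inline std::uint16_t MakeIOPattern(bool input_in_video_memory,
                                   bool output_in_video_memory) {
    const std::uint16_t in  = input_in_video_memory  ? kInVideoMemory  : kInSystemMemory;
    const std::uint16_t out = output_in_video_memory ? kOutVideoMemory : kOutSystemMemory;
    return static_cast<std::uint16_t>(in | out);
}
```

For the case described above, `MakeIOPattern(true, true)` yields the OR of the two video memory flags. The value would be computed once before Init and left unchanged until Close.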
Initialization of any hardware accelerated Intel® VPL component requires the
acceleration device handle. This handle is also used by the Intel® VPL component to
query hardware capabilities. The application can share its device with Intel® VPL
by passing the device handle through the MFXVideoCORE_SetHandle()
function. It is recommended to share the handle before any actual usage of Intel® VPL.
Work with Microsoft DirectX* Applications#
Intel® VPL supports two different infrastructures for hardware acceleration on the Microsoft Windows OS: the Direct3D* 9 DXVA2 and Direct3D 11 Video API. If Direct3D 9 DXVA2 is used for hardware acceleration, the application should use the IDirect3DDeviceManager9 interface as the acceleration device handle. If the Direct3D 11 Video API is used for hardware acceleration, the application should use the ID3D11Device interface as the acceleration device handle.
The application should share one of these interfaces with Intel® VPL through the
MFXVideoCORE_SetHandle()
function. If the application does not provide
the interface, then Intel® VPL creates its own internal acceleration device. As a result,
Intel® VPL input and output will be limited to system memory only for the external allocation mode, which will reduce Intel® VPL performance. If Intel® VPL fails to create a valid acceleration device,
then Intel® VPL cannot proceed with hardware acceleration and returns an error status to the
application.
Note
It is recommended to work in the internal allocation mode if the application does not provide the IDirect3DDeviceManager9 or ID3D11Device interface.
The application must create the Direct3D 9 device with the flag
D3DCREATE_MULTITHREADED. The flag D3DCREATE_FPU_PRESERVE is also recommended,
because it influences floating-point calculations, including PTS values.
The application must also set multi-threading mode for the Direct3D 11 device. The following example shows how to set multi-threading mode for a Direct3D 11 device:
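// pD11Device is assumed to point to a device already created by the application
ID3D11Device *pD11Device;
ID3D11DeviceContext *pD11Context = nullptr;
ID3D10Multithread *pD10Multithread = nullptr;

pD11Device->GetImmediateContext(&pD11Context);
// Query the multithread interface from the immediate context
pD11Context->QueryInterface(IID_ID3D10Multithread, reinterpret_cast<void **>(&pD10Multithread));
pD10Multithread->SetMultithreadProtected(TRUE);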
During hardware acceleration, if a Direct3D “device lost” event occurs, the Intel® VPL
operation terminates with the mfxStatus::MFX_ERR_DEVICE_LOST
return status. If the application provided the Direct3D device handle, the
application must reset the Direct3D device.
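One way to structure this recovery is a small wrapper that retries an operation after a device reset. The following is a sketch under stated assumptions, not a prescribed Intel® VPL pattern: `RunWithDeviceReset`, `operation`, and `reset_device` are hypothetical names, and the status constants are local stand-ins assumed to mirror `MFX_ERR_NONE` and `MFX_ERR_DEVICE_LOST`. What else the application has to redo after the reset is application specific and is left to the callbacks.

```cpp
#include <functional>

// Local stand-ins assumed to mirror mfxStatus values in the Intel VPL headers.
constexpr int kErrNone       = 0;   // MFX_ERR_NONE
constexpr int kErrDeviceLost = -13; // MFX_ERR_DEVICE_LOST

// Hypothetical wrapper: run `operation`; while it reports a lost device, call
// `reset_device` and try again, giving up after `max_resets` resets or as soon
// as a reset fails. Returns the last status reported by `operation`.
inline int RunWithDeviceReset(const std::function<int()> &operation,
                              const std::function<bool()> &reset_device,
                              int max_resets) {
    int status = operation();
    for (int attempt = 0; status == kErrDeviceLost && attempt < max_resets; ++attempt) {
        if (!reset_device()) {
            break; // reset failed, report the device-lost status to the caller
        }
        status = operation();
    }
    return status;
}
```

Bounding the number of resets keeps a persistently failing device from turning the recovery path into an endless loop.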
When the Intel® VPL decoder creates auxiliary devices for hardware acceleration, it must allocate the list of Direct3D surfaces for I/O access, also known as the surface chain, and pass the surface chain as part of the device creation command. In most cases, the surface chain is the frame surface pool mentioned in the Frame Surface Locking section.
The application passes the surface chain to the Intel® VPL component Init function through an Intel® VPL external allocator callback. See the Memory Allocation and External Allocators section for details.
Only the decoder Init function requests the external surface chain from the application and uses it for auxiliary device creation. Encoder and VPP Init functions may only request internal surfaces. See the ExtMemFrameType enumerator for more details about different memory types.
Depending on configuration parameters, Intel® VPL requires different surface types.
It is strongly recommended to call the MFXVideoENCODE_QueryIOSurf() function,
the MFXVideoDECODE_QueryIOSurf() function, or the MFXVideoVPP_QueryIOSurf()
function to determine the appropriate type in the external allocation mode.
Work with VA-API Applications#
Intel® VPL supports the VA-API infrastructure for hardware acceleration on Linux.
The application should use the VADisplay interface as the acceleration device
handle for this infrastructure and share it with Intel® VPL through the
MFXVideoCORE_SetHandle()
function.
The following example shows how to obtain the VA display from the X Window System:
Display *x11_display;
VADisplay va_display;

// current_display is the X display name supplied by the application
x11_display = XOpenDisplay(current_display);
va_display = vaGetDisplay(x11_display);

MFXVideoCORE_SetHandle(session, MFX_HANDLE_VA_DISPLAY, (mfxHDL) va_display);
The following example shows how to obtain the VA display from the Direct Rendering Manager:
int card;
int major_version = 0, minor_version = 0;
VADisplay va_display;

card = open("/dev/dri/card0", O_RDWR); /* primary card */
va_display = vaGetDisplayDRM(card);
vaInitialize(va_display, &major_version, &minor_version);

MFXVideoCORE_SetHandle(session, MFX_HANDLE_VA_DISPLAY, (mfxHDL) va_display);
When the Intel® VPL decoder creates a hardware acceleration device, it must allocate the list of video memory surfaces for I/O access, also known as the surface chain, and pass the surface chain as part of the device creation command. The application passes the surface chain to the Intel® VPL component Init function through an Intel® VPL external allocator callback. See the Memory Allocation and External Allocators section for details. Starting from Intel® VPL API version 2.0, Intel® VPL creates its own surface chain if an external allocator is not set. See the New Model to Work with Hardware Acceleration section for details.
Note
The VA-API does not define any surface types and the application can use either
MFX_MEMTYPE_VIDEO_MEMORY_DECODER_TARGET or MFX_MEMTYPE_VIDEO_MEMORY_PROCESSOR_TARGET
to indicate data in video memory.
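Because either flag indicates video memory under the VA-API, an allocator written for Linux can treat the two flags as equivalent when it inspects a request. The following sketch illustrates such a check. `IsVideoMemory` and the `kMemType...` constants are hypothetical names; the constants are local stand-ins assumed to mirror the `MFX_MEMTYPE_*` values in the Intel® VPL headers, which real code should use instead.

```cpp
#include <cstdint>

// Local stand-ins assumed to mirror memory type flags in the Intel VPL headers.
constexpr std::uint16_t kMemTypeVideoDecoderTarget   = 0x0010; // MFX_MEMTYPE_VIDEO_MEMORY_DECODER_TARGET
constexpr std::uint16_t kMemTypeVideoProcessorTarget = 0x0020; // MFX_MEMTYPE_VIDEO_MEMORY_PROCESSOR_TARGET
constexpr std::uint16_t kMemTypeSystemMemory         = 0x0040; // MFX_MEMTYPE_SYSTEM_MEMORY

// Hypothetical helper: report whether a requested memory type refers to video
// memory, accepting either of the two video memory flags.
inline bool IsVideoMemory(std::uint16_t mem_type) {
    const std::uint16_t video_mask =
        kMemTypeVideoDecoderTarget | kMemTypeVideoProcessorTarget;
    return (mem_type & video_mask) != 0;
}
```

Testing against a mask rather than comparing for equality keeps the check valid when the request carries additional flags alongside the memory type.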