Decoding Procedures#
There are several approaches to decoding video frames. The first relies on the internal allocation mechanism, as shown here:
MFXVideoDECODE_Init(session, &init_param);
sts=MFX_ERR_MORE_DATA;
for (;;) {
    if (sts==MFX_ERR_MORE_DATA && !end_of_stream())
        append_more_bitstream(bitstream);
    bits=(end_of_stream())?NULL:bitstream;
    sts=MFXVideoDECODE_DecodeFrameAsync(session,bits,NULL,&disp,&syncp);
    if (end_of_stream() && sts==MFX_ERR_MORE_DATA) break;
    // skipped other error handling
    if (sts==MFX_ERR_NONE) {
        disp->FrameInterface->Synchronize(disp, INFINITE); // or MFXVideoCORE_SyncOperation(session,syncp,INFINITE)
        do_something_with_decoded_frame(disp);
        disp->FrameInterface->Release(disp);
    }
}
MFXVideoDECODE_Close(session);
Note the following key points about the example:
The application calls the MFXVideoDECODE_DecodeFrameAsync() function for a decoding operation with the bitstream buffer (bits); the library allocates the frame surface internally.
Attention
As shown in the example above, starting with API version 2.0 the application can pass NULL as the working frame surface, which leads to internal memory allocation.
If decoding output is not available, the function returns a status code requesting additional bitstream input:
mfxStatus::MFX_ERR_MORE_DATA: The function needs additional bitstream input. The existing buffer contains less than a frame’s worth of bitstream data.
Upon successful decoding, the MFXVideoDECODE_DecodeFrameAsync() function returns mfxStatus::MFX_ERR_NONE. However, the decoded frame data (identified by the surface_out pointer, disp in the example) is not yet available because the MFXVideoDECODE_DecodeFrameAsync() function is asynchronous. The application must use MFXVideoCORE_SyncOperation() or mfxFrameSurfaceInterface::Synchronize to synchronize the decoding operation before retrieving the decoded frame data.
At the end of the bitstream, the application repeatedly calls the MFXVideoDECODE_DecodeFrameAsync() function with a NULL bitstream pointer to drain any remaining frames cached within the decoder, until the function returns mfxStatus::MFX_ERR_MORE_DATA.
When the application finishes working with a frame surface, it must call Release to avoid memory leaks.
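The feed-then-drain control flow from these key points can be tried out without the library. The sketch below is only a model: demo_decoder, demo_decode, demo_run and the DEMO_ status codes are hypothetical stand-ins, and the mock assumes that each call consumes exactly one frame and that the decoder holds back a fixed number of frames before producing output.

```c
#include <stddef.h>

/* Hypothetical status codes that mirror the roles of MFX_ERR_NONE and
 * MFX_ERR_MORE_DATA; the numeric values are arbitrary. */
typedef enum { DEMO_ERR_NONE = 0, DEMO_ERR_MORE_DATA = 1 } demo_status;

/* Mock decoder: `cached` counts frames accepted but not yet returned. */
typedef struct { int cached; int delay; } demo_decoder;

/* Mock of the decode call. A non-NULL input adds one frame; output appears
 * only once more than `delay` frames are cached. A NULL input drains. */
static demo_status demo_decode(demo_decoder *d, const int *bits, int *out)
{
    if (bits != NULL) {
        d->cached++;
        if (d->cached <= d->delay)
            return DEMO_ERR_MORE_DATA;   /* not enough input for output yet */
    } else if (d->cached == 0) {
        return DEMO_ERR_MORE_DATA;       /* drained: nothing left to return */
    }
    d->cached--;
    *out = 1;                            /* pretend a frame was written */
    return DEMO_ERR_NONE;
}

/* Runs the feed-then-drain loop over `total` input frames and returns the
 * number of frames the mock decoder produced. */
static int demo_run(int total, int delay)
{
    demo_decoder dec = { 0, delay };
    int fed = 0, produced = 0, frame = 0, out = 0;

    for (;;) {
        int eos = (fed == total);
        const int *bits = eos ? NULL : &frame;
        demo_status sts = demo_decode(&dec, bits, &out);
        if (!eos)
            fed++;                       /* the mock consumes one frame per call */
        if (eos && sts == DEMO_ERR_MORE_DATA)
            break;                       /* end of stream and fully drained */
        if (sts == DEMO_ERR_NONE)
            produced++;
    }
    return produced;
}
```

With five input frames and a delay of two, the first two calls report that more data is needed, three frames come out while input is still being fed, and the last two come out during the NULL-input drain phase.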
The next example demonstrates how an application can use an internally pre-allocated pool of video surfaces:
MFXVideoDECODE_QueryIOSurf(session, &init_param, &request);
MFXVideoDECODE_Init(session, &init_param);
for (int i = 0; i < request.NumFrameSuggested; i++) {
    MFXMemory_GetSurfaceForDecode(session, &work);
    add_surface_to_pool(work);
}
sts=MFX_ERR_MORE_DATA;
for (;;) {
    if (sts==MFX_ERR_MORE_DATA && !end_of_stream())
        append_more_bitstream(bitstream);
    bits=(end_of_stream())?NULL:bitstream;
    // application logic to distinguish free and busy surfaces
    find_free_surface_from_the_pool(&work);
    sts=MFXVideoDECODE_DecodeFrameAsync(session,bits,work,&disp,&syncp);
    if (end_of_stream() && sts==MFX_ERR_MORE_DATA) break;
    // skipped other error handling
    if (sts==MFX_ERR_NONE) {
        disp->FrameInterface->Synchronize(disp, INFINITE); // or MFXVideoCORE_SyncOperation(session,syncp,INFINITE)
        do_something_with_decoded_frame(disp);
        disp->FrameInterface->Release(disp);
    }
}
for (int i = 0; i < request.NumFrameSuggested; i++) {
    get_next_surface_from_pool(&work);
    work->FrameInterface->Release(work);
}
MFXVideoDECODE_Close(session);
Here the application should use the MFXVideoDECODE_QueryIOSurf() function to obtain the number of working frame surfaces required to reorder output frames. The MFXMemory_GetSurfaceForDecode() call must be made after decoder initialization. Inside MFXVideoDECODE_DecodeFrameAsync(), the Intel® VPL library increments the reference counter of the incoming frame surface, so the application must release the frame surface after the call.
Another approach to decoding frames is to allocate video frames on the fly with the MFXMemory_GetSurfaceForDecode() function, feed them to the library, and release the working surface after the MFXVideoDECODE_DecodeFrameAsync() call.
Attention
Note the two release calls for surfaces: one after MFXVideoDECODE_DecodeFrameAsync(), to decrease the reference counter of the working surface returned by MFXMemory_GetSurfaceForDecode(), and one after MFXVideoCORE_SyncOperation(), to decrease the reference counter of the output surface returned by MFXVideoDECODE_DecodeFrameAsync().
MFXVideoDECODE_Init(session, &init_param);
sts=MFX_ERR_MORE_DATA;
for (;;) {
    if (sts==MFX_ERR_MORE_DATA && !end_of_stream())
        append_more_bitstream(bitstream);
    bits=(end_of_stream())?NULL:bitstream;
    MFXMemory_GetSurfaceForDecode(session, &work);
    sts=MFXVideoDECODE_DecodeFrameAsync(session,bits,work,&disp,&syncp);
    work->FrameInterface->Release(work);
    if (end_of_stream() && sts==MFX_ERR_MORE_DATA) break;
    // skipped other error handling
    if (sts==MFX_ERR_NONE) {
        disp->FrameInterface->Synchronize(disp, INFINITE); // or MFXVideoCORE_SyncOperation(session,syncp,INFINITE)
        do_something_with_decoded_frame(disp);
        disp->FrameInterface->Release(disp);
    }
}
MFXVideoDECODE_Close(session);
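The reason for the two release calls can be seen with a toy reference counter. This is a rough model, not the library's implementation: demo_surface and the demo_ functions are hypothetical, and the sketch assumes for simplicity that the output surface is the same object as the working surface, which need not hold when the decoder reorders frames.

```c
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted frame surface. */
typedef struct { int refs; } demo_surface;

static void demo_addref(demo_surface *s)  { s->refs++; }
static void demo_release(demo_surface *s) { s->refs--; }

/* Mock of the surface request: the caller receives one reference. */
static demo_surface *demo_get_surface(demo_surface *storage)
{
    storage->refs = 1;
    return storage;
}

/* Mock of the decode call: the library keeps its own reference to the working
 * surface and returns the surface as output with one more reference that the
 * caller owns. */
static demo_surface *demo_decode(demo_surface *work)
{
    demo_addref(work);   /* reference held by the library while decoding */
    demo_addref(work);   /* reference handed to the caller with the output */
    return work;
}

/* Simulates the library dropping its reference once it no longer needs the frame. */
static void demo_library_done(demo_surface *s) { demo_release(s); }

/* Returns the reference count left after the sequence; 0 means no leak. */
static int demo_refs_after(int release_work, int release_disp)
{
    demo_surface storage;
    demo_surface *work = demo_get_surface(&storage);
    demo_surface *disp = demo_decode(work);
    if (release_work) demo_release(work);   /* first release: working surface */
    if (release_disp) demo_release(disp);   /* second release: output surface */
    demo_library_done(&storage);
    return storage.refs;
}
```

In this model, skipping either release leaves the count at one, so the surface is never returned to the allocator.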
The following pseudo code shows the decoding procedure in legacy mode, with external allocation of video frames:
MFXVideoDECODE_DecodeHeader(session, bitstream, &init_param);
MFXVideoDECODE_QueryIOSurf(session, &init_param, &request);
allocate_pool_of_frame_surfaces(request.NumFrameSuggested);
MFXVideoDECODE_Init(session, &init_param);
sts=MFX_ERR_MORE_DATA;
for (;;) {
    if (sts==MFX_ERR_MORE_DATA && !end_of_stream())
        append_more_bitstream(bitstream);
    find_free_surface_from_the_pool(&work);
    bits=(end_of_stream())?NULL:bitstream;
    sts=MFXVideoDECODE_DecodeFrameAsync(session,bits,work,&disp,&syncp);
    if (sts==MFX_ERR_MORE_SURFACE) continue;
    if (end_of_stream() && sts==MFX_ERR_MORE_DATA) break;
    if (sts==MFX_ERR_REALLOC_SURFACE) {
        MFXVideoDECODE_GetVideoParam(session, &param);
        realloc_surface(work, param.mfx.FrameInfo);
        continue;
    }
    // skipped other error handling
    if (sts==MFX_ERR_NONE) {
        disp->FrameInterface->Synchronize(disp, INFINITE); // or MFXVideoCORE_SyncOperation(session,syncp,INFINITE)
        do_something_with_decoded_frame(disp);
    }
}
MFXVideoDECODE_Close(session);
free_pool_of_frame_surfaces();
Note the following key points about the example:
The application can use the MFXVideoDECODE_DecodeHeader() function to retrieve decoding initialization parameters from the bitstream. This step is optional if the data is retrievable from other sources such as an audio/video splitter.
The MFXVideoDECODE_DecodeFrameAsync() function can return the following status codes in addition to those described above:
mfxStatus::MFX_ERR_MORE_SURFACE: The function needs one more frame surface to produce any output.
mfxStatus::MFX_ERR_REALLOC_SURFACE: Dynamic resolution change case; the function needs a bigger working frame surface (work).
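The helpers find_free_surface_from_the_pool and realloc_surface are left to the application in the pseudo code. One possible shape for them is sketched below, assuming each pool entry carries a counter that is nonzero while the decoder still uses the surface. The demo_surface type, its fields, and both functions are hypothetical and only illustrate the bookkeeping.

```c
#include <stddef.h>

/* Hypothetical pool entry. `locked` plays the role of a counter that stays
 * nonzero while the decoder still uses the surface. */
typedef struct { int locked; int width; int height; } demo_surface;

/* Returns the first surface that is not in use, or NULL if all are busy. */
static demo_surface *demo_find_free(demo_surface *pool, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (pool[i].locked == 0)
            return &pool[i];
    return NULL;
}

/* Returns 1 if the surface is too small for the requested resolution, which
 * is the situation a "realloc surface" status reports. */
static int demo_needs_realloc(const demo_surface *s, int width, int height)
{
    return s->width < width || s->height < height;
}
```

If demo_find_free returns NULL, the application in this model would have to wait for an in-flight frame to complete or grow the pool before calling the decoder again.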
The following pseudo code shows the simplified decoding procedure:
sts=MFX_ERR_MORE_DATA;
for (;;) {
    if (sts==MFX_ERR_MORE_DATA && !end_of_stream())
        append_more_bitstream(bitstream);
    bits=(end_of_stream())?NULL:bitstream;
    sts=MFXVideoDECODE_DecodeFrameAsync(session,bits,NULL,&disp,&syncp);
    if (sts==MFX_ERR_MORE_SURFACE) continue;
    if (end_of_stream() && sts==MFX_ERR_MORE_DATA) break;
    // skipped other error handling
    if (sts==MFX_ERR_NONE) {
        disp->FrameInterface->Synchronize(disp, INFINITE); // or MFXVideoCORE_SyncOperation(session,syncp,INFINITE)
        do_something_with_decoded_frame(disp);
        disp->FrameInterface->Release(disp);
    }
}
Intel® VPL API version 2.0 introduces a new decoding approach. For simple use cases, where the user wants to decode a stream without setting additional parameters, the decoder offers a simplified initialization procedure. In this scenario the application can skip the explicit stages of stream header decoding and decoder initialization; the library performs these steps implicitly while decoding the first frame. This mode also requires setting the additional field mfxBitstream::CodecId to indicate the codec type. The decoder allocates mfxFrameSurface1 internally, so the application should pass NULL as the input surface.
Bitstream Repositioning#
The application can use the following procedure to reposition the bitstream during decoding:
Use the MFXVideoDECODE_Reset() function to reset the Intel® VPL decoder.
Optional: If the application maintains a sequence header that correctly decodes the bitstream at the new position, the application may insert the sequence header into the bitstream buffer.
Append the bitstream from the new location to the bitstream buffer.
Resume the decoding procedure. If the sequence header was not inserted in the previous steps, the Intel® VPL decoder searches for a new sequence header before starting decoding.
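The buffer handling in this procedure might look like the sketch below. It covers only what happens to the bitstream buffer after the decoder has been reset. The demo_bitstream type and both functions are hypothetical, and discarding the old buffer contents before refilling is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bitstream buffer: storage, bytes in use, and total capacity. */
typedef struct { unsigned char *data; size_t length; size_t capacity; } demo_bitstream;

/* Appends bytes to the buffer; returns 0 on success, -1 if they do not fit. */
static int demo_append(demo_bitstream *bs, const unsigned char *src, size_t n)
{
    if (n > bs->capacity - bs->length)
        return -1;
    memcpy(bs->data + bs->length, src, n);
    bs->length += n;
    return 0;
}

/* Rebuilds the buffer after a decoder reset: drop the stale data, optionally
 * insert a saved sequence header, then append data from the new position. */
static int demo_reposition(demo_bitstream *bs,
                           const unsigned char *seq_hdr, size_t hdr_len,
                           const unsigned char *data, size_t data_len)
{
    bs->length = 0;                      /* assumption: old contents are discarded */
    if (seq_hdr != NULL && demo_append(bs, seq_hdr, hdr_len) != 0)
        return -1;
    return demo_append(bs, data, data_len);
}
```

Passing NULL for seq_hdr corresponds to skipping the optional step, in which case the decoder has to find a sequence header in the appended data itself.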
Broken Streams Handling#
Robustness, the capability to handle a broken input stream, is an important part of the decoder.
First, the start code prefix (ITU-T H.264 3.148 and ITU-T H.265 3.142) is used to separate NAL units. Then all syntax elements in the bitstream are parsed and verified. If any element violates the specification, the input bitstream is considered invalid and the decoder tries to re-sync (find the next start code). Subsequent decoder behavior depends on which syntax element is broken:
SPS header is broken: return mfxStatus::MFX_ERR_INCOMPATIBLE_VIDEO_PARAM (HEVC decoder only; the AVC decoder uses the last valid SPS).
PPS header is broken: re-sync, use the last valid PPS for decoding.
Slice header is broken: skip this slice, re-sync.
Slice data is broken: corruption flags are set on the output surface.
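The re-sync step amounts to scanning forward for the next start code prefix, the byte sequence 0x00 0x00 0x01. A minimal sketch of such a scan follows; the function name demo_next_start_code is hypothetical and the sketch says nothing about how the decoder itself implements the search.

```c
#include <stddef.h>

/* Returns the offset of the next three-byte start code prefix 0x000001 at or
 * after `from`, or `len` if none is found. */
static size_t demo_next_start_code(const unsigned char *buf, size_t len, size_t from)
{
    for (size_t i = from; i + 3 <= len; i++)
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01)
            return i;
    return len;
}
```

A parser that hits a broken slice header at some offset could call this function with that offset to find where the next NAL unit begins and resume from there.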
Many streams have IDR frames with frame_num != 0 while the specification says that “If the current picture is an IDR picture, frame_num shall be equal to 0” (ITU-T H.264 7.4.3).
VUI is also validated, but errors do not invalidate the whole SPS. The decoder either does not use the corrupted VUI (AVC) or resets incorrect values to default (HEVC).
Note
Some requirements are relaxed because there are many streams which violate the strict standard but can be decoded without errors.
Corruption in a reference frame spreads to all inter-coded pictures that use that frame for prediction. To cope with this problem you must either periodically insert I-frames (intra-coded) or use the intra-refresh technique. The intra-refresh technique allows recovery from corruption within a predefined time interval. The main point of intra-refresh is to insert a cyclic intra-coded pattern (usually a row) of macroblocks into the inter-coded pictures, restricting motion vectors accordingly. Intra-refresh is often used in combination with the recovery point SEI message, where recovery_frame_cnt is derived from the intra-refresh interval. The recovery point SEI message is described in ITU-T H.264 D.2.7 and ITU-T H.265 D.2.8. If decoding starts from the access unit (AU) associated with this SEI message, the decoder can use the message to determine the picture from which all subsequent pictures are free of errors. Unlike IDR, the recovery point message does not mark reference pictures as “unused for reference”.
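The cyclic pattern can be illustrated with a small schedule calculation. The sketch below assumes the simplest variant, one macroblock row refreshed per frame, and ignores the motion vector restrictions. Both function names are hypothetical.

```c
/* Returns the macroblock row that is intra-coded in frame `frame_idx` when one
 * row is refreshed per frame and the pattern repeats every `mb_rows` frames. */
static int demo_intra_row(int frame_idx, int mb_rows)
{
    return frame_idx % mb_rows;
}

/* Returns how many frames must be decoded, starting at frame `start`, until
 * every row has been intra-coded at least once. Returns -1 for unsupported
 * sizes; the sketch handles at most 64 rows. */
static int demo_frames_to_recover(int start, int mb_rows)
{
    int refreshed[64] = { 0 };
    int clean = 0, frames = 0;

    if (mb_rows < 1 || mb_rows > 64)
        return -1;
    while (clean < mb_rows) {
        int row = demo_intra_row(start + frames, mb_rows);
        if (!refreshed[row]) {
            refreshed[row] = 1;
            clean++;
        }
        frames++;
    }
    return frames;
}
```

Under these assumptions recovery always takes one full cycle regardless of where decoding starts, which is why the recovery interval is predictable.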
Besides validation of syntax elements and their constraints, the decoder also uses various hints to handle broken streams:
If there are no valid slices for the current frame, then the whole frame is skipped.
The slices which violate slice segment header semantics (ITU-T H.265 7.4.7.1) are skipped. Only the slice_temporal_mvp_enabled_flag is checked for now.
Since an LTR (Long Term Reference) stays in the DPB until it is explicitly cleared by IDR or MMCO, an incorrect LTR could cause long-standing visual artifacts. The AVC decoder uses the following approaches to handle this:
When there is a DPB overflow in the case of an incorrect MMCO command that marks the reference picture as LT, the operation is rolled back.
An IDR frame with frame_num != 0 can’t be LTR.
If the decoder detects frame gapping, it inserts “fake” (marked as non-existing) frames, updates FrameNumWrap (ITU-T H.264 8.2.4.1) for reference frames, and applies the Sliding Window (ITU-T H.264 8.2.5.3) marking process. Fake frames are marked as reference, but since they are marked as non-existing, they are not used for inter-prediction.
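The arithmetic behind frame gap handling is modular, because frame_num wraps at MaxFrameNum. The sketch below shows how the number of missing frames and the FrameNumWrap value of a short-term reference frame can be computed. The function names are hypothetical, and the sketch covers only the arithmetic, not the insertion of frames or the marking process.

```c
/* Number of frames missing between the previous reference frame and the
 * current one, with frame_num wrapping at max_frame_num. */
static int demo_gap_count(int prev_ref_frame_num, int frame_num, int max_frame_num)
{
    if (frame_num == prev_ref_frame_num)
        return 0;
    return (frame_num - prev_ref_frame_num - 1 + max_frame_num) % max_frame_num;
}

/* FrameNumWrap of a short-term reference frame relative to the current frame:
 * a reference whose frame_num is larger than the current one was decoded
 * before the counter wrapped, so it is shifted down by max_frame_num. */
static int demo_frame_num_wrap(int ref_frame_num, int cur_frame_num, int max_frame_num)
{
    return (ref_frame_num > cur_frame_num) ? ref_frame_num - max_frame_num
                                           : ref_frame_num;
}
```

For example, with MaxFrameNum equal to 16, a jump from frame_num 15 to frame_num 1 leaves one frame (frame_num 0) missing, and a reference frame with frame_num 14 gets FrameNumWrap -2 when the current frame_num is 2.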
VP8 Specific Details#
Unlike other decoders supported by Intel® VPL, the VP8 decoder accepts only a complete frame as input. The application should provide the complete frame accompanied by the MFX_BITSTREAM_COMPLETE_FRAME flag. This is the only VP8-specific difference.
JPEG#
The application can use the same decoding procedures for JPEG/motion JPEG decoding, as shown in the following pseudo code:
// optional; retrieve initialization parameters
MFXVideoDECODE_DecodeHeader(...);
// decoder initialization
MFXVideoDECODE_Init(...);
// single frame/picture decoding
MFXVideoDECODE_DecodeFrameAsync(...);
MFXVideoCORE_SyncOperation(...);
// optional; retrieve meta-data
MFXVideoDECODE_GetUserData(...);
// close
MFXVideoDECODE_Close(...);
The MFXVideoDECODE_Query() function will return mfxStatus::MFX_ERR_UNSUPPORTED if the input bitstream contains unsupported features.
For still picture JPEG decoding, the input can be any JPEG bitstream that conforms to ITU-T Recommendation T.81 with an EXIF or JFIF header. For motion JPEG decoding, the input can be any JPEG bitstream that conforms to ITU-T Recommendation T.81.
Unlike other Intel® VPL decoders, JPEG decoding supports three different output color formats: NV12, YUY2, and RGB32. This support sometimes requires internal color conversion and more complicated initialization. The color format of the input bitstream is described by the mfxInfoMFX::JPEGChromaFormat and mfxInfoMFX::JPEGColorFormat fields. The MFXVideoDECODE_DecodeHeader() function usually fills them in. If the JPEG bitstream does not contain color format information, the application should provide it. The output color format is described by general Intel® VPL parameters: the mfxFrameInfo::FourCC and mfxFrameInfo::ChromaFormat fields.
Motion JPEG supports interlaced content by compressing each field (a half-height frame) individually. This behavior is incompatible with the rest of the Intel® VPL transcoding pipeline, where Intel® VPL requires fields to be in odd and even lines of the same frame surface. The decoding procedure is modified as follows:
The application calls the MFXVideoDECODE_DecodeHeader() function with the first field JPEG bitstream to retrieve initialization parameters.
The application initializes the Intel® VPL JPEG decoder with the following settings:
The PicStruct field of the mfxVideoParam structure is set to the correct interlaced type, MFX_PICSTRUCT_FIELD_TFF or MFX_PICSTRUCT_FIELD_BFF, from the motion JPEG header.
The Height field of the mfxVideoParam structure is doubled, because the value returned by the MFXVideoDECODE_DecodeHeader() function describes only the first field. The actual frame surface should contain both fields.
During decoding, the application sends both fields for decoding in the same mfxBitstream. The application should also set mfxBitstream::DataFlag to MFX_BITSTREAM_COMPLETE_FRAME. Intel® VPL decodes both fields and combines them into odd and even lines according to Intel® VPL convention.
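The combination of two fields into one frame surface can be pictured as a row interleave. The sketch below works on a single plane of bytes and assumes that the top field occupies rows 0, 2, 4, ... of the frame. The function names are hypothetical and the code is an illustration of the layout, not of the library's internals.

```c
#include <stddef.h>
#include <string.h>

/* Interleaves two half-height fields into one frame. Each field has
 * `field_height` rows of `width` bytes; `frame` must hold twice that. With
 * top_field_first set, the first field fills rows 0, 2, 4, ... and the second
 * fills rows 1, 3, 5, ...; otherwise the roles are swapped. */
static void demo_weave_fields(const unsigned char *first, const unsigned char *second,
                              size_t width, size_t field_height,
                              int top_field_first, unsigned char *frame)
{
    for (size_t row = 0; row < field_height; row++) {
        size_t first_row  = 2 * row + (top_field_first ? 0 : 1);
        size_t second_row = 2 * row + (top_field_first ? 1 : 0);
        memcpy(frame + first_row * width,  first  + row * width, width);
        memcpy(frame + second_row * width, second + row * width, width);
    }
}

/* Frame height to request at initialization when the header describes one field. */
static size_t demo_frame_height(size_t field_height)
{
    return 2 * field_height;
}
```

The doubled height returned by demo_frame_height mirrors the Height adjustment in the initialization step: the surface has to be large enough to hold both fields.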
By default, the MFXVideoDECODE_DecodeHeader() function returns the Rotation parameter so that after rotation, the pixel at the first row and first column is at the top left. The application can overwrite the default rotation before calling MFXVideoDECODE_Init().
The application may specify Huffman and quantization tables during decoder initialization by attaching mfxExtJPEGQuantTables and mfxExtJPEGHuffmanTables buffers to the mfxVideoParam structure. In this case, the decoder ignores the tables from the bitstream and uses the tables specified by the application. The application can also retrieve these tables by attaching the same buffers to mfxVideoParam and calling the MFXVideoDECODE_GetVideoParam() or MFXVideoDECODE_DecodeHeader() function.
Multi-view Video Decoding#
The Intel® VPL MVC decoder operates on complete MVC streams that contain all view and temporal configurations. The application can configure the Intel® VPL decoder to generate a subset at the decoding output. To do this, the application must understand the stream structure and use the stream information to configure the decoder for target views.
The decoder initialization procedure is as follows:
The application calls the MFXVideoDECODE_DecodeHeader() function to obtain the stream structural information. This is done in two steps:
The application calls the MFXVideoDECODE_DecodeHeader() function with the mfxExtMVCSeqDesc structure attached to the mfxVideoParam structure. At this point, do not allocate memory for the arrays in the mfxExtMVCSeqDesc structure. Set the View, ViewId, and OP pointers to NULL and set NumViewAlloc, NumViewIdAlloc, and NumOPAlloc to zero. The function parses the bitstream and returns mfxStatus::MFX_ERR_NOT_ENOUGH_BUFFER with the correct values for NumView, NumViewId, and NumOP. This step can be skipped if the application is able to obtain the NumView, NumViewId, and NumOP values from other sources.
The application allocates memory for the View, ViewId, and OP arrays and calls the MFXVideoDECODE_DecodeHeader() function again. The function returns the MVC structural information in the allocated arrays.
The application fills the mfxExtMVCTargetViews structure to choose the target views, based on information described in the mfxExtMVCSeqDesc structure.
The application initializes the Intel® VPL decoder using the MFXVideoDECODE_Init() function. The application must attach both the mfxExtMVCSeqDesc structure and the mfxExtMVCTargetViews structure to the mfxVideoParam structure.
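The two-step header query follows a common "ask for the size, allocate, ask again" pattern. The sketch below models it with a single array instead of the three arrays in mfxExtMVCSeqDesc. The demo_seq_desc type, the DEMO_ status codes, and both functions are hypothetical stand-ins, and the mock parser simply reports a view count passed in by the caller.

```c
#include <stdlib.h>

typedef enum { DEMO_ERR_NONE = 0, DEMO_ERR_NOT_ENOUGH_BUFFER = -1 } demo_status;

/* Hypothetical reduction of the sequence description to a single array. */
typedef struct {
    unsigned num_view;        /* number of views in the stream (output) */
    unsigned num_view_alloc;  /* capacity of `view` provided by the caller */
    int *view;                /* caller-allocated array, NULL on the first pass */
} demo_seq_desc;

/* Mock header parser for a stream that contains `stream_views` views. */
static demo_status demo_decode_header(unsigned stream_views, demo_seq_desc *d)
{
    d->num_view = stream_views;               /* always report the needed size */
    if (d->view == NULL || d->num_view_alloc < stream_views)
        return DEMO_ERR_NOT_ENOUGH_BUFFER;
    for (unsigned i = 0; i < stream_views; i++)
        d->view[i] = (int)i;                  /* placeholder view ids */
    return DEMO_ERR_NONE;
}

/* Two-pass query: size discovery, allocation, then the real call. Returns the
 * number of views on success or 0 on failure. The caller frees d->view. */
static unsigned demo_query_views(unsigned stream_views, demo_seq_desc *d)
{
    d->view = NULL;
    d->num_view_alloc = 0;
    if (demo_decode_header(stream_views, d) != DEMO_ERR_NOT_ENOUGH_BUFFER)
        return 0;
    d->view = malloc(d->num_view * sizeof *d->view);
    if (d->view == NULL)
        return 0;
    d->num_view_alloc = d->num_view;
    if (demo_decode_header(stream_views, d) != DEMO_ERR_NONE) {
        free(d->view);
        d->view = NULL;
        return 0;
    }
    return d->num_view;
}
```

Note that in this pattern the "not enough buffer" status on the first pass is the expected outcome rather than a failure.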
In the above steps, do not modify the values of the mfxExtMVCSeqDesc structure after the MFXVideoDECODE_DecodeHeader() call, as the Intel® VPL decoder uses the values in the structure for internal memory allocation. Once the application configures the Intel® VPL decoder, the rest of the decoding procedure remains unchanged. As shown in the pseudo code below, the application calls the MFXVideoDECODE_DecodeFrameAsync() function multiple times to obtain all target views of the current frame picture, one target view at a time. The target view is identified by the FrameID field of the mfxFrameInfo structure.
mfxExtBuffer *eb[2];
mfxExtMVCSeqDesc seq_desc;
mfxVideoParam init_param;

init_param.ExtParam=(mfxExtBuffer **)&eb;
init_param.NumExtParam=1;
eb[0]=(mfxExtBuffer *)&seq_desc;
MFXVideoDECODE_DecodeHeader(session, bitstream, &init_param);

/* select views to decode */
mfxExtMVCTargetViews tv;
init_param.NumExtParam=2;
eb[1]=(mfxExtBuffer *)&tv;

/* initialize decoder */
MFXVideoDECODE_Init(session, &init_param);

/* perform decoding */
for (;;) {
    MFXVideoDECODE_DecodeFrameAsync(session, bits, work, &disp, &syncp);
    disp->FrameInterface->Synchronize(disp, INFINITE); // or MFXVideoCORE_SyncOperation(session,syncp,INFINITE)
}

/* close decoder */
MFXVideoDECODE_Close(session);
Combined Decode with Multi-channel Video Processing#
Intel® VPL exposes an interface for performing decode and video processing operations in one call. Users can specify a number of output processing channels and multiple video filters for each channel. This interface supports only the internal memory allocation model and returns an array of processed frames through the mfxSurfaceArray reference object, as shown in the example:
num_channel_par = 2;
// first video processing channel with resize
vpp_par_array[0]->VPP.Width = 400;
vpp_par_array[0]->VPP.Height = 400;

// second video channel with color conversion filter
vpp_par_array[1]->VPP.FourCC = MFX_FOURCC_UYVY;

sts = MFXVideoDECODE_VPP_Init(session, decode_par, vpp_par_array, num_channel_par);

sts = MFXVideoDECODE_VPP_DecodeFrameAsync(session, bitstream, NULL, 0, &surf_array_out);

//surf_array_out layout is
do_smth(surf_array_out->Surfaces[0]); //The first channel contains decoded frames.
do_smth(surf_array_out->Surfaces[1]); //The second channel contains resized frames after decode.
do_smth(surf_array_out->Surfaces[2]); //The third channel contains color converted frames after decode.
It’s possible that different video processing channels may have different latency:
//1st call
sts = MFXVideoDECODE_VPP_DecodeFrameAsync(session, bitstream, NULL, 0, &surf_array_out);
//surf_array_out layout is
do_smth(surf_array_out->Surfaces[0]); //decoded frame
do_smth(surf_array_out->Surfaces[1]); //resized frame (ChannelId = 1). The first frame from channel with resize available
// no output from channel with ADI output since it has one frame delay

//2nd call
sts = MFXVideoDECODE_VPP_DecodeFrameAsync(session, bitstream, NULL, 0, &surf_array_out);
//surf_array_out layout is
do_smth(surf_array_out->Surfaces[0]); //decoded frame
do_smth(surf_array_out->Surfaces[1]); //resized frame (ChannelId = 1)
do_smth(surf_array_out->Surfaces[2]); //ADI output (ChannelId = 2). The first frame from ADI channel
The application can match a decoded frame with specific VPP channels using mfxFrameData::TimeStamp, mfxFrameData::FrameOrder, and mfxFrameInfo::ChannelId.
The application can skip some or all channels, including the decoding output, with the skip_channels and num_skip_channels parameters: the application fills the skip_channels array with the ChannelId values of the channels whose output it wants to disable. In that case surf_array_out contains only surfaces for the remaining channels. If the decoder’s channel and/or the affected VPP channels have no output frame(s) for the current call (for instance, the input bitstream doesn’t contain a complete frame, or a deinterlacing/FRC filter has a delay), the skip_channels parameter is ignored for that channel.
If the application disables all channels, the SDK returns NULL as mfxSurfaceArray.
If the application doesn’t need to disable any channels, it sets num_skip_channels to zero; skip_channels is ignored when num_skip_channels is zero.
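The effect of the skip list on the returned array can be modeled as a simple filter over channel ids. This sketch is an illustration only: both functions are hypothetical, channel ids are plain integers, and `ready` stands for the channels that have output for the current call.

```c
#include <stddef.h>

/* Returns 1 if `id` appears in the skip list. */
static int demo_is_skipped(unsigned id, const unsigned *skip, size_t num_skip)
{
    for (size_t i = 0; i < num_skip; i++)
        if (skip[i] == id)
            return 1;
    return 0;
}

/* Copies the ids of channels that are not skipped into `out` and returns how
 * many were kept. A result of 0 models the case where no surface array is
 * returned. The skip list is not consulted when num_skip is zero. */
static size_t demo_filter_channels(const unsigned *ready, size_t num_ready,
                                   const unsigned *skip, size_t num_skip,
                                   unsigned *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < num_ready; i++)
        if (num_skip == 0 || !demo_is_skipped(ready[i], skip, num_skip))
            out[kept++] = ready[i];
    return kept;
}
```

For example, with channels 0, 1, and 2 ready and channel 1 in the skip list, the filtered output holds channels 0 and 2 in that order.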
If the application doesn’t need scaling or cropping, it must set the mfxFrameInfo::Width, mfxFrameInfo::Height, mfxFrameInfo::CropX, mfxFrameInfo::CropY, mfxFrameInfo::CropW, and mfxFrameInfo::CropH fields of the VPP channel to zero. In that case the output surfaces have the original decoded resolution and cropping. The operation supports bitstreams with resolution changes without the need for a MFXVideoDECODE_VPP_Reset() call.
Note
Even if more than one compressed input frame is consumed, MFXVideoDECODE_VPP_DecodeFrameAsync() produces only one decoded frame and the corresponding frames from the VPP channels.