Low-level C API#
Introduction#
This document provides instructions on how to use the Intel® Data Mover Library. It describes general usage concepts, main entities and detailed operation descriptions.
For general introduction to Intel® DML, see Introduction.
Header Files#
Your application only needs to include one header file: dml.h
, which
includes the entire API definition.
How to Use the Library#
The library operation is based on the following concepts:
Your application needs to fill in a data structure with the necessary parameters to describe the job to be done. The structure is passed to the library and contains the results (error codes, output parameters, and so on) after operation.
The Intel® DML library does not perform any internal memory allocations.
Your application must provide all the memory buffers that are required for the execution.
Memory Allocation and Job Structure Initialization#
As the library supports several implementations/execution paths – DML_PATH_HW
,
DML_PATH_SW
, and DML_PATH_AUTO
– the size of the required memory
buffer for a job structure is not predefined and depends on the used
CPU/HW and execution path.
Before submitting any job to the library, your application must determine the size of the required memory buffer, allocate, and initialize it. The algorithm of these steps is as follows:
Call the
dml_get_job_size()
function:
status = dml_get_job_size(path, job_size_ptr);
where:
status
- has typedml_status_t
and contains execution status of the function – success or error code – seedml.h
for details.path
– has typedml_path_t
, which can be:DML_PATH_HW
– all hardware-supported features are executed by Intel® Data Streaming AcceleratorDML_PATH_SW
– all supported features is executed by the software path of the libraryDML_PATH_AUTO
– the library automatically dispatches execution of the requested jobs either to Intel® Data Streaming Accelerator or to the software path of the library depending on internal heuristics
job_size_ptr
– has typeuint32_t*
(a pointer to unsigned int) and stores the calculated memory buffer size returned by the function.
Allocate the required memory buffer (size in bytes =
*job_size_ptr
). The Intel® Data Mover Library does not allocate or free memory, therefore an application must provide required memory buffer for the job structure and free it after it becomes unnecessary. During allocation, the memory buffer should be casted todml_job_t *data type
– for example:
dml_job_ptr = (dml_job_t *) malloc(*job_size_ptr);
Check allocation results for success.
Call the
dml_init_job()
function:
status = dml_init_job(path, dml_job_ptr);
where:
status
- has typedml_status_t
and contains execution status of the function: success or error code. Seedml.h
for details.path
- has typedml_path_t
and supports the following values:DML_PATH_HW
– all hardware-supported features are executed by Intel® Data Streaming AcceleratorDML_PATH_SW
– all supported features are executed by the software path of the libraryDML_PATH_AUTO
– the library automatically dispatches execution of the requested jobs either to accelerator or to the software path of the library depending on internal heuristics
NOTE: It shall have the same value as at the first step – the call to
dml_get_job_size()
.dml_job_ptr
– has type dml_job_t (Intel DML job structure) and stores the initialized Intel DML job structure returned by the function.
An example of C code for these first steps:
uint32_t size;
dml_job_t *dml_job_ptr;
dml_status_t status;
status = dml_get_job_size(DML_PATH_AUTO, &size );
dml_job_ptr = (dml_job_t *) malloc( size );
status = dml_init_job(DML_PATH_AUTO, 0u, dml_job_ptr);
At this stage, the application has the initialized job structure and can request any library operation it needs.
Job Execution#
The simplest interface is dml_execute_job()
. It does not return any
result until the job has completed. It is the “synchronous” interface.
If the application wants to do other work while the job is being
processed, it can use the “asynchronous” interface. The job is submitted
with dml_submit_job()
. The application can then periodically query
the status of the job with dml_check_job()
. If the application needs
to wait until the job completes, it can call dml_wait_job()
.
Note that the dml_execute_job()
function is essentially a
combination of dml_submit_job()
followed by dml_wait_job()
.
In the context of the behavioral model (i.e. without actual hardware
support), the job is processed when it is submitted, and so the
dml_wait_job()
function always returns completed status. It is not
the case when real hardware is involved.
The job structure contains three types of data:
Parameters from the application to the library defining the job to be done;
Results from the library to the application;
Internal state.
In some cases, a larger overall task may be broken into a series of separate library calls. For example, an application copying a large amount of data might call the library repeatedly with 64kB buffers, until the end of the data buffer is reached. The mechanism used to relate these calls to each other is that they all use the same job structure. As it will be described later, flags are used to indicate whether a particular job submission is the start of a new overall task or whether it is continuing a previous one. Because of it, there is no separate “init” function; the initialization is implied by the job’s parameters.
There are also a number of auxiliary functions, which will be described as they become relevant.
A number of examples can be found in the Code Samples and Examples section.
Low Bandwidth Memory#
The Library includes features to improve system performance when accessing memory with lower bandwidth or higher latency than DRAM. To use these features, WQs must be configured to work with various traffic classes (TC). The next flags allow you to choose which of the TCs to use for the operation:
DML_FLAG_ADDRESS1_TCB
- for most operation types selects TC-B for reads from the source addressDML_FLAG_ADDRESS2_TCB
- for most operation types selects TC-B for writes to the destination addressDML_FLAG_ADDRESS3_TCB
- specific flag that is interpreted differently for each operation
You can read more about the flags in our docs
Persistent Memory Support#
The Library provides several features intended to improve its utility with persistent memory:
The Steering Tag selector - allows user to select platform-defined steering tag. In the Library it is represented by two flags:
DML_FLAG_DST1_DURABLE
- writes to the first destination are identified as writes to durable memoryDML_FLAG_DUALCAST_DST2_DURABLE
- specific flag for memory copy with dualcast operation. Writes to the second destination are identified as writes to durable memory
You can read more about the flags in our docs
Devices Selection and NUMA Support#
By default, the library selects devices from any NUMA node within the socket of the calling thread. In addition, Intel DML supports NUMA-aware device selection. If a user needs to use a device from a specific node, they can set the NUMA ID parameter of the job to the specific node ID:
dml_job_t *dml_job_ptr;
job->numa_id = <int32_t>;
Attention
The default library behavior changed starting from Intel DML 1.2.0 release.
Previously, the library only selected devices from the NUMA node of the calling thread. Now, the library selects devices from any NUMA node within the socket of the calling thread.
Page Fault handling#
Page fault can occur during workload processing on a hardware. When the fault occurs, the outcome is based on the execution path:
- Hardware path: DML_STATUS_PAGE_FAULT_ERROR
is returned.
- Auto path: the library resolves the fault via completion of the workload on a CPU.
The DML_FLAG_BLOCK_ON_FAULT
flag can be used to have page faults be resolved on accelerator.
Using DML_FLAG_BLOCK_ON_FAULT
can be done for all operations except Batch, Fence, and Drain.
All available accelerator workqueues need to be configured to allow for blocking on fault "block_on_fault":1
.
On by default in DML provided accelerator configuration files.
Please refer to Accelerator Configuration for more information.
Cache Control#
The accelerator allows for writing to cache via the DML_FLAG_PREFETCH_CACHE
flag for all operations that write to memory.
If the flag is enabled, the accelerator hints that cache entries be allocated to contain data written by the operation.
Basic Fields of Job Structure#
A complete description of the job structure fields can be found in Section (N).
Field Name |
Description |
---|---|
|
The starting point of the input data. |
|
Length of the input data.
On completion, the |
|
The starting point of the input data in case if the second buffer is
required (length is described with |
|
The starting point of the resulting data. |
|
Length of the resulting data. |
|
The starting point of the resulting data
in case if the second output buffer is required (length is described
with |
|
Enumerated field that defines that operation to be performed. |
|
Flags that affect the operation behavior. |
|
Sets CRC initial value (seed). This field also returns the calculated CRC value. |
|
An array of patterns for the Compare Pattern operation. |
|
NUMA node id which is used to select a device. |
Operations#
Job Flow Control Features#
No-op operation#
The No-op operation can be used with the Fence
flag in the batch to
ensure that all previous operations in the batch completed. It performs
no any other operation.
Batch operation#
The Batch operation processes multiple jobs at once. Batch operation has
sequence of operations at destination_first_ptr
, the size of memory
region for a specific number of operations can be calculated via
dml_get_batch_size
. Operations can be added to the batch via set of
set-functions:
dml_batch_set_nop_by_index
dml_batch_set_mem_move_by_index
dml_batch_set_dualcast_by_index
dml_batch_set_compare_by_index
dml_batch_set_compare_pattern_by_index
dml_batch_set_crc_by_index
dml_batch_set_copy_crc_by_index
dml_batch_set_fill_by_index
dml_batch_set_cache_flush_by_index
dml_batch_set_delta_create_by_index
dml_batch_set_delta_apply_by_index
dml_batch_set_dif_check_by_index
dml_batch_set_dif_update_by_index
dml_batch_set_dif_insert_by_index
dml_batch_set_dif_strip_by_index
Result, status and crc of specific operation can be obtained via get-functions:
dml_batch_get_result
dml_batch_get_status
dml_batch_get_crc
Attention
Maximum number of operations for batch execution depends on accelerator configuration.
Minimum number of operations required for batch execution
is defined by the library and could be obtained with DML_MIN_BATCH_SIZE
define.
Drain#
Use the Drain operation to wait for all jobs completion before exiting. It cannot be a part of batch and is treated as unsupported operation type for a batch.
Memory Transferring and Filling Features#
Memory Move#
The Memory Move operation copies (moves) memory from
source_first_ptr
to destination_first_ptr
. The number of moved
bytes is given by the source_length
parameter.
The
destination_length
parameter must contain the size of destinations (for internal validation).The
source_length
parameter must contain count of bytes to copy.
Note
By default, the Memory Move operation is enabled for overlapping
buffers. If you need to disable Move and enable only Copy behavior,
you should set the DML_FLAG_COPY
flag.
Note
This operation is able to use the DML_FLAG_PREFETCH_CACHE
flag
to hint that the cache entries be allocated for data written
to destination_first_ptr
.
Please refer to Cache Control for more information.
Memory Copy with Dualcast#
The Memory Copy with Dualcast operation copies memory from the
source_first_ptr
to both destination_first_ptr
and
destination_second_ptr
.
The
destination_length
parameter must contain the size of destinations (for internal validation).The
source_length
parameter must contain count of bytes to copy.There are no alignment requirements for
source_first_ptr
andsource_length
.For
destination_first_ptr
anddestination_second_ptr
, the 11:0 bits must be the same.
Note
Move operation is not supported for Memory Copy with Dualcast. If
source_first_ptr
, destination_first_ptr
and
destination_second_ptr
overlap in any combination, the error is
thrown.
Note
This operation is able to use the DML_FLAG_PREFETCH_CACHE
flag
to hint that the cache entries be allocated for data written
to destination_first_ptr
and destination_second_ptr
.
Please refer to Cache Control for more information.
Fill#
The Memory Fill operation fills the memory at destination_first_ptr
with the bytes from the pattern[8]
parameter.
The
destination_length
parameter must contain the number of bytes to fill.The
pattern
size is always 8 bytes.The
source_length
parameter does not need to be a multiple of 8.
Note
This operation is able to use the DML_FLAG_PREFETCH_CACHE
flag
to hint that the cache entries be allocated for data written
to destination_first_ptr
.
Please refer to Cache Control for more information.
Memory Compare and Restoring Features#
Compare#
The Memory Compare operation compares memory at source_first_ptr
with memory at source_second_ptr
.
The number of bytes compared is given by the
source_length
parameter.You can configure the operation with the
DML_FLAG_CHECK_RESULT
flag and theexpected_result
field to implementis_equal_memory
andis_not_equal_memory
. This allows subsequent jobs in the same batch with the DML_FLAG_FENCE flag to continue or stop execution of the batch based on the result of comparison.
|
|
Operational Goal |
---|---|---|
0 |
X |
Check if memory regions are equal |
1 |
0 |
Check if memory regions are equal |
1 |
1 |
Check if memory regions aren’t equal |
If memory regions match each other, the result field contains 0, otherwise, it has 1.
If
result == 1
, theoffset
field contains the offset of the first difference.If operation is successful and
DML_FLAG_CHECK_RESULT
flag is set, theresult
field bit 1 is set according to the table below.
|
|
result bit-0 |
result bit-1 |
meaning |
---|---|---|---|---|
0 |
X |
X |
0 |
EQUAL(expected result) |
1 |
0 |
0 |
0 |
EQUAL(expected result) |
1 |
0 |
1 |
1 |
NOT EQUAL(expected result) |
1 |
1 |
0 |
1 |
NOT EQUAL(not expected result) |
1 |
1 |
1 |
0 |
EQUAL(not expected result) |
Compare Pattern#
The Compare Pattern operation compares memory at source_first_ptr
with the value in the pattern field.
If
source_first_ptr
memory region matches the pattern, then the result field contains 0, otherwise it has 1.If
result == 1
, theoffset
field contains the offset of the first difference. (It may not be the exact byte location, but it is guaranteed to be no greater than the first difference).The
source_length
parameter does not need to be a multiple of 8.
Note
This operation can be used with the DML_FLAG_CHECK_RESULT
flag and
expected_result
field. You can find mode details in the Compare
Operation section.
Create Delta Record#
Warning
This operation is currently not supported on the hardware path.
The Create Delta Record operation compares memory at
source_first_ptr
with memory at source_second_ptr
and generates
a delta record containing the information needed to update
source_first_ptr
to match source_second_ptr
. The number of bytes
compared is given by the source_length
parameter.
The delta record is written into the address specified in
destination_first_ptr
. Destination buffer size must be set in thedestination_length
field.The
source_length
parameter is limited by the maximum offset that can be stored in the delta record.
Note
source_first_ptr
, source_second_ptr
and source_length
must be 8-byte aligned (multiple of 8).
destination_length
must be a multiple of 10.
Delta record format is described below:
Off set |
Data byte 0 |
Data byte 1 |
Data byte 2 |
Data byte 3 |
Data byte 4 |
Data byte 5 |
Data byte 6 |
Data byte 7 |
Off set 1 |
Data byte 0 |
Data byte 1 |
Data byte 2 |
Data byte 3 |
Data byte 4 |
Data byte 5 |
Data byte 6 |
Data byte 7 |
… |
… |
… |
… |
… |
… |
… |
… |
… |
Off set n |
Data byte 0 |
Data byte 1 |
Data byte 2 |
Data byte 3 |
Data byte 4 |
Data byte 5 |
Data byte 6 |
Data byte 7 |
The delta records are written to destination_first_ptr
, therefore it
must point to the memory buffer of the appropriate size. The offset
field of the delta record has uint16_t
data type, therefore the size
of the delta record must be:
a multiple of the delta size (10 bytes)
less than the maximum number of deltas that can be generated from a single cache line (80 bytes)
The actual size of the generated delta record depends on the number of
differences detected. This size is written to the destination_length
field of the dml_job_t
structure. The result of comparison is
written to the result
field of the dml_job_t
structure.
If two regions match exactly, then the
result
field is 0,bytes_processed
field is 0,destination_length
field is 0.If two regions do not match, and a complete set of deltas is written to the delta record, then the result field is 1, the
offset
field contains the first difference’s offset, and thedestination_length
field contains byte size of the delta record.If two regions do not match, and the space needed to record all the deltas exceeds the significant
destination_length
, then result is 2, when:destination_length
contains a created delta record size beforedestination_first_ptr
buffer size has been overflowedsource_length
contains the remaining number of bytes to compareoffset
field contains the first difference’s offset
Note
Since the offset
field in the delta record is uint16_t
representing a multiple of 8 bytes, the maximum offset that can be
expressed is 0x7fff8
, so the maximum transfer size (the
source_length
parameter) is 0x80000
bytes (512KB).
This operation can be used with the DML_FLAG_CHECK_RESULT
flag and
expected_result
field. Mode details are in the Compare Operation
description.
Note
This operation is able to use the DML_FLAG_PREFETCH_CACHE
flag
to hint that the cache entries be allocated for data written
to destination_first_ptr
.
Please refer to Cache Control for more information.
Apply Delta Record#
Warning
This operation is currently not supported on the hardware path.
The Apply Delta Record operation applies a delta record at
source_first_ptr
to destination_first_ptr
. The delta record must
be created by the Create Delta Record operation with result
equal to
1.
The
destination_length
parameter must contain the size of memory region to update.The
source_length
parameter must contain the delta record size as returned to thedestination_length
field after the Create Delta Record operation.
Note
This operation is able to use the DML_FLAG_PREFETCH_CACHE
flag
to hint that the cache entries be allocated for data written
to destination_first_ptr
.
Please refer to Cache Control for more information.
Memory Hash Features#
CRC Generation#
The CRC Generation computes the CRC on memory at source_first_ptr
.
The number of bytes used for CRC computation is given by the
source_length
parameter. The crc_checksum_ptr
field in the dml_job_t
is used as a seed for CRC calculation, therefore it can be used for
continuation.
This operation computes CRC using the polynomial 0x11edc6f41
(iSCSI
Protocol, RFC 3720). When the source_length
parameter is not a
multiple of 4 bytes, the source data is padded to the end with zeros.
Copy with CRC Generation#
The Copy with CRC Generation operation copies memory from
source_first_ptr
to destination_first_ptr
and computes CRC on
the data copied.
The number of bytes copied is given by the source_length
parameter.
This operation runs as a sequential call of two operations.
If source_first_ptr
and destination_first_ptr
overlap, the error
is thrown.
Note
This operation is able to use the DML_FLAG_PREFETCH_CACHE
flag
to hint that the cache entries be allocated for data written
to destination_first_ptr
.
Please refer to Cache Control for more information.
Data Integrity Field Features#
The Data Integrity Field (DIF) provides a system solution to protect the communication path between a host and storage device for end-to-end integrity.
Data block protected with DIF has the following format:
Data block (512, 520, 4096 or 4104 bytes)
Reference tag (4 bytes)
Meta tag (2 bytes)
Guard (2 bytes)
where:
Data block contains some binary data.
The Reference Tag nominally contains information associated with a specific data block within some context, such as the lower 4 bytes of the Logical Block Address.
The Meta Tag (called as Application in Intel DML context) contains additional context information that is nominally held fixed within the context of a given I/O operation.
Guard field is computed from the data in the standard data block with CRC16.
You can find more information about End-to-End Data Protection Justification with DIF here.
How to configure Data Integrity Field Operation
You can configure all the Intel DML operations that work with data
protected with DIF using the dml_dif_config_t
structure. This
structure is provided as a field of dml_job_t
.
typedef struct
{
uint32_t flags;
uint32_t source_reference_tag_seed;
uint16_t source_application_tag_seed;
uint16_t source_application_tag_mask;
uint32_t destination_reference_tag_seed;
uint16_t destination_application_tag_seed;
uint16_t destination_application_tag_mask;
dml_dif_block_size_t block_size;
} dml_dif_config_t;
block_size
- field can be used to set the size of a block that is protected with DIF. All possible data block sizes are presented below:typedef enum { DML_DIF_BLOCK_SIZE_512, /**< Size is 512 bytes */ DML_DIF_BLOCK_SIZE_520, /**< Size is 520 bytes */ DML_DIF_BLOCK_SIZE_4096, /**< Size is 4096 bytes */ DML_DIF_BLOCK_SIZE_4104 /**< Size is 4104 bytes */ } dml_dif_block_size_t;
The
target_reference_tag_seed
andtarget_application_tag_seed
fields contain values for tags of the first data block (seed). For each operation, fill in these fields in accordance with the table below:Operation
Source Tag Seeds
Destination Tag Seeds
DML_OP_DIF_CHECK
Required
Not required
DML_OP_DIF_INSERT
Not required
Required
DML_OP_DIF_STRIP
Required if DIF check enabled
Not required
DML_OP_DIF_UPDATE
Required if DIF check enabled
Required
The
target_aplication_tag_mask
field can be used to mask some application tag bits. Note that 1 bit in the mask means that the corresponding bit is masked in the tag. For example, 0xFE masks all application tag bits except the first low bit.With the
flags
field, you can thoroughly configure the operation behavior. Flags are separated by groups and presented below.СRC generation can be configured with the following flags:
Flag
Description
DML_DIF_FLAG_INVERT_CRC_SEED
The initial crc seed is
0xFFFF
DML_DIF_FLAG_INVERT_CRC_RESULT
Invert CRC result
Source DIF can be configured with following flags:
Flag
Description
DML_DIF_FLAG_SRC_F_ENABLE_ERROR
Throw an error on condition set by the
DML_DIF_FLAG_SRC_F_DETECT_ALL
flagDML_DIF_FLAG_SRC_F_DETECT_ALL
Detect if all DIF bytes are equal to
0xFF
in the sourceDML_DIF_FLAG_SRC_F_DETECT_APP_TAG
Skip check of protected block if Application tag is equal to
0xFF
on the sourceDML_DIF_FLAG_SRC_F_DETECT_TAGS
Skip check of protected block if Application and Reference tag bytes is equal to
0xFF
in the sourceDML_DIF_FLAG_SRC_INC_APP_TAG
Source Application format: Fixed(0) or Incrementing(1)
DML_DIF_FLAG_SRC_GUARD_CHECK_DISABLE
Source Check Guard filed: Enabled (0) or Disabled (1)
DML_DIF_FLAG_SRC_REF_TAG_CHECK_DISABLE
Source Check Reference tag: Enabled (0) or Disabled (1)
DML_DIF_FLAG_SRC_FIX_REF_TAG
Source Reference TAG format: Incrementing(0), Fixed(1)
Destination DIF can be configured with following flags:
Flag
Description
DML_DIF_FLAG_DST_PASS_APP_TAG
Destination Application calculation: Enabled(0) or Disabled (1 Copy from Source)
DML_DIF_FLAG_DST_INC_APP_TAG
Destination Application tag format: Fixed(0), Incrementing(1)
DML_DIF_FLAG_DST_PASS_GUARD
Destination Guard calculation: Enabled(0) or Disabled (1 Copy from Source)
DML_DIF_FLAG_DST_PASS_REF_TAG
Destination Reference calculation: Enabled(0) or Disabled (1 Copy from Source)
DML_DIF_FLAG_DST_FIX_REF_TAG
Destination Reference tag format: Incrementing(0) or Fixed(1)
DIF Check#
The DIF Check operation computes the Data Integrity Field (DIF) on the
source data and compares the computed DIF to the DIF contained in the
source data (source_first_ptr
).
The number of source bytes read is given by the
source_length
parameter. The DIF computation is performed on each block of source data that is 512, 520, 4096, or 4104 bytes.The
source_length
parameter should be a multiple of the source block size plus 8 bytes for each source block.If an error is detected in the DIF in the source data, the operation stops.
The
result
field contains a type of the DIF error in flag format (see the table below for more information) and theoffset
field contains the number of bytes successfully processed (it does not include the block in which the error was detected).
Hex Value |
Result Flag Name |
Description |
---|---|---|
|
|
Guard mismatch. This value is reported under the following conditions:
|
|
|
Application Tag mismatch. This value is reported under the following conditions:
|
|
|
Reference Tag mismatch. This value is reported under the following conditions:
|
|
|
All F Detect Error. This value is reported under the following conditions:
|
DIF Insert#
The DIF Insert operation copies memory from the source_first_ptr
to
destination_first_ptr
(length bytes) while computing the DIF on the
source data and inserting the DIF into the output data.
The number of bytes copied is given by the
source_length
parameter.The number of available bytes in
destination_first_ptr
is given by thedestination_length
parameter.The
destination_length
is the number of bytes that are written todestination_first_ptr
(can be calculated assource_length
plus 8 bytes for each source block).If
source_first_ptr
anddestination_first_ptr
overlap, the error is thrown.
Note
This operation is able to use the DML_FLAG_PREFETCH_CACHE
flag
to hint that the cache entries be allocated for data written
to destination_first_ptr
.
Please refer to Cache Control for more information.
DIF Strip#
The DIF Strip operation copies memory from source_first_ptr
to
destination_first_ptr
, removing the DIF.
Additionally, it computes the DIF value on source_first_ptr
data and
compares to the DIF contained in source_first_ptr
according to the
DIF Source Flags values.
The number of bytes copied is given by the
source_length
parameter and must be a multiple of block size plus 8 for each source block.The number of bytes available to write into destination buffer must be set into
destination_size
.The
destination_size
field after job execution contains the successfully written bytes todestination_first_ptr
(the result size is alwayssource_length
minus 8 bytes for each source block).If
source_first_ptr
anddestination_first_ptr
overlap, the error is thrown.- If an error is detected in the DIF in the source data, the operation
stops. The job structure is updated in the same way as in the DIF Check operation.
Note
This operation is able to use the DML_FLAG_PREFETCH_CACHE
flag
to hint that the cache entries be allocated for data written
to destination_first_ptr
.
Please refer to Cache Control for more information.
DIF Update#
The DIF Update operation copies memory from source_first_ptr
to
destination_first_ptr
and updates DIFs in accordance with
destination flags. (length bytes).
Additionally, it computes the DIF value on the source_first_ptr
data
and compares it to the DIF contained in the source_first_ptr
according to the DIF Source Flags values.
The number of bytes copied is given by the
source_length
parameter.The
destination_length
parameter must contain the size of destinations (for internal validation).If
source_first_ptr
anddestination_first_ptr
overlap, the error is thrown.- If an error is detected in the DIF in the source data, the operation stops.
The job structure is updated in the same way as in the DIF Check operation.
Note
This operation is able to use the DML_FLAG_PREFETCH_CACHE
flag
to hint that the cache entries be allocated for data written
to destination_first_ptr
.
Please refer to Cache Control for more information.
Cache Flush#
The Cache Flush operation flushes the processor caches at the
destination_first_ptr
address. The number of bytes flushed is given
by the destination_length
parameter.
If
DML_FLAG_DONT_INVALIDATE_CACHE
flag is not set, the affected cache lines are invalidated from every level of the cache hierarchy. If it is set to 1, then modified cache lines are written to main memory, but are not evicted from the caches.
Warning
The DML_FLAG_DONT_INVALIDATE_CACHE
flag is deprecated and will be removed in a later version.
Job Structure Description#
The Job structure is defined in dmldefs.h
.
Field |
Type |
Description |
---|---|---|
|
IN |
Pointer to the next input byte in input stream #1 |
|
IN |
Pointer to the next input byte in input stream #2 |
|
OUT |
Pointer to the next output byte in output stream #1 |
|
OUT |
Pointer to the next output byte in output stream #2 |
|
IN/OUT |
CRC for OUT, CRC seed for IN |
|
IN |
Number of bytes in source to process |
|
IN |
Available bytes count in destination buffer |
|
OUT |
Offset field for operations |
|
IN |
Pattern for operations with pattern |
|
IN |
Operation to be done |
|
IN |
Flags indicating specific modes of operations |
|
OUT |
Result field for some operation where status is not enough |
|
IN |
Expected result for some operations |
|
IN |
Initial tag seed for DIF operations |
|
IN |
Internal spec structure |
Operation Status Values#
The return values are defined in dmldefs.h
.
-
enum dml_status_t
All possible return values of the Intel DML Library functions. (Statuses marked
Internal
indicate an issue within the library)Note
All general statuses are described here.
Values:
-
enumerator DML_STATUS_OK
Operation completed successfully.
-
enumerator DML_STATUS_FALSE_PREDICATE_OK
Success with false predicate.
-
enumerator DML_STATUS_BEING_PROCESSED
Operation is still being processed.
-
enumerator DML_STATUS_INTERNAL_ERROR
Unexpected internal error condition.
-
enumerator DML_STATUS_NULL_POINTER_ERROR
Null pointer error.
-
enumerator DML_STATUS_LIMITS_ERROR
One of the job parameters exceeds configured DML limits.
-
enumerator DML_STATUS_PATH_ERROR
Invalid dmlPath parameter.
-
enumerator DML_STATUS_HINT_ERROR
Invalid hint parameter.
-
enumerator DML_STATUS_CRC_ALIGN_ERROR
crc_checksum_ptr must be 4-byte aligned
-
enumerator DML_STATUS_BATCH_ERROR
One or more operation in the batch completed with status > 0.
-
enumerator DML_STATUS_DELTA_ASCENDT_ERROR
Offsets in the delta record were not in increasing order.
-
enumerator DML_STATUS_DELTA_OFFSET_ERROR
The input length is greater than max available offset
-
enumerator DML_STATUS_DIF_CHECK_ERROR
DIF check failed.
-
enumerator DML_STATUS_JOB_OPERATION_ERROR
Invalid op field in dml_job_t.
-
enumerator DML_STATUS_JOB_FLAGS_ERROR
Invalid flags field in dml_job_t.
-
enumerator DML_STATUS_JOB_LENGTH_ERROR
Invalid length field in dml_job_t.
-
enumerator DML_STATUS_BATCH_LIMITS_ERROR
Invalid number of Jobs for batch operation (LT than 2 or GT than max batch size)
-
enumerator DML_STATUS_DELTA_RECORD_SIZE_ERROR
Delta Record Size is out of range or delta record size is not multiple of 10.
-
enumerator DML_STATUS_OVERLAPPING_BUFFER_ERROR
Overlapping buffers.
-
enumerator DML_STATUS_DUALCAST_ALIGN_ERROR
Bit 11:0 of pDst1 and pDst2 differ in Memory Copy with Dualcast operation.
-
enumerator DML_STATUS_DELTA_ALIGN_ERROR
Misaligned address for Delta Record: pSrc1, pSrc2, pDst1 or length is not 8 - byte aligned.
-
enumerator DML_STATUS_DELTA_INPUT_SIZE_ERROR
Input size is not multiple of 8 for delta operations.
-
enumerator DML_STATUS_MEMORY_OVERFLOW_ERROR
Buffer detected an overflow.
-
enumerator DML_STATUS_JOB_CORRUPTED
dml_job_t structure is corrupted
-
enumerator DML_STATUS_WORK_QUEUE_OVERFLOW_ERROR
WQ overflow.
-
enumerator DML_STATUS_PAGE_FAULT_ERROR
Page Fault occurred during processing on hardware
-
enumerator DML_STATUS_TC_A_NOT_AVAILABLE
WQs configured to work with TC-A are not visible
-
enumerator DML_STATUS_TC_B_NOT_AVAILABLE
WQs configured to work with TC-B are not visible
-
enumerator DML_STATUS_BATCH_TASK_INDEX_OVERFLOW
Batch task index is bigger than size of the batch
-
enumerator DML_STATUS_BATCH_SIZE_ERROR
The desired batch size is either bigger or smaller than the possible one. Refer to DML_MIN_BATCH_SIZE for minimal supported batch size.
-
enumerator DML_STATUS_DRAIN_PAGE_FAULT_ERROR
A page fault occurred while translating a Readback Address in a Drain descriptor
-
enumerator DML_STATUS_UNKNOWN_CACHE_SIZE_ERROR
Max cache size can’t be calculated
-
enumerator DML_STATUS_DIF_STRIP_ADJACENT_ERROR
SRC Address for DIF Strip operation should be greater than (DST Address + SRC Size)
-
enumerator DML_STATUS_INTL_INVALID_PAGE_REQUEST
Internal Status Code
-
enumerator DML_STATUS_INTL_INVALID_RESERVED_FIELD
Internal Status Code
-
enumerator DML_STATUS_INTL_MISALIGNED_DESC_LA
Internal Status Code
-
enumerator DML_STATUS_INTL_INVALID_COMPLETION_HANDLE
Internal Status Code
-
enumerator DML_STATUS_INTL_PAGE_FAULT_ON_TRANSLATION
Internal Status Code
-
enumerator DML_STATUS_INTL_MISALIGNED_CR_ADDRESS
Internal Status Code
-
enumerator DML_STATUS_INTL_MISALIGNED_ADDRESS
Internal Status Code
-
enumerator DML_STATUS_INTL_PRIVILEGE_ERROR
Internal Status Code
-
enumerator DML_STATUS_INTL_TRAFFIC_CLASS_ERROR
Internal Status Code
-
enumerator DML_STATUS_INTL_READBACK_ADDRESS_PAGE_ERROR
Internal Status Code
-
enumerator DML_STATUS_INTL_HARDWARE_READBACK_TIMEOUT
Internal Status Code
-
enumerator DML_STATUS_INTL_HARDWARE_TIMEOUT
Internal Status Code
-
enumerator DML_STATUS_INTL_ADDRESS_TRANSLATION_ERROR
Internal Status Code
-
enumerator DML_STATUS_NOT_SUPPORTED_BY_WQ
Work queue not configured to support operation
-
enumerator DML_STATUS_LIBACCEL_NOT_FOUND
Unable to initialize job because hardware driver was not found
-
enumerator DML_STATUS_LIBACCEL_ERROR
Unable to initialize job because hardware driver API is incompatible
-
enumerator DML_STATUS_WORK_QUEUES_NOT_AVAILABLE
Enabled work queues are not found
-
enumerator DML_STATUS_INIT_HW_NOT_SUPPORTED
Error occured on hw initialization due to failure to write() to wq
-
enumerator DML_STATUS_OK