RESTful API

Version Info

GET /rest/v1/version

Get XPU Manager version information

Status Codes:
Response JSON Object:
  • level_zero_version (string) – Underlying level-zero lib version

  • xpum_version (string) – XPUM version

  • xpum_version_git (string) – The git commit hash of this build
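A minimal Python sketch of building a request for this endpoint and reading the three documented fields. The base URL is a placeholder and the sample body holds made-up values; authentication and TLS setup are not described in this section and are left out.

```python
import json
import urllib.request

# Hypothetical address; substitute the host and port of your XPUM REST service.
BASE_URL = "https://xpum.example.com"


def build_version_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for the version endpoint."""
    return urllib.request.Request(f"{base_url}/rest/v1/version", method="GET")


def summarize_version(body: str) -> str:
    """Format the three documented fields of a version response."""
    info = json.loads(body)
    return (
        f"XPUM {info['xpum_version']} "
        f"(git {info['xpum_version_git']}), "
        f"Level Zero {info['level_zero_version']}"
    )


# Made-up response body, used only to exercise the parser.
SAMPLE_BODY = json.dumps({
    "xpum_version": "0.0.0",
    "xpum_version_git": "abcdef0",
    "level_zero_version": "0.0.0",
})
```

To send the request, pass it to urllib.request.urlopen() and hand the decoded response body to summarize_version().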

Devices

GET /rest/v1/devices

Get device list

Status Codes:
Response JSON Object:
  • devices[].@odata.id (string) – Link to device detail properties

  • devices[].device_id (integer) – Device id

  • devices[].device_name (string) – Device name

  • devices[].device_type (string) – Device type; currently only GPU

  • devices[].pci_bdf_address (string) – The PCI BDF (bus:device.function) address of the device

  • devices[].pci_device_id (string) – The PCI device id of device

  • devices[].uuid (string) – Device uuid

  • devices[].vendor_name (string) – Vendor name
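The device list can be turned into a lookup table keyed by device id. The sketch below assumes that the devices[] notation means a top-level "devices" key holding an array; the sample body is made up for illustration.

```python
import json


def device_index(body: str) -> dict:
    """Map each device_id to (device_name, pci_bdf_address, detail link)."""
    payload = json.loads(body)
    return {
        dev["device_id"]: (
            dev["device_name"],
            dev["pci_bdf_address"],
            dev["@odata.id"],
        )
        for dev in payload.get("devices", [])
    }


# Made-up response body with a single device, for illustration only.
SAMPLE_DEVICES = json.dumps({
    "devices": [{
        "@odata.id": "/rest/v1/devices/0",
        "device_id": 0,
        "device_name": "Example GPU",
        "device_type": "GPU",
        "pci_bdf_address": "0000:00:00.0",
        "pci_device_id": "0x0000",
        "uuid": "00000000-0000-0000-0000-000000000000",
        "vendor_name": "Example Vendor",
    }]
})
```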

GET /rest/v1/devices/{deviceId}

Get device properties

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • password (string) – Password for redfish auth

  • username (string) – Username for redfish auth

Status Codes:
Response JSON Object:
  • amc_firmware_name (string) – The AMC firmware name of device

  • amc_firmware_version (string) – The AMC firmware version of device

  • core_clock_rate_mhz (string) – Clock rate for device core, in MHz

  • device_id (integer) – Device id

  • device_name (string) – Device name

  • device_stepping (string) – The stepping of device

  • device_type (string) – Device type

  • driver_version (string) – The driver version

  • firmware_name (string) – The GFX firmware name of device

  • firmware_version (string) – The GFX firmware version of device

  • gfx_data_firmware_name (string) – The GFX_DATA firmware name of device

  • gfx_data_firmware_version (string) – The GFX_DATA firmware version of device

  • gfx_firmware_status (string) – The GFX firmware status

  • gfx_pscbin_firmware_name (string) – The PSC firmware name of device

  • gfx_pscbin_firmware_version (string) – The PSC firmware version of device

  • health.@odata.id (string) – Link to detail info

  • kernel_version (string) – Linux kernel version

  • max_command_queue_priority (string) – Maximum priority for command queues. Higher value is higher priority

  • max_hardware_contexts (string) – Maximum number of logical hardware contexts

  • max_mem_alloc_size_byte (string) – The total allocatable memory, in bytes

  • memory_bus_width (string) – Memory bus width

  • memory_ecc_state (string) – The state of memory ecc

  • memory_free_size_byte (string) – The free memory, in bytes

  • memory_physical_size_byte (string) – Device physical memory size, in bytes

  • number_of_eus (string) – The number of EUs

  • number_of_eus_per_sub_slice (string) – Maximum number of EUs per sub-slice

  • number_of_media_engines (string) – The number of media engines

  • number_of_media_enh_engines (string) – The number of media enhancement engines

  • number_of_memory_channels (string) – Number of memory channels

  • number_of_slices (string) – Maximum number of slices

  • number_of_sub_slices_per_slice (string) – Maximum number of sub-slices per slice

  • number_of_threads_per_eu (string) – Maximum number of threads per EU

  • number_of_tiles (string) – The number of tiles

  • pci_bdf_address (string) – The PCI BDF (bus:device.function) address of the device

  • pci_device_id (string) – The PCI device id of device

  • pci_slot (string) – PCI slot of device

  • pci_sub_device_id (string) – The PCI sub device id of device

  • pci_vendor_id (string) – The PCI vendor id of device

  • pcie_generation (string) – PCIe generation

  • pcie_max_link_width (string) – PCIe max link width

  • physical_eu_simd_width (string) – The physical EU simd width

  • serial_number (string) – Serial number

  • sku_type (string) – The type of SKU

  • socket_id (string) – Socket id of OAM GPU

  • topology.@odata.id (string) – Link to detail info

  • uuid (string) – Device uuid

  • vendor_name (string) – Vendor name
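Most numeric properties in this response are typed as strings, so a client has to convert them before doing arithmetic. The sketch below assumes the byte-count fields carry plain decimal digits, which this section does not state; the sample values are made up.

```python
def parse_size_gib(properties: dict, key: str) -> float:
    """Convert a byte-count field delivered as a string into GiB.

    Assumes the string holds a plain decimal integer.
    """
    return int(properties[key]) / 2**30


# Made-up excerpt of a device properties response.
SAMPLE_PROPERTIES = {
    "memory_physical_size_byte": "17179869184",
    "memory_free_size_byte": "8589934592",
}
```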

GET /rest/v1/devices/amcversions

Get AMC firmware versions.

Request JSON Object:
  • password (string) – Password for redfish auth

  • username (string) – Username for redfish auth

Status Codes:
Response JSON Object:
  • amc_fw_version[] (string) – AMC versions

  • error (string) – Error message

Diagnostics

POST /rest/v1/devices/{deviceId}/diagnostics

Run diagnostics for device

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • level (integer) – The level for diagnostics to run

Status Codes:

GET /rest/v1/devices/{deviceId}/diagnostics

Get diagnostics result for device

Parameters:
  • deviceId (integer) – Device id

Status Codes:
Response JSON Object:
  • component_count (integer) – Component count

  • component_list[].component_type (string) – Component type

  • component_list[].finished (boolean) – Finished or not

  • component_list[].message (string) – Result message

  • component_list[].result (string) – Result status

  • device_id (integer) – Device id

  • end_time (string) – End time

  • finished (boolean) – Finished or not

  • level (integer) – The level for diagnostics to run

  • message (string) – Result message

  • result (string) – Result status

  • start_time (string) – Start time
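Starting diagnostics with POST and then reading the result with GET suggests a start-then-poll pattern. The sketch below is one way to write it, under assumptions: the base URL is a placeholder, the polling interval and retry limit are arbitrary, and fetch is a caller-supplied function that performs the GET and returns the decoded JSON object.

```python
import json
import time
import urllib.request
from typing import Callable

# Hypothetical address; substitute the host and port of your XPUM REST service.
BASE_URL = "https://xpum.example.com"


def build_run_diagnostics(device_id: int, level: int,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts diagnostics."""
    body = json.dumps({"level": level}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/rest/v1/devices/{device_id}/diagnostics",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def wait_for_diagnostics(fetch: Callable[[], dict],
                         interval_s: float = 1.0,
                         max_polls: int = 60) -> dict:
    """Call fetch() until the top-level 'finished' field is true."""
    for _ in range(max_polls):
        result = fetch()
        if result.get("finished"):
            return result
        time.sleep(interval_s)
    raise TimeoutError("diagnostics did not finish within the polling limit")
```

Passing the GET as a callable keeps the polling logic independent of the HTTP client and makes it easy to exercise with canned responses.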

POST /rest/v1/groups/{groupId}/diagnostics

Run diagnostics for group

Parameters:
  • groupId (integer) – Group id

Request JSON Object:
  • level (integer) – The level for diagnostics to run

Status Codes:

GET /rest/v1/groups/{groupId}/diagnostics

Get diagnostics result for group

Parameters:
  • groupId (integer) – Group id

Status Codes:
Response JSON Object:
  • device_count (integer) – Device count

  • device_list[].component_count (integer) – Component count

  • device_list[].component_list[].component_type (string) – Component type

  • device_list[].component_list[].finished (boolean) – Finished or not

  • device_list[].component_list[].message (string) – Result message

  • device_list[].component_list[].result (string) – Result status

  • device_list[].device_id (integer) – Device id

  • device_list[].end_time (string) – End time

  • device_list[].finished (boolean) – Finished or not

  • device_list[].level (integer) – The level for diagnostics to run

  • device_list[].message (string) – Result message

  • device_list[].result (string) – Result status

  • device_list[].start_time (string) – Start time

  • finished (boolean) – Finished or not

  • group_id (integer) – Group id

Health

GET /rest/v1/devices/{deviceId}/health

Get health for device

Parameters:
  • deviceId (integer) – Device id

Status Codes:
Response JSON Object:
  • core_temperature.custom_threshold (integer) – The custom threshold for health, in degrees Celsius

  • core_temperature.description (string) – The description for health

  • core_temperature.shutdown_threshold (integer) – The shutdown threshold for health, in degrees Celsius

  • core_temperature.status (integer) – The status for health

  • core_temperature.throttle_threshold (integer) – The throttle threshold for health, in degrees Celsius

  • device_id (integer) – Device id

  • frequency.description (string) – The description for health

  • frequency.status (integer) – The status for health

  • memory.description (string) – The description for health

  • memory.status (integer) – The status for health

  • memory_temperature.custom_threshold (integer) – The custom threshold for health, in degrees Celsius

  • memory_temperature.description (string) – The description for health

  • memory_temperature.shutdown_threshold (integer) – The shutdown threshold for health, in degrees Celsius

  • memory_temperature.status (integer) – The status for health

  • memory_temperature.throttle_threshold (integer) – The throttle threshold for health, in degrees Celsius

  • power.custom_threshold (integer) – The custom threshold in watts for health

  • power.description (string) – The description for health

  • power.status (integer) – The status for health

  • power.throttle_threshold (integer) – The throttle threshold in watts for health

  • xe_link_port.description (string) – The description for health

  • xe_link_port.status (integer) – The status for health
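The health response groups its fields under six component names, so it flattens naturally into one row per component. The sketch below assumes the dotted field names denote nested JSON objects; the meaning of each integer status value is not defined in this section, so the code only passes it through. The sample values are made up.

```python
HEALTH_COMPONENTS = (
    "core_temperature",
    "memory_temperature",
    "power",
    "memory",
    "xe_link_port",
    "frequency",
)


def health_rows(health: dict) -> list:
    """Return (component, status, description) for each component present."""
    rows = []
    for name in HEALTH_COMPONENTS:
        component = health.get(name)
        if component is None:
            continue
        rows.append((name, component.get("status"), component.get("description", "")))
    return rows


# Made-up response excerpt, for illustration only.
SAMPLE_HEALTH = {
    "device_id": 0,
    "power": {"status": 1, "description": "example text", "custom_threshold": 100},
    "memory": {"status": 1, "description": "example text"},
}
```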

GET /rest/v1/groups/{groupId}/health

Get health for group

Parameters:
  • groupId (integer) – Group id

Status Codes:
Response JSON Object:
  • device_count (integer) – Device count

  • device_list[].core_temperature.custom_threshold (integer) – The custom threshold for health, in degrees Celsius

  • device_list[].core_temperature.description (string) – The description for health

  • device_list[].core_temperature.shutdown_threshold (integer) – The shutdown threshold for health, in degrees Celsius

  • device_list[].core_temperature.status (integer) – The status for health

  • device_list[].core_temperature.throttle_threshold (integer) – The throttle threshold for health, in degrees Celsius

  • device_list[].device_id (integer) – Device id

  • device_list[].frequency.description (string) – The description for health

  • device_list[].frequency.status (integer) – The status for health

  • device_list[].memory.description (string) – The description for health

  • device_list[].memory.status (integer) – The status for health

  • device_list[].memory_temperature.custom_threshold (integer) – The custom threshold for health, in degrees Celsius

  • device_list[].memory_temperature.description (string) – The description for health

  • device_list[].memory_temperature.shutdown_threshold (integer) – The shutdown threshold for health, in degrees Celsius

  • device_list[].memory_temperature.status (integer) – The status for health

  • device_list[].memory_temperature.throttle_threshold (integer) – The throttle threshold for health, in degrees Celsius

  • device_list[].power.custom_threshold (integer) – The custom threshold in watts for health

  • device_list[].power.description (string) – The description for health

  • device_list[].power.status (integer) – The status for health

  • device_list[].power.throttle_threshold (integer) – The throttle threshold in watts for health

  • device_list[].xe_link_port.description (string) – The description for health

  • device_list[].xe_link_port.status (integer) – The status for health

  • group_id (integer) – Group id

GET /rest/v1/devices/{deviceId}/health/{healthType}

Get a specific health type for a device. The response JSON object contains only the requested health type.

Parameters:
  • deviceId (integer) – Device id

  • healthType (str) – Health type, coreTemperature, memoryTemperature, power, memory, xeLinkPort or frequency

Status Codes:
Response JSON Object:
  • core_temperature.custom_threshold (integer) – The custom threshold for health, in degrees Celsius

  • core_temperature.description (string) – The description for health

  • core_temperature.shutdown_threshold (integer) – The shutdown threshold for health, in degrees Celsius

  • core_temperature.status (integer) – The status for health

  • core_temperature.throttle_threshold (integer) – The throttle threshold for health, in degrees Celsius

  • device_id (integer) – Device id

  • frequency.description (string) – The description for health

  • frequency.status (integer) – The status for health

  • memory.description (string) – The description for health

  • memory.status (integer) – The status for health

  • memory_temperature.custom_threshold (integer) – The custom threshold for health, in degrees Celsius

  • memory_temperature.description (string) – The description for health

  • memory_temperature.shutdown_threshold (integer) – The shutdown threshold for health, in degrees Celsius

  • memory_temperature.status (integer) – The status for health

  • memory_temperature.throttle_threshold (integer) – The throttle threshold for health, in degrees Celsius

  • power.custom_threshold (integer) – The custom threshold in watts for health

  • power.description (string) – The description for health

  • power.status (integer) – The status for health

  • power.throttle_threshold (integer) – The throttle threshold in watts for health

  • xe_link_port.description (string) – The description for health

  • xe_link_port.status (integer) – The status for health

PUT /rest/v1/devices/{deviceId}/health/{healthType}

Set health config for device

Parameters:
  • deviceId (integer) – Device id

  • healthType (str) – Health type, only coreTemperature, memoryTemperature or power

Request JSON Object:
  • custom_threshold (integer) – The custom threshold: degrees Celsius for coreTemperature and memoryTemperature, watts for power

Status Codes:

GET /rest/v1/groups/{groupId}/health/{healthType}

Get a specific health type for a group. The response JSON object contains only the requested health type.

Parameters:
  • groupId (integer) – Group id

  • healthType (str) – Health type, coreTemperature, memoryTemperature, power, memory, xeLinkPort or frequency

Status Codes:
Response JSON Object:
  • device_count (integer) – Device count

  • device_list[].core_temperature.custom_threshold (integer) – The custom threshold for health, in degrees Celsius

  • device_list[].core_temperature.description (string) – The description for health

  • device_list[].core_temperature.shutdown_threshold (integer) – The shutdown threshold for health, in degrees Celsius

  • device_list[].core_temperature.status (integer) – The status for health

  • device_list[].core_temperature.throttle_threshold (integer) – The throttle threshold for health, in degrees Celsius

  • device_list[].device_id (integer) – Device id

  • device_list[].frequency.description (string) – The description for health

  • device_list[].frequency.status (integer) – The status for health

  • device_list[].memory.description (string) – The description for health

  • device_list[].memory.status (integer) – The status for health

  • device_list[].memory_temperature.custom_threshold (integer) – The custom threshold for health, in degrees Celsius

  • device_list[].memory_temperature.description (string) – The description for health

  • device_list[].memory_temperature.shutdown_threshold (integer) – The shutdown threshold for health, in degrees Celsius

  • device_list[].memory_temperature.status (integer) – The status for health

  • device_list[].memory_temperature.throttle_threshold (integer) – The throttle threshold for health, in degrees Celsius

  • device_list[].power.custom_threshold (integer) – The custom threshold in watts for health

  • device_list[].power.description (string) – The description for health

  • device_list[].power.status (integer) – The status for health

  • device_list[].power.throttle_threshold (integer) – The throttle threshold in watts for health

  • device_list[].xe_link_port.description (string) – The description for health

  • device_list[].xe_link_port.status (integer) – The status for health

  • group_id (integer) – Group id

PUT /rest/v1/groups/{groupId}/health/{healthType}

Set health config for group

Parameters:
  • groupId (integer) – Group id

  • healthType (str) – Health type, only coreTemperature, memoryTemperature or power

Request JSON Object:
  • custom_threshold (integer) – The custom threshold: degrees Celsius for coreTemperature and memoryTemperature, watts for power

Status Codes:
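The two PUT endpoints for health configuration differ only in whether the path targets devices or groups, so one request builder can cover both. This is a sketch under assumptions: the base URL is a placeholder, and rejecting other health types on the client side is a convenience chosen here, not documented server behaviour.

```python
import json
import urllib.request

# Hypothetical address; substitute the host and port of your XPUM REST service.
BASE_URL = "https://xpum.example.com"

# Health types that the PUT endpoints are documented to accept.
THRESHOLD_TYPES = ("coreTemperature", "memoryTemperature", "power")


def build_set_threshold(scope: str, target_id: int, health_type: str,
                        threshold: int,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets custom_threshold.

    scope is "devices" or "groups"; threshold is in degrees Celsius for the
    temperature types and in watts for power.
    """
    if scope not in ("devices", "groups"):
        raise ValueError(f"unknown scope: {scope!r}")
    if health_type not in THRESHOLD_TYPES:
        raise ValueError(f"no custom threshold for health type: {health_type!r}")
    body = json.dumps({"custom_threshold": threshold}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/rest/v1/{scope}/{target_id}/health/{health_type}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```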

Policy

GET /rest/v1/policy

Get all policies for all devices

Status Codes:
Response JSON Object:
  • [].device_id (integer) – Device id

  • [].policy_list[].action.throttle_device_frequency_max (integer) – The throttle_device_frequency_max value; applies only to the XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE action type.

  • [].policy_list[].action.throttle_device_frequency_min (integer) – The throttle_device_frequency_min value; applies only to the XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE action type.

  • [].policy_list[].action.type (string) – Policy action type. Supported types: XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE, XPUM_POLICY_ACTION_TYPE_NULL

  • [].policy_list[].condition.threshold (integer) – The threshold for policy

  • [].policy_list[].condition.type (string) – Policy condition type. Supported types: XPUM_POLICY_CONDITION_TYPE_GREATER, XPUM_POLICY_CONDITION_TYPE_LESS, XPUM_POLICY_CONDITION_TYPE_WHEN_OCCUR

  • [].policy_list[].device_id (integer) – Device id

  • [].policy_list[].notify_callback_url (string) – Policy notify callback url

  • [].policy_list[].type (string) – Policy type. Supported types: XPUM_POLICY_TYPE_GPU_TEMPERATURE, XPUM_POLICY_TYPE_GPU_MEMORY_TEMPERATURE, XPUM_POLICY_TYPE_GPU_POWER, XPUM_POLICY_TYPE_RAS_ERROR_CAT_RESET, XPUM_POLICY_TYPE_RAS_ERROR_CAT_PROGRAMMING_ERRORS, XPUM_POLICY_TYPE_RAS_ERROR_CAT_DRIVER_ERRORS, XPUM_POLICY_TYPE_RAS_ERROR_CAT_CACHE_ERRORS_CORRECTABLE, XPUM_POLICY_TYPE_RAS_ERROR_CAT_CACHE_ERRORS_UNCORRECTABLE, XPUM_POLICY_TYPE_GPU_MISSING, XPUM_POLICY_TYPE_GPU_THROTTLE

GET /rest/v1/devices/{deviceId}/policy

Get all policies for a device

Parameters:
  • deviceId (integer) – Device id

Status Codes:
Response JSON Object:
  • [].device_id (integer) – Device id

  • [].policy_list[].action.throttle_device_frequency_max (integer) – The throttle_device_frequency_max value; applies only to the XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE action type.

  • [].policy_list[].action.throttle_device_frequency_min (integer) – The throttle_device_frequency_min value; applies only to the XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE action type.

  • [].policy_list[].action.type (string) – Policy action type. Supported types: XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE, XPUM_POLICY_ACTION_TYPE_NULL

  • [].policy_list[].condition.threshold (integer) – The threshold for policy

  • [].policy_list[].condition.type (string) – Policy condition type. Supported types: XPUM_POLICY_CONDITION_TYPE_GREATER, XPUM_POLICY_CONDITION_TYPE_LESS, XPUM_POLICY_CONDITION_TYPE_WHEN_OCCUR

  • [].policy_list[].device_id (integer) – Device id

  • [].policy_list[].notify_callback_url (string) – Policy notify callback url

  • [].policy_list[].type (string) – Policy type. Supported types: XPUM_POLICY_TYPE_GPU_TEMPERATURE, XPUM_POLICY_TYPE_GPU_MEMORY_TEMPERATURE, XPUM_POLICY_TYPE_GPU_POWER, XPUM_POLICY_TYPE_RAS_ERROR_CAT_RESET, XPUM_POLICY_TYPE_RAS_ERROR_CAT_PROGRAMMING_ERRORS, XPUM_POLICY_TYPE_RAS_ERROR_CAT_DRIVER_ERRORS, XPUM_POLICY_TYPE_RAS_ERROR_CAT_CACHE_ERRORS_CORRECTABLE, XPUM_POLICY_TYPE_RAS_ERROR_CAT_CACHE_ERRORS_UNCORRECTABLE, XPUM_POLICY_TYPE_GPU_MISSING, XPUM_POLICY_TYPE_GPU_THROTTLE

POST /rest/v1/devices/{deviceId}/policy

Set a policy for a device.

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • action.throttle_device_frequency_max (integer) – The throttle_device_frequency_max value; applies only to the XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE action type.

  • action.throttle_device_frequency_min (integer) – The throttle_device_frequency_min value; applies only to the XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE action type.

  • action.type (string) – Policy action type. Supported types: XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE, XPUM_POLICY_ACTION_TYPE_NULL

  • condition.threshold (integer) – The threshold for policy

  • condition.type (string) – Policy condition type. Supported types: XPUM_POLICY_CONDITION_TYPE_GREATER, XPUM_POLICY_CONDITION_TYPE_LESS, XPUM_POLICY_CONDITION_TYPE_WHEN_OCCUR

  • device_id (integer) – Device id

  • notify_callback_url (string) – Policy notify callback url

  • type (string) – Policy type. Supported types: XPUM_POLICY_TYPE_GPU_TEMPERATURE, XPUM_POLICY_TYPE_GPU_MEMORY_TEMPERATURE, XPUM_POLICY_TYPE_GPU_POWER, XPUM_POLICY_TYPE_RAS_ERROR_CAT_RESET, XPUM_POLICY_TYPE_RAS_ERROR_CAT_PROGRAMMING_ERRORS, XPUM_POLICY_TYPE_RAS_ERROR_CAT_DRIVER_ERRORS, XPUM_POLICY_TYPE_RAS_ERROR_CAT_CACHE_ERRORS_CORRECTABLE, XPUM_POLICY_TYPE_RAS_ERROR_CAT_CACHE_ERRORS_UNCORRECTABLE, XPUM_POLICY_TYPE_GPU_MISSING, XPUM_POLICY_TYPE_GPU_THROTTLE

Status Codes:
Response JSON Object:
  • message (string) – Success or error message

  • status (integer) – Status code; 0 indicates success, any other value indicates an error.
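The request fields use dotted names such as action.type and condition.threshold. The sketch below assembles a policy body on the assumption that these denote nested JSON objects; the helper name and its keyword arguments are hypothetical, and the threshold and frequency values in the usage note are illustrative only. The optional device_id field is left out because the path already carries the device id.

```python
from typing import Optional

THROTTLE_ACTION = "XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE"


def build_policy(policy_type: str,
                 condition_type: str,
                 threshold: Optional[int] = None,
                 action_type: str = "XPUM_POLICY_ACTION_TYPE_NULL",
                 freq_min: Optional[int] = None,
                 freq_max: Optional[int] = None,
                 notify_callback_url: Optional[str] = None) -> dict:
    """Assemble a policy request body as nested dictionaries."""
    condition = {"type": condition_type}
    if threshold is not None:
        condition["threshold"] = threshold

    action = {"type": action_type}
    # The frequency fields are documented as applying to the throttle action only.
    if action_type == THROTTLE_ACTION:
        if freq_min is not None:
            action["throttle_device_frequency_min"] = freq_min
        if freq_max is not None:
            action["throttle_device_frequency_max"] = freq_max

    policy = {"type": policy_type, "condition": condition, "action": action}
    if notify_callback_url is not None:
        policy["notify_callback_url"] = notify_callback_url
    return policy
```

For example, build_policy("XPUM_POLICY_TYPE_GPU_TEMPERATURE", "XPUM_POLICY_CONDITION_TYPE_GREATER", threshold=90, action_type=THROTTLE_ACTION, freq_min=300, freq_max=900) yields a body that throttles the device when the condition is met; serialize it with json.dumps() before sending.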

DELETE /rest/v1/devices/{deviceId}/policy

Delete a policy for a device. The policy type must be set.

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • action.throttle_device_frequency_max (integer) – The throttle_device_frequency_max value; applies only to the XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE action type.

  • action.throttle_device_frequency_min (integer) – The throttle_device_frequency_min value; applies only to the XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE action type.

  • action.type (string) – Policy action type. Supported types: XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE, XPUM_POLICY_ACTION_TYPE_NULL

  • condition.threshold (integer) – The threshold for policy

  • condition.type (string) – Policy condition type. Supported types: XPUM_POLICY_CONDITION_TYPE_GREATER, XPUM_POLICY_CONDITION_TYPE_LESS, XPUM_POLICY_CONDITION_TYPE_WHEN_OCCUR

  • device_id (integer) – Device id

  • notify_callback_url (string) – Policy notify callback url

  • type (string) – Policy type. Supported types: XPUM_POLICY_TYPE_GPU_TEMPERATURE, XPUM_POLICY_TYPE_GPU_MEMORY_TEMPERATURE, XPUM_POLICY_TYPE_GPU_POWER, XPUM_POLICY_TYPE_RAS_ERROR_CAT_RESET, XPUM_POLICY_TYPE_RAS_ERROR_CAT_PROGRAMMING_ERRORS, XPUM_POLICY_TYPE_RAS_ERROR_CAT_DRIVER_ERRORS, XPUM_POLICY_TYPE_RAS_ERROR_CAT_CACHE_ERRORS_CORRECTABLE, XPUM_POLICY_TYPE_RAS_ERROR_CAT_CACHE_ERRORS_UNCORRECTABLE, XPUM_POLICY_TYPE_GPU_MISSING, XPUM_POLICY_TYPE_GPU_THROTTLE

Status Codes:
Response JSON Object:
  • message (string) – Success or error message

  • status (integer) – Status code; 0 indicates success, any other value indicates an error.

GET /rest/v1/groups/{groupId}/policy

Get all policies for a group

Parameters:
  • groupId (integer) – Group id

Status Codes:
Response JSON Object:
  • [].device_id (integer) – Device id

  • [].policy_list[].action.throttle_device_frequency_max (integer) – The throttle_device_frequency_max value; applies only to the XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE action type.

  • [].policy_list[].action.throttle_device_frequency_min (integer) – The throttle_device_frequency_min value; applies only to the XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE action type.

  • [].policy_list[].action.type (string) – Policy action type. Supported types: XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE, XPUM_POLICY_ACTION_TYPE_NULL

  • [].policy_list[].condition.threshold (integer) – The threshold for policy

  • [].policy_list[].condition.type (string) – Policy condition type. Supported types: XPUM_POLICY_CONDITION_TYPE_GREATER, XPUM_POLICY_CONDITION_TYPE_LESS, XPUM_POLICY_CONDITION_TYPE_WHEN_OCCUR

  • [].policy_list[].device_id (integer) – Device id

  • [].policy_list[].notify_callback_url (string) – Policy notify callback url

  • [].policy_list[].type (string) – Policy type. Supported types: XPUM_POLICY_TYPE_GPU_TEMPERATURE, XPUM_POLICY_TYPE_GPU_MEMORY_TEMPERATURE, XPUM_POLICY_TYPE_GPU_POWER, XPUM_POLICY_TYPE_RAS_ERROR_CAT_RESET, XPUM_POLICY_TYPE_RAS_ERROR_CAT_PROGRAMMING_ERRORS, XPUM_POLICY_TYPE_RAS_ERROR_CAT_DRIVER_ERRORS, XPUM_POLICY_TYPE_RAS_ERROR_CAT_CACHE_ERRORS_CORRECTABLE, XPUM_POLICY_TYPE_RAS_ERROR_CAT_CACHE_ERRORS_UNCORRECTABLE, XPUM_POLICY_TYPE_GPU_MISSING, XPUM_POLICY_TYPE_GPU_THROTTLE

POST /rest/v1/groups/{groupId}/policy

Set a policy for a group.

Parameters:
  • groupId (integer) – Group id

Request JSON Object:
  • action.throttle_device_frequency_max (integer) – The throttle_device_frequency_max value; applies only to the XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE action type.

  • action.throttle_device_frequency_min (integer) – The throttle_device_frequency_min value; applies only to the XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE action type.

  • action.type (string) – Policy action type. Supported types: XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE, XPUM_POLICY_ACTION_TYPE_NULL

  • condition.threshold (integer) – The threshold for policy

  • condition.type (string) – Policy condition type. Supported types: XPUM_POLICY_CONDITION_TYPE_GREATER, XPUM_POLICY_CONDITION_TYPE_LESS, XPUM_POLICY_CONDITION_TYPE_WHEN_OCCUR

  • device_id (integer) – Device id

  • notify_callback_url (string) – Policy notify callback url

  • type (string) – Policy type. Supported types: XPUM_POLICY_TYPE_GPU_TEMPERATURE, XPUM_POLICY_TYPE_GPU_MEMORY_TEMPERATURE, XPUM_POLICY_TYPE_GPU_POWER, XPUM_POLICY_TYPE_RAS_ERROR_CAT_RESET, XPUM_POLICY_TYPE_RAS_ERROR_CAT_PROGRAMMING_ERRORS, XPUM_POLICY_TYPE_RAS_ERROR_CAT_DRIVER_ERRORS, XPUM_POLICY_TYPE_RAS_ERROR_CAT_CACHE_ERRORS_CORRECTABLE, XPUM_POLICY_TYPE_RAS_ERROR_CAT_CACHE_ERRORS_UNCORRECTABLE, XPUM_POLICY_TYPE_GPU_MISSING, XPUM_POLICY_TYPE_GPU_THROTTLE

Status Codes:
Response JSON Object:
  • message (string) – Success or error message

  • status (integer) – Status code; 0 indicates success, any other value indicates an error.

DELETE /rest/v1/groups/{groupId}/policy

Delete a policy for a group. The policy type must be set.

Parameters:
  • groupId (integer) – Group id

Request JSON Object:
  • action.throttle_device_frequency_max (integer) – The throttle_device_frequency_max value; applies only to the XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE action type.

  • action.throttle_device_frequency_min (integer) – The throttle_device_frequency_min value; applies only to the XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE action type.

  • action.type (string) – Policy action type. Supported types: XPUM_POLICY_ACTION_TYPE_THROTTLE_DEVICE, XPUM_POLICY_ACTION_TYPE_NULL

  • condition.threshold (integer) – The threshold for policy

  • condition.type (string) – Policy condition type. Supported types: XPUM_POLICY_CONDITION_TYPE_GREATER, XPUM_POLICY_CONDITION_TYPE_LESS, XPUM_POLICY_CONDITION_TYPE_WHEN_OCCUR

  • device_id (integer) – Device id

  • notify_callback_url (string) – Policy notify callback url

  • type (string) – Policy type. Supported types: XPUM_POLICY_TYPE_GPU_TEMPERATURE, XPUM_POLICY_TYPE_GPU_MEMORY_TEMPERATURE, XPUM_POLICY_TYPE_GPU_POWER, XPUM_POLICY_TYPE_RAS_ERROR_CAT_RESET, XPUM_POLICY_TYPE_RAS_ERROR_CAT_PROGRAMMING_ERRORS, XPUM_POLICY_TYPE_RAS_ERROR_CAT_DRIVER_ERRORS, XPUM_POLICY_TYPE_RAS_ERROR_CAT_CACHE_ERRORS_CORRECTABLE, XPUM_POLICY_TYPE_RAS_ERROR_CAT_CACHE_ERRORS_UNCORRECTABLE, XPUM_POLICY_TYPE_GPU_MISSING, XPUM_POLICY_TYPE_GPU_THROTTLE

Status Codes:
Response JSON Object:
  • message (string) – Success or error message

  • status (integer) – Status code; 0 indicates success, any other value indicates an error.

Group Management

POST /rest/v1/groups

Create a new group

Request JSON Object:
  • group_name (string) – The name for the group to be created (required)

Status Codes:
Response JSON Object:
  • device_id_list[] (integer) – The ids of the devices that belong to this group

  • group_id (integer) – The id of the group

  • group_name (string) – The name of the group

GET /rest/v1/groups

Get all groups

Status Codes:
Response JSON Object:
  • group_list[].device_id_list[] (integer) – The ids of the devices that belong to this group

  • group_list[].group_id (integer) – The id of the group

  • group_list[].group_name (string) – The name of the group

GET /rest/v1/groups/{groupId}

Get information of a group

Parameters:
  • groupId (integer) – Group id

Status Codes:
Response JSON Object:
  • device_id_list[] (integer) – The ids of the devices that belong to this group

  • group_id (integer) – The id of the group

  • group_name (string) – The name of the group

POST /rest/v1/groups/{groupId}

Modify a group

Parameters:
  • groupId (integer) – Group id

Request JSON Object:
  • device_id_add[] (integer) – The ids of the devices to add to this group

  • device_id_remove[] (integer) – The ids of the devices to remove from this group

Status Codes:
Response JSON Object:
  • fail_to_add[].device_id (integer) – The id of a device that failed to be added to the group

  • fail_to_add[].error_msg (string) – Error message

  • fail_to_remove[].device_id (integer) – The id of a device that failed to be removed from the group

  • fail_to_remove[].error_msg (string) – Error message

  • group_info.device_id_list[] (integer) – The ids of the devices that belong to this group

  • group_info.group_id (integer) – The id of the group

  • group_info.group_name (string) – The name of the group
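A modify request can report partial failures, so a client should check fail_to_add and fail_to_remove rather than assume every change was applied. The sketch below builds the request body and collects the reported failures; both helper names are hypothetical and the sample response is made up.

```python
def build_modify_group(add=(), remove=()) -> dict:
    """Build the request body, omitting lists that would be empty."""
    payload = {}
    if add:
        payload["device_id_add"] = list(add)
    if remove:
        payload["device_id_remove"] = list(remove)
    return payload


def failed_changes(response: dict) -> dict:
    """Map each failed device id to its error message, per operation."""
    return {
        "add": {f["device_id"]: f["error_msg"]
                for f in response.get("fail_to_add", [])},
        "remove": {f["device_id"]: f["error_msg"]
                   for f in response.get("fail_to_remove", [])},
    }


# Made-up response in which adding device 3 failed.
SAMPLE_MODIFY_RESPONSE = {
    "fail_to_add": [{"device_id": 3, "error_msg": "example error"}],
    "fail_to_remove": [],
    "group_info": {"group_id": 1, "group_name": "example", "device_id_list": [0]},
}
```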

DELETE /rest/v1/groups/{groupId}

Delete a group

Status Codes:

Firmware Flash

POST /rest/v1/devices/{deviceId}/updatefw

Run a firmware flash on a single device or a single card

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • file (string) – The path of firmware binary file to flash (required)

  • firmware_name (string) – Firmware name, options are: GFX, GFX_DATA, GFX_CODE_DATA, GFX_PSCBIN

  • force (boolean) – Force GFX firmware update. This parameter only works for GFX firmware.

Status Codes:
Response JSON Object:
  • error (string) – Error message

  • result (string) – The result of the query

GET /rest/v1/devices/{deviceId}/firmware

Get the firmware flash state of a single device

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • firmware_name (string) – Firmware name, options are: GFX, GFX_DATA, GFX_CODE_DATA, GFX_PSCBIN (required)

Status Codes:
Response JSON Object:
  • error (string) – Error message

  • result (string) – Firmware flash state, OK/FAILED/ONGOING
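Because the flash state can be ONGOING, a client typically re-queries this endpoint until it sees OK or FAILED. The sketch below builds the state query and classifies a response. It sends firmware_name as a JSON body because the field is listed under Request JSON Object; whether the service also accepts it another way is not stated here. The base URL is a placeholder.

```python
import json
import urllib.request

# Hypothetical address; substitute the host and port of your XPUM REST service.
BASE_URL = "https://xpum.example.com"

# Firmware names documented for the per-device endpoints.
FIRMWARE_NAMES = ("GFX", "GFX_DATA", "GFX_CODE_DATA", "GFX_PSCBIN")


def build_flash_state_request(device_id: int, firmware_name: str,
                              base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the GET request for the flash state."""
    if firmware_name not in FIRMWARE_NAMES:
        raise ValueError(f"unknown firmware name: {firmware_name!r}")
    body = json.dumps({"firmware_name": firmware_name}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/rest/v1/devices/{device_id}/firmware",
        data=body,
        method="GET",
        headers={"Content-Type": "application/json"},
    )


def flash_finished(response: dict) -> bool:
    """Return True for OK or FAILED, False while the flash is ONGOING."""
    state = response.get("result")
    if state == "ONGOING":
        return False
    if state in ("OK", "FAILED"):
        return True
    raise ValueError(
        f"unexpected flash state {state!r}; error: {response.get('error')!r}"
    )
```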

POST /rest/v1/devices/updatefw

Run firmware flash on all devices

Request JSON Object:
  • file (string) – The path of firmware binary file to flash (required)

  • firmware_name (string) – Firmware name, options are: AMC

  • password (string) – Password for redfish auth

  • username (string) – Username for redfish auth

Status Codes:
Response JSON Object:
  • error (string) – Error message

  • result (string) – The result of the query

GET /rest/v1/devices/firmware

Get firmware flash state of all devices

Request JSON Object:
  • firmware_name (string) – Firmware name, options are: AMC (required)

Status Codes:
Response JSON Object:
  • error (string) – Error message

  • result (string) – Firmware flash state, OK/FAILED/ONGOING

Agent Setting

GET /rest/v1/agentSettings

Get XPUM settings

Status Codes:
Response JSON Object:
  • sample_interval (integer) – Agent sample interval, in milliseconds, options are [100, 200, 500, 1000]

POST /rest/v1/agentSettings

Modify XPUM settings

Request JSON Object:
  • sample_interval (integer) – Agent sample interval, in milliseconds, options are [100, 200, 500, 1000]

Status Codes:
Response JSON Object:
  • sample_interval (integer) – Agent sample interval, in milliseconds, options are [100, 200, 500, 1000]
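Since sample_interval accepts only four values, a client can validate the setting before sending it. A minimal sketch; the helper name is hypothetical, and how the service responds to other values is not described in this section.

```python
# Sample intervals listed as valid options, in milliseconds.
ALLOWED_SAMPLE_INTERVALS_MS = (100, 200, 500, 1000)


def build_agent_settings(sample_interval_ms: int) -> dict:
    """Build the request body for POST /rest/v1/agentSettings."""
    if sample_interval_ms not in ALLOWED_SAMPLE_INTERVALS_MS:
        raise ValueError(
            f"sample_interval must be one of {ALLOWED_SAMPLE_INTERVALS_MS}, "
            f"got {sample_interval_ms!r}"
        )
    return {"sample_interval": sample_interval_ms}
```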

Statistics

GET /rest/v1/devices/{deviceId}/stats

Get statistics by device

Parameters:
  • deviceId (integer) – Device id

Status Codes:
Response JSON Object:
  • begin (string) – The time of last query

  • device_id (integer) – Device id

  • device_level[].avg (integer) – The average value since last query

  • device_level[].max (integer) – The max value since last query

  • device_level[].metrics_type (string) – The metric type

  • device_level[].min (integer) – The min value since last query

  • device_level[].value (integer) – The current value

  • end (string) – The time of this query

  • engine_util (any) – Engine utilizations

  • fabric_throughput[].avg (number) – The average value since last query

  • fabric_throughput[].max (number) – The max value since last query

  • fabric_throughput[].min (number) – The min value since last query

  • fabric_throughput[].name (string) – Fabric throughput name

  • fabric_throughput[].value (number) – The current value

  • tile_level[].data_list[].avg (integer) – The average value since last query

  • tile_level[].data_list[].max (integer) – The max value since last query

  • tile_level[].data_list[].metrics_type (string) – The metric type

  • tile_level[].data_list[].min (integer) – The min value since last query

  • tile_level[].data_list[].value (integer) – The current value

  • tile_level[].engine_util (any) – Engine utilizations

  • tile_level[].tile_id (integer) – The tile this data belongs to
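The statistics response lists metrics as arrays of records, which are easier to use once indexed by metrics_type. The sketch below does that for the device-level and tile-level data. The metric names in the sample are placeholders, not real XPUM metric types, and the structure of engine_util (typed "any") is left untouched.

```python
def device_metrics(stats: dict) -> dict:
    """Index device-level records by metrics_type."""
    return {
        m["metrics_type"]: {
            "min": m["min"], "avg": m["avg"], "max": m["max"], "value": m["value"],
        }
        for m in stats.get("device_level", [])
    }


def tile_metrics(stats: dict) -> dict:
    """Map tile_id to {metrics_type: current value}."""
    return {
        tile["tile_id"]: {
            m["metrics_type"]: m["value"] for m in tile.get("data_list", [])
        }
        for tile in stats.get("tile_level", [])
    }


# Made-up response excerpt; EXAMPLE_METRIC_* are placeholder names.
SAMPLE_STATS = {
    "device_id": 0,
    "device_level": [
        {"metrics_type": "EXAMPLE_METRIC_A", "min": 1, "avg": 2, "max": 3, "value": 2},
    ],
    "tile_level": [
        {"tile_id": 0, "data_list": [
            {"metrics_type": "EXAMPLE_METRIC_B", "min": 4, "avg": 5, "max": 6, "value": 5},
        ]},
    ],
}
```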

GET /rest/v1/groups/{groupId}/stats

Get statistics by group

Parameters:
  • groupId (integer) – Group id

Status Codes:
Response JSON Object:
  • datas[].begin (string) – The time of last query

  • datas[].device_id (integer) – Device id

  • datas[].device_level[].avg (integer) – The average value since last query

  • datas[].device_level[].max (integer) – The max value since last query

  • datas[].device_level[].metrics_type (string) – The metric type

  • datas[].device_level[].min (integer) – The min value since last query

  • datas[].device_level[].value (integer) – The current value

  • datas[].end (string) – The time of this query

  • datas[].engine_util (any) – Engine utilizations

  • datas[].fabric_throughput[].avg (number) – The average value since last query

  • datas[].fabric_throughput[].max (number) – The max value since last query

  • datas[].fabric_throughput[].min (number) – The min value since last query

  • datas[].fabric_throughput[].name (string) – Fabric throughput name

  • datas[].fabric_throughput[].value (number) – The current value

  • datas[].tile_level[].data_list[].avg (integer) – The average value since last query

  • datas[].tile_level[].data_list[].max (integer) – The max value since last query

  • datas[].tile_level[].data_list[].metrics_type (string) – The metric type

  • datas[].tile_level[].data_list[].min (integer) – The min value since last query

  • datas[].tile_level[].data_list[].value (integer) – The current value

  • datas[].tile_level[].engine_util (any) – Engine utilizations

  • datas[].tile_level[].tile_id (integer) – The tile this data belongs to

  • group_id (integer) – Group id
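The group response wraps one statistics entry per device inside datas[]. The sketch below shows one way to index the device-level averages by device id. The function name is hypothetical and the input is assumed to be an already-decoded response.

```python
from typing import Any, Dict


def device_level_averages(group_stats: Dict[str, Any]) -> Dict[int, Dict[str, Any]]:
    """Map device_id -> {metrics_type: avg} using datas[].device_level[]."""
    result: Dict[int, Dict[str, Any]] = {}
    for entry in group_stats.get("datas", []):
        result[entry["device_id"]] = {
            metric["metrics_type"]: metric["avg"]
            for metric in entry.get("device_level", [])
        }
    return result
```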

Config

PUT /rest/v1/devices/{deviceId}/standby

Set standby mode for device

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • standby_mode (string) – The standby mode: never, default

  • tile_id (integer) – The tile id

Status Codes:

PUT /rest/v1/devices/{deviceId}/powerlimit

Set power limit for device

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • power_limit (integer) – The power limit value

Status Codes:

PUT /rest/v1/devices/{deviceId}/frequencyrange

Set frequency range for device

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • max_frequency (integer) – The max frequency value

  • min_frequency (integer) – The min frequency value

  • tile_id (integer) – The tile id

Status Codes:

PUT /rest/v1/devices/{deviceId}/scheduler

Set scheduler mode for device

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • scheduler_mode (string) – The scheduler mode: timeout, timeslice, exclusive and debug

  • scheduler_timeslice_interval (integer) – The interval for timeslice mode

  • scheduler_timeslice_yield_timeout (integer) – The yield timeout for timeslice mode

  • scheduler_watchdog_timeout (integer) – The watchdog timeout for timeout mode

  • tile_id (integer) – The tile id

Status Codes:

GET /rest/v1/devices/{deviceId}/config

Get all configuration for device

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • tile_id (integer) – The tile id

Status Codes:
Response JSON Object:
  • deviceId (integer) – Device id

  • memory_ecc_current_state (string) – The current state of memory ecc

  • memory_ecc_pending_state (string) – The pending state of memory ecc

  • power_limit (integer) – The power limit value

  • power_vaild_range (string) – The valid range for the power limit

  • tileConfigData.gpu_frequency_valid_options (string) – The valid GPU frequency options

  • tileConfigData.max_frequency (integer) – The max frequency value

  • tileConfigData.min_frequency (integer) – The min frequency value

  • tileConfigData.scheduler_mode (string) – The scheduler mode: timeout, timeslice, exclusive and debug

  • tileConfigData.scheduler_timeslice_interval (integer) – The interval for timeslice mode

  • tileConfigData.scheduler_timeslice_yield_timeout (integer) – The yield timeout for timeslice mode

  • tileConfigData.scheduler_watchdog_timeout (integer) – The watchdog timeout for timeout mode

  • tileConfigData.standby_mode (string) – The standby mode: never, default

  • tileConfigData.standby_mode_valid_options (string) – The valid standby mode options

  • tileConfigData.tileId (string) – Tile id
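Only some scheduler fields apply to a given mode: the timeslice fields to timeslice mode and the watchdog timeout to timeout mode. The sketch below picks out the fields relevant to the current mode from a tileConfigData object. The function name is hypothetical, and the code assumes the mode string matches the names listed above, ignoring case.

```python
from typing import Any, Dict

# Which fields are meaningful for which scheduler mode, per the descriptions above.
MODE_FIELDS = {
    "timeslice": ("scheduler_timeslice_interval", "scheduler_timeslice_yield_timeout"),
    "timeout": ("scheduler_watchdog_timeout",),
}


def active_scheduler_settings(tile_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return scheduler_mode plus only the fields that apply to that mode."""
    mode = tile_config["scheduler_mode"]
    keys = MODE_FIELDS.get(str(mode).lower(), ())  # exclusive/debug: no extra fields
    settings = {"scheduler_mode": mode}
    for key in keys:
        settings[key] = tile_config[key]
    return settings
```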

PUT /rest/v1/devices/{deviceId}/performancefactor

Set performance factor for device

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • engine (string) – The engine name

  • factor (number) – The performance factor

  • tile_id (integer) – The tile id

Status Codes:

PUT /rest/v1/devices/{deviceId}/reset

Reset the device

Parameters:
  • deviceId (integer) – Device id

Status Codes:

PUT /rest/v1/devices/{deviceId}/ppr

Apply PPR to the device

Parameters:
  • deviceId (integer) – Device id

Status Codes:

PUT /rest/v1/devices/{deviceId}/portenabled

Set port enabled for device

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • enabled (integer) – 1 to enable the port, 0 to disable it

  • port (integer) – The port number

  • tile_id (integer) – The tile id

Status Codes:

PUT /rest/v1/devices/{deviceId}/portbeaconing

Set port beaconing for device

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • beaconing (integer) – 1 to turn beaconing on, 0 to turn it off

  • port (integer) – The port number

  • tile_id (integer) – The tile id

Status Codes:

PUT /rest/v1/devices/{deviceId}/memoryecc

Set memory ecc state for device

Parameters:
  • deviceId (integer) – Device id

Request JSON Object:
  • enabled (integer) – 1 to enable memory ECC, 0 to disable it

Status Codes:
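The PUT endpoints in this section all take a JSON body under /rest/v1/devices/{deviceId}/. The sketch below builds such a request with the Python standard library, without sending it. The base URL is a placeholder, authentication is omitted, and both function names are hypothetical. The min/max check is a local sanity check rather than documented server behaviour.

```python
import json
import urllib.request

BASE_URL = "https://xpum.example.invalid"  # placeholder host, an assumption


def build_config_put(device_id: int, setting: str, body: dict,
                     base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT request for one config endpoint."""
    url = f"{base_url}/rest/v1/devices/{device_id}/{setting}"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url, data=data, method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_frequency_range_put(device_id: int, tile_id: int,
                              min_frequency: int, max_frequency: int
                              ) -> urllib.request.Request:
    """Build the frequencyrange request after a local min <= max check."""
    if min_frequency > max_frequency:
        raise ValueError("min_frequency must not exceed max_frequency")
    body = {
        "tile_id": tile_id,
        "min_frequency": min_frequency,
        "max_frequency": max_frequency,
    }
    return build_config_put(device_id, "frequencyrange", body)
```

A request built this way could be sent with urllib.request.urlopen once credentials and TLS settings for the actual deployment are added.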

Topology

GET /rest/v1/devices/{deviceId}/topology

Get device topology

Parameters:
  • deviceId (integer) – Device id

Status Codes:
Response JSON Object:
  • affinity_localcpulist (string) – Local CPU list

  • affinity_localcpus (string) – Local CPUs

  • device_id (integer) – Device id

  • switch_count (integer) – Number of parent switches of the device

  • switch_list[] (string) – List of switch device paths
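If affinity_localcpulist uses the range-list notation common on Linux (for example "0-3,8"), it can be expanded into individual CPU numbers. That format is an assumption here, since the field is documented only as a string. The function name is hypothetical.

```python
from typing import List


def parse_cpu_list(cpu_list: str) -> List[int]:
    """Expand a range list such as "0-3,8" into [0, 1, 2, 3, 8]."""
    cpus: List[int] = []
    for part in cpu_list.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty strings and trailing commas
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus
```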

GET /rest/v1/topology

Export node topology

Status Codes:
Response JSON Object:
  • length (integer) – XML buffer length

  • xmlstring (string) – XML string of node topology

Get Xe Link topology

Status Codes:
Response JSON Object:
  • link_type (string) – Link type

  • local_cpu_affinity (string) – Local CPU affinity

  • local_device_id (integer) – Local device id

  • local_numa_index (integer) – NUMA node index

  • local_on_subdevice (boolean) – Whether the Xe Link port is located on a sub-device

  • local_subdevice_id (integer) – Local sub-device id

  • port_list[] (integer) – List of ports linked to the remote device

  • remote_device_id (integer) – Remote device id

  • remote_subdevice_id (integer) – Remote sub-device id

ps

GET /rest/v1/devices/{deviceId}/ps

Get per-process utilization of a device.

Parameters:
  • deviceId (integer) – Device id

Status Codes:
Response JSON Object:
  • utils[].device_id (integer) – Device ID

  • utils[].mem_size (integer) – Memory size

  • utils[].process_id (integer) – Process ID

  • utils[].process_name (string) – Process Name

  • utils[].shared_mem_size (integer) – Shared memory size

GET /rest/v1/ps

Get per-process device utilization for all devices.

Status Codes:
Response JSON Object:
  • utils[].device_id (integer) – Device ID

  • utils[].mem_size (integer) – Memory size

  • utils[].process_id (integer) – Process ID

  • utils[].process_name (string) – Process Name

  • utils[].shared_mem_size (integer) – Shared memory size
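Since the same process can appear once per device in utils[], a client may want a per-process total. The sketch below sums mem_size by process_id. The function name is hypothetical, and the unit of mem_size is whatever the service reports, since it is not stated above.

```python
from collections import defaultdict
from typing import Any, Dict


def memory_by_process(ps_response: Dict[str, Any]) -> Dict[int, int]:
    """Sum utils[].mem_size per process_id across all listed devices."""
    totals: Dict[int, int] = defaultdict(int)
    for entry in ps_response.get("utils", []):
        totals[entry["process_id"]] += entry["mem_size"]
    return dict(totals)
```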

Dump Raw Data

POST /rest/v1/dump

Start a raw data dump task

Request JSON Object:
  • device_id (integer) – The device to dump raw data from (required)

  • metrics_type_list[] (string) – The metrics types to dump, options are: XPUM_DUMP_GPU_UTILIZATION, XPUM_DUMP_POWER, XPUM_DUMP_GPU_FREQUENCY, XPUM_DUMP_GPU_CORE_TEMPERATURE, XPUM_DUMP_MEMORY_TEMPERATURE, XPUM_DUMP_MEMORY_UTILIZATION, XPUM_DUMP_MEMORY_READ_THROUGHPUT, XPUM_DUMP_MEMORY_WRITE_THROUGHPUT, XPUM_DUMP_ENERGY, XPUM_DUMP_EU_ACTIVE, XPUM_DUMP_EU_STALL, XPUM_DUMP_EU_IDLE, XPUM_DUMP_RAS_ERROR_CAT_RESET, XPUM_DUMP_RAS_ERROR_CAT_PROGRAMMING_ERRORS, XPUM_DUMP_RAS_ERROR_CAT_DRIVER_ERRORS, XPUM_DUMP_RAS_ERROR_CAT_CACHE_ERRORS_CORRECTABLE, XPUM_DUMP_RAS_ERROR_CAT_CACHE_ERRORS_UNCORRECTABLE, XPUM_DUMP_MEMORY_BANDWIDTH, XPUM_DUMP_MEMORY_USED, XPUM_DUMP_PCIE_READ_THROUGHPUT, XPUM_DUMP_PCIE_WRITE_THROUGHPUT, XPUM_DUMP_COMPUTE_XE_LINK_THROUGHPUT, XPUM_DUMP_COMPUTE_ENGINE_UTILIZATION, XPUM_DUMP_RENDER_ENGINE_UTILIZATION, XPUM_DUMP_DECODE_ENGINE_UTILIZATION, XPUM_DUMP_ENCODE_ENGINE_UTILIZATION, XPUM_DUMP_COPY_ENGINE_UTILIZATION, XPUM_DUMP_MEDIA_ENHANCEMENT_ENGINE_UTILIZATION, XPUM_DUMP_3D_ENGINE_UTILIZATION, XPUM_DUMP_RAS_ERROR_CAT_NON_COMPUTE_ERRORS_CORRECTABLE, XPUM_DUMP_RAS_ERROR_CAT_NON_COMPUTE_ERRORS_UNCORRECTABLE, XPUM_DUMP_COMPUTE_ENGINE_GROUP_UTILIZATION, XPUM_DUMP_RENDER_ENGINE_GROUP_UTILIZATION, XPUM_DUMP_MEDIA_ENGINE_GROUP_UTILIZATION, XPUM_DUMP_COPY_ENGINE_GROUP_UTILIZATION, XPUM_DUMP_FREQUENCY_THROTTLE_REASON_GPU, XPUM_DUMP_MEDIA_ENGINE_FREQUENCY

  • show_date (boolean) – Controls timestamp format in dumps: ‘1’ includes full date and time, ‘0’ (default) includes only time.

  • tile_id (integer) – The tile to dump raw data from

Status Codes:
Response JSON Object:
  • dump_file_path (string) – The path to file of dumped data

  • task_id (integer) – The task id
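A request body for starting a dump task needs device_id and a list of metric names, while tile_id and show_date are optional. The sketch below assembles such a body. The function name is hypothetical, and the prefix check is only a local guard against typos, not a reproduction of server-side validation.

```python
from typing import Any, Dict, Optional, Sequence


def build_dump_request(device_id: int,
                       metrics_type_list: Sequence[str],
                       tile_id: Optional[int] = None,
                       show_date: Optional[bool] = None) -> Dict[str, Any]:
    """Assemble the JSON-serializable body for POST /rest/v1/dump."""
    bad = [name for name in metrics_type_list if not name.startswith("XPUM_DUMP_")]
    if bad:
        raise ValueError(f"unexpected metric names: {bad}")
    body: Dict[str, Any] = {
        "device_id": device_id,
        "metrics_type_list": list(metrics_type_list),
    }
    # Optional fields are left out entirely when not provided.
    if tile_id is not None:
        body["tile_id"] = tile_id
    if show_date is not None:
        body["show_date"] = show_date
    return body
```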

GET /rest/v1/dump

List all raw data dump tasks

Status Codes:
Response JSON Object:
  • dump_task_ids[] (integer) – The id list of all tasks

DELETE /rest/v1/dump/{taskId}

Stop a raw data dump task

Parameters:
  • taskId (integer) – The id of the raw data dump task

Status Codes:
Response JSON Object:
  • dump_file_path (string) – The path to file of dumped data

  • task_id (integer) – The task id
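Taken together, the three dump endpoints form a start, collect, stop lifecycle. The sketch below expresses that flow against an injected send(method, path, body) callable, so it makes no assumption about the HTTP client, host, or authentication in use. Both run_dump and the send signature are hypothetical.

```python
from typing import Any, Callable, Dict, Optional

# send(method, path, body) -> decoded JSON response; supplied by the caller.
Send = Callable[[str, str, Optional[dict]], Dict[str, Any]]


def run_dump(send: Send, body: dict, collect: Callable[[], None]) -> str:
    """Start a dump task, run collect(), then stop the task.

    Returns the dump_file_path reported when the task is stopped.
    The task is stopped even if collect() raises.
    """
    started = send("POST", "/rest/v1/dump", body)
    task_id = started["task_id"]
    try:
        collect()
    finally:
        stopped = send("DELETE", f"/rest/v1/dump/{task_id}", None)
    return stopped["dump_file_path"]
```

In practice collect would be a sleep or the workload being measured, and send would wrap whichever HTTP library the client already uses.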

Sensor

GET /rest/v1/sensor

Get sensor reading

Status Codes:
Response JSON Object:
  • sensor_reading[].amc_index (number) – AMC index

  • sensor_reading[].sensor_high (number) – High bound of sensor reading

  • sensor_reading[].sensor_low (number) – Low bound of sensor reading

  • sensor_reading[].sensor_name (string) – Sensor name

  • sensor_reading[].sensor_unit (string) – Sensor unit

  • sensor_reading[].value (number) – Sensor reading value
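Each reading carries its own low and high bounds, which allows a simple range check on the client side. The sketch below lists the sensors whose value falls outside those bounds. Treating the bounds as inclusive limits of the expected range is an assumption, and the function name is hypothetical.

```python
from typing import Any, Dict, List


def out_of_range_sensors(response: Dict[str, Any]) -> List[str]:
    """Return sensor_name for readings outside [sensor_low, sensor_high]."""
    names: List[str] = []
    for reading in response.get("sensor_reading", []):
        if not (reading["sensor_low"] <= reading["value"] <= reading["sensor_high"]):
            names.append(reading["sensor_name"])
    return names
```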

vgpu

GET /rest/v1/vgpu/precheck

Check if BIOS settings are ready to create virtual GPUs

Status Codes:
Response JSON Object:
  • iommu_message (string) – IOMMU message

  • iommu_status (string) – IOMMU status

  • sriov_message (string) – SR-IOV message

  • sriov_status (string) – SR-IOV status

  • vmx_flag (string) – VMX Flag Check

  • vmx_message (string) – VMX flag message

GET /rest/v1/devices/{deviceId}/vgpustats

Get statistics data of all virtual GPUs

Status Codes:
Response JSON Object:
  • vf_list[].bdf_address (string) – BDF Address

  • vf_list[].metric_list[].metric_type (integer) – Metric Type

  • vf_list[].metric_list[].scale (integer) – Scale

  • vf_list[].metric_list[].value (integer) – Value
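Each metric is reported as an integer value together with an integer scale. The sketch below assumes the real reading is value divided by scale, which is not stated above, and groups the results by the virtual function's BDF address. The function name is hypothetical.

```python
from typing import Any, Dict


def scaled_vgpu_metrics(response: Dict[str, Any]) -> Dict[str, Dict[int, float]]:
    """Map bdf_address -> {metric_type: value / scale} for every virtual GPU."""
    result: Dict[str, Dict[int, float]] = {}
    for vf in response.get("vf_list", []):
        metrics: Dict[int, float] = {}
        for metric in vf.get("metric_list", []):
            scale = metric["scale"] or 1  # guard against a zero scale
            metrics[metric["metric_type"]] = metric["value"] / scale
        result[vf["bdf_address"]] = metrics
    return result
```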