Lenovo YUM Repository
Release and Change History








Bundle Release Information:
Targeted Server Family (Machine Type): ThinkSystem SE350 (7Z46, 7D1X, 7D27)
Targeted Operating System: SLES15SP5
YUM Repository Build Date: 2024_01_16

YUM Repository Release Version: SIAgile23-1

Device:
ThinkSystem Qualcomm Cloud AI 100
Part Number: 4X67A84009
Feature Code: AUKP
PCIe Sub Vendor ID: 0x17CB
PCIe Sub Device ID: 0xA100


Release History:

Release Change History Documents

Qualcomm Co-processor Utility



1 Introduction
1.1 Purpose
The purpose of this release is to provide software to support customers who receive Qualcomm Cloud AI 100 cards, which use the AIC100 SoC.
This document provides information on the Cloud AI 100 software for the 1.3.0 Release.
1.2 Scope
This is a code release of software for the Cloud AI 100 cards. The Apps SDK and Platform SDK build version numbers included in this release are listed in Table 1-1.
Table 1-1 Component Build Versions
SDK             Version
Apps SDK        1.3.108
Platform SDK    1.3.108
1.3 Conventions
Function declarations, function names, type declarations, and code samples appear in a different font, for example: #include.
Code variables appear in angle brackets, for example: .
Commands to be entered appear in a different font, for example: copy a:*.* b:
Button and key names appear in bold font, for example: click Save or press Enter.
Keys that are pressed in combination are indicated with a plus sign, for example: press CTRL+C.
Parameter types are indicated by arrows:
→   Designates an input parameter
←   Designates an output parameter
↔   Designates a parameter used for both input and output
1.4 Release availability
The software packages and supporting documentation associated with this release are made available via the following mechanism:
• CreatePoint: Software is provided via the Qualcomm CreatePoint web portal.
1.5 References
Table 1-2 References
Ref   Title                                                  Document ID
Qualcomm
Q1    Qualcomm Cloud AI 100 Platform SDK User Guide          80-PT790-31
Q2    Qualcomm Cloud AI 100 Apps SDK User Guide              80-PT790-30
Q3    Qualcomm Cloud AI 100 Linux Library API Reference      80-PT790-36
Q4    Qualcomm Cloud AI 100 Model Commands                   KBA-201203162533
Q5    Qualcomm Cloud AI 100 Compiler Library API Reference   80-PT790-37

2 Features

2.1 Existing Features
2.1.1 Platform SDK
2.1.1.1 Runtime Application
Supports C99 compliance. The library header QAicApi.h may now be included in a standard C99 application; in prior releases, a C++ compiler was required. The user may use either a C99 compiler or a C++ compiler (or a later standard). Several APIs have been updated to ensure this support.
Supports enabling and disabling correctable error monitoring through the QMonitor interface.
Supports Debian variants for compatibility with Ubuntu 18.04.
Docker support for CentOS builds. All installed AIC100 devices on the platform are shared with all containers. Multiple applications can run simultaneously within one Docker container, and multiple containers can work simultaneously with a single AIC100 device.
Supports the QAic Program Container (QPC) utility, which provides a more generic way of packaging the network artifacts that the compiler passes to the Runtime.
Supports the qaic-runner utility, which uses the QAic Runtime APIs. qaic_test_app is deprecated.
Supports multi-threaded queue processing. Each QAicQueue may be single-threaded or multi-threaded, depending on its configuration. A multi-threaded queue creates more than one thread of execution to handle pre-processing and submission to hardware. The number of threads is set in the QAicQueue creation properties and cannot be changed after the queue is created. Select single-threaded queues if in-order execution is required: when multi-threaded queues are enabled, ExecObj completion signals may arrive out of order.
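If an application uses a multi-threaded queue but still has to consume results in submission order, it can restore that order itself. The following minimal sketch shows one generic technique: a reorder window keyed by a sequence number that the application assigns. ReorderWindow, reorder_init, reorder_complete, and the fixed capacity are hypothetical names chosen for illustration and are not part of QAicApi.h. The sketch does not cover how completion is actually detected, for example through QAicEvent objects.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-order release window for results that complete out of
 * order on a multi-threaded queue. The application (not the runtime) tags
 * each submitted inference with a sequence number 0..REORDER_CAPACITY-1. */
#define REORDER_CAPACITY 64

typedef struct {
    bool done[REORDER_CAPACITY]; /* done[i]: request i has signaled completion */
    size_t next;                 /* next sequence number to hand to the consumer */
} ReorderWindow;

static void reorder_init(ReorderWindow *w)
{
    for (size_t i = 0; i < REORDER_CAPACITY; ++i)
        w->done[i] = false;
    w->next = 0;
}

/* Record that request `seq` completed, then write every request that can now
 * be released in submission order into `out`. Returns how many were released. */
static size_t reorder_complete(ReorderWindow *w, size_t seq, size_t *out)
{
    size_t released = 0;
    if (seq >= REORDER_CAPACITY)
        return 0; /* outside the window; ignored in this sketch */
    w->done[seq] = true;
    while (w->next < REORDER_CAPACITY && w->done[w->next])
        out[released++] = w->next++;
    return released;
}
```

In this sketch, a completion for request 1 that arrives before request 0 releases nothing. Once request 0 completes, both are released in order. A longer-running version would recycle slots, for example as a ring buffer, instead of stopping at a fixed capacity.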
Supports synchronous and asynchronous operations, as defined in the QAicApi.h header. The QAic API supports creating the following objects:
• QAicContext, created with the qaicCreateContext API, is a context that contains one or more devices.
• QAicConstants, created with the qaicCreateConstants API, is an object representing the constant data used in programs.
• QAicProgram, created with the qaicCreateProgram API, is a program object.
• QAicEvent, created with the qaicCreateEvent API, is an event used to synchronize asynchronous operations.
• QAicQueue, created with the qaicCreateQueue API, is a queue on which asynchronous operations may be enqueued.
• QAicExecObj, created with the qaicCreateExecObj API, is an execution object that links a QAicProgram with constants (QAicConstants) and with the internally allocated and configured resources needed to execute an inference. The created QAicExecObj is enqueued onto a QAicQueue.
• QAicProfilingHandle, created with the qaicCreateProfilingHandle API, is a profiling object that tracks profiling data generated by a QAicProgram whose network was built with profiling.
• Synchronization APIs that use QAicEvents, such as qaicFlushQueue and qaicWaitforEvent, are provided.
Supports oversubscription. Oversubscription allows the user to create a ProgramGroup with the qaicCreateProgramGroup API and add programs that share a set of resources.
For example:
• Call qaicCreateProgramGroup.
• Call qaicAddProgram multiple times to add programs that each use all the device resources.
• Create QAicExecObj objects and enqueue any of them.
The driver performs all the operations necessary (load, activate, deactivate) to enable the programs to run sequentially.
See the Qualcomm Cloud AI 100 Platform SDK User Guide [Q1] and the examples for full details on how to use the new QAic API.
Supports the client-server model of QAicMonitor through gRPC. The QAicMonitor server is started by running qaic_monitor_grpc_server from the /opt/qti-aic/tools directory. Multiple clients may connect to the QAicMonitor server; see the Qualcomm Cloud AI 100 Platform SDK User Guide [Q1] for usage details.
QAicMonitor provides functions to configure and monitor the AIC100 device. The API is defined in the QAicMonitor.h header and uses a protocol buffer interface. The protocol buffer interface is provided in source form (QAicMonitor.proto) and in pre-generated C++ header form (QAicMonitor.pb.h).
QAicMonitor.proto is a preliminary interface. 31
QAicMonitor supports the following functionality: 32
• RAS PCIe monitoring: the ability to configure, read, and reset PCIe event counters.
• RAS IMEM: this functionality is a prototype because a new hardware revision is needed for support. The interface is provided for prototyping.
• RAS NSP: the ability to retrieve correctable and non-correctable error information from the Neural Processor’s internal memory.
• RAS SysMon: this is only a prototype API in this release and does not provide full functionality.
• Ulog: the ability to configure and read AIC100 device logs.
• Device Loopback: the ability to send a test message string to AIC100 over the protocol buffer interface and receive the same message back for validation.
• Device Info: retrieval of the device configuration, including software versions, frequencies, and resource utilization.
• Supports multiple PCIe-connected devices.
• Supports asynchronous notifications of QSM and NSP events to clients through a callback API in the driver.
• Supports collecting logs from multiple devices simultaneously using the qaic_log utility.
• Supports configuring logs per log source.
• Supports collecting PCIe time-based analysis performance data, with the ability to start, stop, and collect time-based analysis data.

Supports use of the QAicMonitor subsystem by the qaic_log utility. The qaic_log utility provides a new and improved set of options for reading logs. See qaic_log -h for help.
Supports qaic_monitor_json. This utility allows users to compose protocol buffers in JSON format and send them to the Monitor for processing.
Virtual Machine support. Users can run inference inside a Virtual Machine by passing the AIC100 card through from the host to the Virtual Machine.
Supports opstats in qaic-exec/qaic-runner.
Supports skipping transformations on input data. For example, the user can skip the quantize step if the input data is already in a quantized format.
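The sketch below gives some context for what the skipped step does. It shows a common affine uint8 quantization: divide by a scale, round, add an offset, and clamp. This scheme, the function names, and the scale/offset parameters are assumptions for illustration only. The format a given network actually expects is defined by its compiled program.

```c
#include <stddef.h>
#include <stdint.h>

/* Common affine (scale/offset) uint8 quantization, shown only to illustrate
 * the kind of input conversion that can be skipped when data is already
 * quantized. The scale and offset a real network uses come from its compiled
 * program, not from this code. */
static uint8_t quantize_u8(float x, float scale, int32_t offset)
{
    float v = x / scale;
    /* Round half away from zero without depending on libm. */
    long q = (long)(v >= 0.0f ? v + 0.5f : v - 0.5f) + (long)offset;
    if (q < 0)
        q = 0;   /* clamp to the uint8 range */
    if (q > 255)
        q = 255;
    return (uint8_t)q;
}

/* Quantize a whole float buffer into a uint8 buffer of the same length. */
static void quantize_buffer(const float *in, uint8_t *out, size_t n,
                            float scale, int32_t offset)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = quantize_u8(in[i], scale, offset);
}
```

An application that already stores its inputs in this form could pass them directly and skip the runtime conversion.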
Supports setting the error reporting threshold (using the QAic Monitor API) and enabling or disabling the reporting of an error (using a QAic Monitor PVS variable).
Supports reading and writing PVS variables using QAic Monitor.
Provides enhanced support for validating the QAic Program Container (QPC). The validation checks QPC compatibility with AIC100 hardware and the integrity of the QPC, and displays basic information about the network and its descriptor.
Supports setting the Log Level per Context.

2.1.1.2 Runtime Driver
The Linux kernel driver module version was updated to 11.5.12.
The Runtime API version is defined in the QAicApi.h header as follows:
#define LRT_LIB_MAJOR_VERSION
#define LRT_LIB_MINOR_VERSION
An application using QAicApi.h should compare these values against the runtime library to ensure that the application built with the API is compatible with the runtime library in use.
The Runtime library version is obtained through the QRTVerInfo structure, which is part of QAicInfo. This information is retrieved through the qaicGetAicVersion API:
QStatus qaicGetAicVersion(uint16_t *major, uint16_t *minor, const char **patch, const char **variant);

The application should make the following compatibility checks:
qaicGetAicVersion: major == LRT_LIB_MAJOR_VERSION
qaicGetAicVersion: minor >= LRT_LIB_MINOR_VERSION
The minor version in the runtime library may be higher, which indicates that new features are available but not visible to the application. The major version must match; if it does not, the behavior of the application is undefined.
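The compatibility rule above can be written as a small helper. This is a sketch: runtime_is_compatible is a hypothetical name. The fallback values for LRT_LIB_MAJOR_VERSION and LRT_LIB_MINOR_VERSION are placeholders that apply only when QAicApi.h is not included. In a real build, the macros come from the header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for a standalone build only. When QAicApi.h is
 * included first, its real LRT_LIB_* definitions take precedence. */
#ifndef LRT_LIB_MAJOR_VERSION
#define LRT_LIB_MAJOR_VERSION 1
#endif
#ifndef LRT_LIB_MINOR_VERSION
#define LRT_LIB_MINOR_VERSION 3
#endif

/* Apply the documented rule to the major/minor pair reported at run time:
 * the major versions must match exactly, and the library's minor version
 * may be equal to or newer than the one the application was built with. */
static bool runtime_is_compatible(uint16_t lib_major, uint16_t lib_minor)
{
    if (lib_major != LRT_LIB_MAJOR_VERSION)
        return false; /* major mismatch: behavior would be undefined */
    return lib_minor >= LRT_LIB_MINOR_VERSION;
}
```

In an application linked against the runtime, the major and minor values would come from calling qaicGetAicVersion(&major, &minor, &patch, &variant) with the signature shown above. The application should check the returned QStatus before applying the rule.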
The Linux kernel driver module exports device telemetry information via the hwmon sysfs API.
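The following sketch reads one temperature attribute through the hwmon sysfs interface, which reports temperatures in millidegrees Celsius. The attribute name temp1_input and the example path are assumptions. These notes do not list which hwmon attributes the AIC100 driver exports, and the hwmon<N> index assigned to a card varies from system to system. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of a hwmon temperature attribute such as temp1_input.
 * The hwmon sysfs ABI reports temperatures in millidegrees Celsius. */
static bool parse_millidegrees(const char *text, double *celsius)
{
    char *end;
    long mdeg = strtol(text, &end, 10);
    if (end == text)
        return false; /* no digits found */
    *celsius = mdeg / 1000.0;
    return true;
}

/* Read one hwmon temperature file, e.g. /sys/class/hwmon/hwmon<N>/temp1_input.
 * To find the directory that belongs to an AIC100 card, inspect the "name"
 * file in each hwmon<N> directory (the expected name is not given here). */
static bool read_hwmon_temp(const char *path, double *celsius)
{
    char buf[32];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return false;
    bool ok = fgets(buf, sizeof buf, f) != NULL && parse_millidegrees(buf, celsius);
    fclose(f);
    return ok;
}
```

For example, an attribute containing "45500" corresponds to 45.5 °C.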
SoC reset is implemented and may be triggered via sysfs. This is an implementation-defined “function-level reset”, because AIC100 does not support the PCIe-defined Function-Level Reset (FLR). Invoking this reset resets the SoC and causes it to go through the boot sequence again; it does not take down the PCIe link.
Supports creating and enqueuing a DMA buffer directly, bypassing host buffer processing. When supplied with user buffers, the host driver translates, converts, and copies the data into DMA buffers. If the system supports allocation of DMA buffers by the user process and the application can pre-convert the data to the DMA data format, the user may instead create an execObj using qaicCreateExecObjDmaBuf. See QAicApi.h and the Qualcomm Cloud AI 100 Linux Library API Reference [Q3] for more details.

Basic buffer IO information can be obtained with qaicProgramGetIoBufferInfo.
qaicProgramGetIoDescriptor provides a protocol buffer defined in QAicApi.proto. It gives detailed information about the input and output data formats, including the information needed to compose DMA-ready buffers.

2.1.1.3 Firmware
Support for querying Fused Public Keys.
Support for upgrading/reprogramming SPBL/PHY parameters from the host.
Power Limits Management is supported for the PCIe form factor of AIC100.
Firmware uses global SoC time as timestamps in the logs.
The warning and throttling thermal thresholds for the PCIe form factor are 88 °C and 90 °C, respectively.
The PCIe form factor supports a 66 W TDP.
Supports a power budget with a configurable 15/20/25 W TDP (Dual M.2).
Supports power telemetry through both MHI and SMBus.
Supports a hardware watchdog.
Crashdump support is enabled.
Support for the NVMe-MI Basic Management Command over SMBus with vendor-defined data blocks.
Support for power gating the booting process beyond thermal thresholds.
Support for QDSS logging.
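The following sketch applies the PCIe thermal thresholds listed above (warning at 88 °C, throttling at 90 °C) to a temperature reading, such as one read through hwmon. The macro, enum, and function names are hypothetical. Whether the firmware compares with >= or > at each threshold is also an assumption.

```c
/* Thresholds quoted in these release notes for the PCIe form factor. */
#define AIC100_PCIE_WARN_C     88.0
#define AIC100_PCIE_THROTTLE_C 90.0

typedef enum {
    THERMAL_NORMAL,
    THERMAL_WARNING,
    THERMAL_THROTTLING
} ThermalState;

/* Classify a temperature (in degrees Celsius) against the documented
 * thresholds. Using >= at each boundary is an assumption of this sketch. */
static ThermalState classify_pcie_temp(double celsius)
{
    if (celsius >= AIC100_PCIE_THROTTLE_C)
        return THERMAL_THROTTLING;
    if (celsius >= AIC100_PCIE_WARN_C)
        return THERMAL_WARNING;
    return THERMAL_NORMAL;
}
```

A monitoring script could log a warning at THERMAL_WARNING and expect reduced throughput at THERMAL_THROTTLING, since the firmware throttles at that point.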

2.1.1.4 Tools
The following tools are available as part of the Platform SDK release:
• PowerStress Tool
• PCIe Margining Tool
The PowerStress Tool can provide a score for workloads.
2.1.1.5 Docker / Kubernetes
The QAic Device Plugin for Kubernetes allows a Kubernetes master to manage AIC100 devices in a cluster. The device plugin provides the following functionality:
• Support for passing partial devices to containers.
• Creation of Docker images tested for both aarch64 and x86_64 using the x86_64 toolchain.
• Reports the number of AIC100 devices on each node in a cluster.
• Monitors the health status of AIC100 devices.
• Runs containers with AIC100 devices.
• Sharing of devices across containers (first-come, first-served).





Lenovo Data Center Group Linux OS Support Home Page
linux.lenovo.com

© 2018-2024 Lenovo. All rights reserved