Python module
engine
The APIs in this module allow you to run inference with MAX Engine, a graph compiler and runtime that accelerates your AI models on a wide variety of hardware.
InferenceSession
class max.engine.InferenceSession(num_threads=None, devices=None, *, custom_extensions=None)
Manages an inference session in which you can load and run models.
You need an instance of this to load a model as a Model object.
For example:
```python
from pathlib import Path
from max import engine

session = engine.InferenceSession()
model_path = Path('bert-base-uncased')
model = session.load(model_path)
```
Parameters:

- num_threads (int | None) – Number of threads to use for the inference session. Defaults to the number of physical cores on your machine.
- devices (Iterable[Device] | None) – A list of devices on which to run inference. Defaults to the host CPU only.
- custom_extensions (CustomExtensionsType | None) – The extensions to load for the model. Supports paths to a .mojopkg custom ops library or a .mojo source file.
devices
A list of available devices.
gpu_profiling()
gpu_profiling(mode)
Enables end-to-end GPU profiling configuration.

Parameters:

- mode (GPUProfilingMode)
load()
load(model, *, custom_extensions=None, custom_ops_path=None, weights_registry=None)
Loads a trained model and compiles it for inference.
Parameters:

- model (Union[str, Path, Any]) – Path to a model.
- custom_extensions (CustomExtensionsType | None) – The extensions to load for the model. Supports paths to .mojopkg custom ops.
- custom_ops_path (str | None) – The path to your custom ops Mojo package. Deprecated; use custom_extensions instead.
- weights_registry (Mapping[str, DLPackCompatible] | None) – A mapping from model weight names to their values. The values are currently expected to be DLPack arrays. If an array is a read-only NumPy array, the user must ensure that its lifetime extends beyond the lifetime of the model.
Returns:

The loaded model, compiled and ready to execute.

Raises:

RuntimeError – If the path provided is invalid.

Return type:

Model
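As a sketch of the weights_registry format: it is a mapping from weight name to a DLPack-compatible array, such as a NumPy array. The weight names below are hypothetical; real names depend on the model's graph.

```python
import numpy as np

# Hypothetical weight names; the real names depend on the model's graph.
weights_registry = {
    "encoder.weight": np.ones((4, 8), dtype=np.float32),
    "encoder.bias": np.zeros(8, dtype=np.float32),
}

# NumPy arrays implement the DLPack protocol, so they can be passed as-is.
# Per the note above, a read-only array must outlive the loaded model:
weights_registry["encoder.bias"].flags.writeable = False
```

You would then pass the registry via session.load(model_path, weights_registry=weights_registry).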
set_mojo_assert_level()
set_mojo_assert_level(level)
Sets which Mojo asserts are kept in the compiled model.

Parameters:

- level (str | AssertLevel)
set_mojo_log_level()
set_mojo_log_level(level)
Sets the verbosity of Mojo logging in the compiled model.
set_split_k_reduction_precision()
set_split_k_reduction_precision(precision)
Sets the accumulation precision for split-K reductions in large matmuls.

Parameters:

- precision (str | SplitKReductionPrecision)
Model
class max.engine.Model
A loaded model that you can execute.
Do not instantiate this class directly. Instead, create it with InferenceSession.
__call__()
__call__(*args, **kwargs)
Call self as a function.
execute()
execute(*args)
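Inputs to execute() are passed as positional arguments (*args). As a sketch, a small helper (hypothetical, not part of the MAX API) can order a dict of named arrays to match the input order the model declares in input_metadata:

```python
def order_inputs(metadata_names, feeds):
    """Order named inputs to match the model's declared input order.

    metadata_names: input names in model order, e.g.
        [spec.name for spec in model.input_metadata]
    feeds: a dict mapping input name -> array
    """
    return [feeds[name] for name in metadata_names]

args = order_inputs(
    ["input_ids", "attention_mask"],
    {"attention_mask": [1, 1], "input_ids": [101, 102]},
)
# model.execute(*args) would then receive the inputs in declared order.
print(args)  # [[101, 102], [1, 1]]
```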
input_metadata
property input_metadata
Metadata about the model’s input tensors, as a list of TensorSpec objects.
For example, you can print the input tensor names, shapes, and dtypes:
```python
for tensor in model.input_metadata:
    print(f'name: {tensor.name}, shape: {tensor.shape}, dtype: {tensor.dtype}')
```
output_metadata
property output_metadata
Metadata about the model’s output tensors, as a list of TensorSpec objects.
For example, you can print the output tensor names, shapes, and dtypes:
```python
for tensor in model.output_metadata:
    print(f'name: {tensor.name}, shape: {tensor.shape}, dtype: {tensor.dtype}')
```
GPUProfilingMode
class max.engine.GPUProfilingMode(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
The supported modes for GPU profiling.
DETAILED
DETAILED = 'detailed'
OFF
OFF = 'off'
ON
ON = 'on'
LogLevel
class max.engine.LogLevel(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Internal use.
CRITICAL
CRITICAL = 'critical'
DEBUG
DEBUG = 'debug'
ERROR
ERROR = 'error'
INFO
INFO = 'info'
NOTSET
NOTSET = 'notset'
WARNING
WARNING = 'warning'
MojoValue
class max.engine.MojoValue
This is work in progress and you should ignore it for now.
TensorSpec
class max.engine.TensorSpec(self, shape: collections.abc.Sequence[int | None] | None, dtype: max._core.dtype.DType, name: str)
Defines the properties of a tensor, including its name, shape and data type.
For usage examples, see Model.input_metadata.
Parameters:

- shape – The tensor shape.
- dtype – The tensor data type.
- name – The tensor name.
dtype
property dtype
The tensor's data type.
name
property name
The tensor's name.
shape
property shape
The shape of the tensor as a list of integers.
If a dimension size is unknown/dynamic (such as the batch size), its value is None.
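Because dynamic dimensions appear as None, code that consumes shape must handle them explicitly. A minimal sketch, using a hypothetical formatter (not part of the MAX API):

```python
def format_shape(shape):
    # Render a TensorSpec-style shape, marking dynamic (None) dimensions.
    return "[" + ", ".join("dynamic" if d is None else str(d) for d in shape) + "]"

# A typical NLP input with a dynamic batch dimension:
print(format_shape([None, 128]))  # [dynamic, 128]
```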
CustomExtensionsType
max.engine.CustomExtensionsType