Python module

engine

The APIs in this module allow you to run inference with MAX Engine—a graph compiler and runtime that accelerates your AI models on a wide variety of hardware.

InferenceSession

class max.engine.InferenceSession(num_threads=None, devices=None, *, custom_extensions=None)

Manages an inference session in which you can load and run models.

You need an instance of this to load a model as a Model object. For example:

from pathlib import Path
from max import engine

session = engine.InferenceSession()
model_path = Path('bert-base-uncased')
model = session.load(model_path)

Parameters:

  • num_threads (int | None ) – Number of threads to use for the inference session. This defaults to the number of physical cores on your machine.
  • devices (Iterable [ Device ] | None ) – A list of devices on which to run inference. Default is the host CPU only.
  • custom_extensions (CustomExtensionsType | None ) – The extensions to load for the model. Supports paths to a .mojopkg custom ops library or a .mojo source file.

devices

property devices: list[Device]

A list of available devices.

gpu_profiling()

gpu_profiling(mode)

Enables end-to-end GPU profiling for the session.

Parameters:

mode (GPUProfilingMode )

load()

load(model, *, custom_extensions=None, custom_ops_path=None, weights_registry=None)

Loads a trained model and compiles it for inference.

Parameters:

  • model (Union [ str , Path , Any ] ) – Path to a model.
  • custom_extensions (CustomExtensionsType | None ) – The extensions to load for the model. Supports paths to .mojopkg custom ops.
  • custom_ops_path (str | None ) – The path to your custom ops Mojo package. Deprecated, use custom_extensions instead.
  • weights_registry (Mapping [ str , DLPackCompatible ] | None ) – A mapping from model weight names to their values. The values are currently expected to be DLPack arrays. If an array is a read-only NumPy array, you must ensure that its lifetime extends beyond the lifetime of the model.
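The read-only caveat above can be observed with plain NumPy, independent of MAX. This sketch only shows how to mark an array read-only and check the flag; the comment about engine behavior restates the documented caveat:

```python
import numpy as np

# A weight buffer of the kind you might pass via weights_registry.
weights = np.zeros((2, 2), dtype=np.float32)
weights.setflags(write=False)  # mark the array read-only

# Per the caveat above, a read-only array is not copied, so the caller
# must keep it alive for as long as the loaded model is in use.
print(weights.flags.writeable)  # False
```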

Returns:

The loaded model, compiled and ready to execute.

Raises:

RuntimeError – If the path provided is invalid.

Return type:

Model

set_mojo_assert_level()

set_mojo_assert_level(level)

Sets which Mojo asserts are kept in the compiled model.

Parameters:

level (str | AssertLevel )

set_mojo_log_level()

set_mojo_log_level(level)

Sets the verbosity of Mojo logging in the compiled model.

Parameters:

level (str | LogLevel )

set_split_k_reduction_precision()

set_split_k_reduction_precision(precision)

Sets the accumulation precision for split-K reductions in large matrix multiplications.

Parameters:

precision (str | SplitKReductionPrecision )

Model

class max.engine.Model

A loaded model that you can execute.

Do not instantiate this class directly. Instead, create it with InferenceSession.

__call__()

__call__(*args, **kwargs)

Call self as a function.

Return type:

list[Tensor | MojoValue]

execute()

execute(*args)

Executes the model with the provided inputs and returns the model outputs.

Return type:

list[Tensor | MojoValue]

input_metadata

property input_metadata

Metadata about the model’s input tensors, as a list of TensorSpec objects.

For example, you can print the input tensor names, shapes, and dtypes:

for tensor in model.input_metadata:
    print(f'name: {tensor.name}, shape: {tensor.shape}, dtype: {tensor.dtype}')

output_metadata

property output_metadata

Metadata about the model’s output tensors, as a list of TensorSpec objects.

For example, you can print the output tensor names, shapes, and dtypes:

for tensor in model.output_metadata:
    print(f'name: {tensor.name}, shape: {tensor.shape}, dtype: {tensor.dtype}')

GPUProfilingMode

class max.engine.GPUProfilingMode(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

The supported modes for GPU profiling.

DETAILED

DETAILED = 'detailed'

OFF

OFF = 'off'

ON

ON = 'on'
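The three members map to plain string values. As an illustration only (the real class ships as max.engine.GPUProfilingMode), a minimal stand-in mirroring the documented values behaves like this:

```python
from enum import Enum

# Illustrative stand-in mirroring the documented member values; this is
# not the real max.engine.GPUProfilingMode.
class GPUProfilingMode(str, Enum):
    OFF = "off"
    ON = "on"
    DETAILED = "detailed"

# String values round-trip to enum members:
print(GPUProfilingMode("on") is GPUProfilingMode.ON)  # True
print(GPUProfilingMode.DETAILED.value)  # detailed
```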

LogLevel

class max.engine.LogLevel(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Internal use.

CRITICAL

CRITICAL = 'critical'

DEBUG

DEBUG = 'debug'

ERROR

ERROR = 'error'

INFO

INFO = 'info'

NOTSET

NOTSET = 'notset'

WARNING

WARNING = 'warning'

MojoValue

class max.engine.MojoValue

This is a work in progress and you should ignore it for now.

TensorSpec

class max.engine.TensorSpec(shape: collections.abc.Sequence[int | None] | None, dtype: max._core.dtype.DType, name: str)

Defines the properties of a tensor, including its name, shape and data type.

For usage examples, see Model.input_metadata.

Parameters:

  • shape – The tensor shape.
  • dtype – The tensor data type.
  • name – The tensor name.

dtype

property dtype

The tensor's data type.

name

property name

The tensor's name.

shape

property shape

The shape of the tensor as a list of integers.

If a dimension size is unknown/dynamic (such as the batch size), its value is None.
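When you need concrete dimensions, you can substitute the dynamic entries yourself. This small helper is illustrative only (it is not part of max.engine):

```python
def resolve_shape(shape, batch_size):
    """Replace None (dynamic) dimensions, such as the batch size, with a
    concrete value. Illustrative helper; not part of max.engine."""
    return [batch_size if dim is None else dim for dim in shape]

# A shape with a dynamic batch dimension, as TensorSpec.shape reports it:
print(resolve_shape([None, 128], batch_size=4))  # [4, 128]
```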

CustomExtensionsType

max.engine.CustomExtensionsType

alias of list[str | Path | Any] | str | Path | Any
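In practice the alias accepts either a single extension or a list of them. For illustration, all of the following values match the alias; the file names are hypothetical placeholders:

```python
from pathlib import Path

# Each of these forms satisfies the CustomExtensionsType alias.
single_pkg = "my_ops.mojopkg"             # str path to a custom ops package
single_src = Path("my_ops.mojo")          # Path to a Mojo source file
several = ["my_ops.mojopkg", Path("more_ops.mojo")]  # a list mixing both

print(isinstance(single_src, Path))  # True
print(all(isinstance(e, (str, Path)) for e in several))  # True
```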