This project is no longer actively maintained. While existing releases remain available, there are no planned updates, bug fixes, new features, or security patches. Users should be aware that vulnerabilities may not be addressed.
This example shows how to run TorchServe with a model exported with `torch.export` and compiled with AOTInductor. To understand when to use `torch._export.aot_compile`, please refer to this section.
Pre-requisites:

- PyTorch >= 2.3.0
- CUDA >= 11.8
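A quick way to confirm your environment meets these requirements is a minimal check like the following (standard PyTorch attributes; nothing here is specific to this example):

```python
import torch

print(torch.__version__)          # should be >= 2.3.0
print(torch.version.cuda)         # should be >= 11.8
print(torch.cuda.is_available())  # this example compiles for a CUDA device
```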
Change directory to the example directory, e.g.

```bash
cd examples/pt2/torch_export_aot_compile
```
The model is saved with a `.so` extension. Here we export the model and compile it with AOTInductor using `max_autotune` mode, which profiles multiple kernel choices and picks the fastest. The export also uses `dynamic_shapes` to support batch sizes from 1 to 32.
In the code, the minimum batch size is specified as 2 instead of 1. This is by design; the exported model still works with batch size 1. You can find an explanation for this here.
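For reference, a minimal sketch of what `resnet18_torch_export.py` does, assuming a standard torchvision ResNet-18 (the output path and dimension name are illustrative):

```python
import os

import torch
from torchvision.models import ResNet18_Weights, resnet18

MAX_BATCH_SIZE = 32

model = resnet18(weights=ResNet18_Weights.DEFAULT)
model.eval()

with torch.no_grad():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device=device)
    example_inputs = (torch.randn(2, 3, 224, 224, device=device),)

    # The minimum batch size is set to 2 by design; the compiled model
    # still serves batch size 1 (see the explanation linked above).
    batch_dim = torch.export.Dim("batch", min=2, max=MAX_BATCH_SIZE)
    torch._export.aot_compile(
        model,
        example_inputs,
        dynamic_shapes={"x": {0: batch_dim}},
        options={
            "aot_inductor.output_path": os.path.join(os.getcwd(), "resnet18_pt2.so"),
            "max_autotune": True,
        },
    )
```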
```bash
python resnet18_torch_export.py
```
```bash
torch-model-archiver --model-name res18-pt2 --handler image_classifier --version 1.0 --serialized-file resnet18_pt2.so --config-file model-config.yaml --extra-files ../../image_classifier/index_to_name.json
```
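The archiver command references a `model-config.yaml`, which ships with the example. As a sketch, such a file sets worker and batching parameters using standard TorchServe model-config keys; the values below are illustrative, not the exact contents of the file:

```yaml
minWorkers: 1
maxWorkers: 2
batchSize: 32        # matches the max batch size the model was exported with
maxBatchDelay: 100   # ms to wait while aggregating requests into a batch
responseTimeout: 120
```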
```bash
mkdir model_store
mv res18-pt2.mar model_store/.
torchserve --start --model-store model_store --models res18-pt2=res18-pt2.mar --ncs --disable-token-auth --enable-model-api
```
```bash
curl http://127.0.0.1:8080/predictions/res18-pt2 -T ../../image_classifier/kitten.jpg
```

produces the output

```json
{
  "tabby": 0.4087875485420227,
  "tiger_cat": 0.34661102294921875,
  "Egyptian_cat": 0.13007202744483948,
  "lynx": 0.024034621194005013,
  "bucket": 0.011633828282356262
}
```