⚠️ Notice: Limited Maintenance

This project is no longer actively maintained. While existing releases remain available, there are no planned updates, bug fixes, new features, or security patches. Users should be aware that vulnerabilities may not be addressed.

TorchServe inference with torch._export.aot_compile

This example shows how to run TorchServe with a torch-exported model compiled with AOTInductor.

To understand when to use torch._export.aot_compile, please refer to this section.

Pre-requisites

  • PyTorch >= 2.3.0
  • CUDA >= 11.8

Change directory to the example directory, e.g., cd examples/pt2/torch_export_aot_compile

Create a Torch exported model with AOTInductor

The model is saved with a .so extension. Here we torch-export the model with AOTInductor using the max_autotune mode. The export also uses dynamic_shapes to support batch sizes from 1 to 32. In the code, the minimum batch_size is specified as 2 instead of 1; this is by design, and the compiled model still works for batch size 1. You can find an explanation for this here.

python resnet18_torch_export.py
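
For reference, the script does roughly the following. This is a minimal sketch, assuming the output path resnet18_pt2.so and the options shown below; the actual resnet18_torch_export.py may differ in details.

import os

import torch
from torchvision.models import ResNet18_Weights, resnet18

model = resnet18(weights=ResNet18_Weights.DEFAULT)
model.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device=device)
example_inputs = (torch.randn(2, 3, 224, 224, device=device),)

# min=2 is intentional: exporting with min=1 runs into the 0/1 specialization
# issue, but the compiled model still serves batch size 1 at runtime.
batch_dim = torch.export.Dim("batch", min=2, max=32)

# Compile the exported model ahead of time into a shared library (.so)
so_path = torch._export.aot_compile(
    model,
    example_inputs,
    dynamic_shapes={"x": {0: batch_dim}},
    options={
        "aot_inductor.output_path": os.path.join(os.getcwd(), "resnet18_pt2.so"),
        "max_autotune": True,
    },
)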

Create model archive

torch-model-archiver --model-name res18-pt2 --handler image_classifier --version 1.0 --serialized-file resnet18_pt2.so --config-file model-config.yaml --extra-files ../../image_classifier/index_to_name.json
mkdir model_store
mv res18-pt2.mar model_store/.
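
The archiver command above references model-config.yaml. A minimal sketch of such a TorchServe model config is shown below; the key names are standard TorchServe options, but the actual values shipped with this example may differ.

minWorkers: 1
maxWorkers: 2
batchSize: 32
maxBatchDelay: 100
responseTimeout: 120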

Start TorchServe

torchserve --start --model-store model_store --models res18-pt2=res18-pt2.mar --ncs --disable-token-auth  --enable-model-api

Run Inference

curl http://127.0.0.1:8080/predictions/res18-pt2 -T ../../image_classifier/kitten.jpg

which produces the following output:

{
  "tabby": 0.4087875485420227,
  "tiger_cat": 0.34661102294921875,
  "Egyptian_cat": 0.13007202744483948,
  "lynx": 0.024034621194005013,
  "bucket": 0.011633828282356262
}