This project is no longer actively maintained. While existing releases remain available, there are no planned updates, bug fixes, new features, or security patches. Users should be aware that vulnerabilities may not be addressed.
This example shows how to run TorchServe with a model exported with `torch.export` and compiled with AOTInductor. To understand when to use `torch._export.aot_compile`, please refer to this section.
Pre-requisites:

- PyTorch >= 2.3.0
- CUDA >= 11.8
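A quick way to confirm your environment meets these requirements is a minimal check like the following (standard PyTorch attributes; nothing here is specific to this example):

```python
import torch

print(torch.__version__)          # should be >= 2.3.0
print(torch.version.cuda)         # should be >= 11.8
print(torch.cuda.is_available())  # this example compiles for a CUDA device
```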
Change directory to the example directory, e.g.

```bash
cd examples/pt2/torch_export_aot_compile
```
The model is saved with a `.so` extension. Here we export the model and compile it with AOTInductor using `max_autotune` mode, which profiles multiple kernel choices and picks the fastest. The export also uses `dynamic_shapes` to support batch sizes from 1 to 32.
In the code, the minimum batch size is specified as 2 instead of 1. This is by design; the exported model still works with batch size 1. You can find an explanation for this here.
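For reference, a minimal sketch of what `resnet18_torch_export.py` does, assuming a standard torchvision ResNet-18 (the output path and dimension name are illustrative):

```python
import os

import torch
from torchvision.models import ResNet18_Weights, resnet18

MAX_BATCH_SIZE = 32

model = resnet18(weights=ResNet18_Weights.DEFAULT)
model.eval()

with torch.no_grad():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device=device)
    example_inputs = (torch.randn(2, 3, 224, 224, device=device),)

    # The minimum batch size is set to 2 by design; the compiled model
    # still serves batch size 1 (see the explanation linked above).
    batch_dim = torch.export.Dim("batch", min=2, max=MAX_BATCH_SIZE)
    torch._export.aot_compile(
        model,
        example_inputs,
        dynamic_shapes={"x": {0: batch_dim}},
        options={
            "aot_inductor.output_path": os.path.join(os.getcwd(), "resnet18_pt2.so"),
            "max_autotune": True,
        },
    )
```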
```bash
python resnet18_torch_export.py
```
```bash
torch-model-archiver --model-name res18-pt2 --handler image_classifier --version 1.0 --serialized-file resnet18_pt2.so --config-file model-config.yaml --extra-files ../../image_classifier/index_to_name.json
```
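The archiver command references a `model-config.yaml`, which ships with the example. As a sketch, such a file sets worker and batching parameters using standard TorchServe model-config keys; the values below are illustrative, not the exact contents of the file:

```yaml
minWorkers: 1
maxWorkers: 2
batchSize: 32        # matches the max batch size the model was exported with
maxBatchDelay: 100   # ms to wait while aggregating requests into a batch
responseTimeout: 120
```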
```bash
mkdir model_store
mv res18-pt2.mar model_store/.
torchserve --start --model-store model_store --models res18-pt2=res18-pt2.mar --ncs --disable-token-auth --enable-model-api
```
```bash
curl http://127.0.0.1:8080/predictions/res18-pt2 -T ../../image_classifier/kitten.jpg
```

produces the output

```json
{
  "tabby": 0.4087875485420227,
  "tiger_cat": 0.34661102294921875,
  "Egyptian_cat": 0.13007202744483948,
  "lynx": 0.024034621194005013,
  "bucket": 0.011633828282356262
}
```