rhoai-kserve-from-pvc

Open Data Hub and Red Hat OpenShift AI both use KServe as the infrastructure for serving models.

The default setup is that KServe is configured to pull models from an S3-compatible store (e.g. AWS S3, IBM COS, MinIO, Red Hat MultiCloudGateway, etc.).

When KServe initializes the InferenceService runtime's pod, an init container downloads the model into an emptyDir volume mount.

This enables the inference server to exploit the performance of locally attached storage if the server crashes and needs to restart.
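
For illustration, the relevant fragment of the predictor pod that KServe generates looks roughly like the sketch below. The init container name matches KServe's storage-initializer; the volume name, source URI and paths are illustrative placeholders rather than the exact generated values.

  # Rough sketch of the generated predictor pod fragment for an S3-backed model.
  # Only the storage-initializer name is taken from KServe; other names are illustrative.
  initContainers:
    - name: storage-initializer
      args:
        - s3://example-bucket/granite-7b-lab   # hypothetical source URI from the data connection
        - /mnt/models                          # destination inside the emptyDir volume
      volumeMounts:
        - name: model-cache                    # illustrative volume name
          mountPath: /mnt/models
  volumes:
    - name: model-cache
      emptyDir: {}                             # backed by the node's local disk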

However, some environments are constrained:

  • There is insufficient local storage to cache the model on local disk.

    • Filling local storage on nodes can have adverse side effects, including thrashing as the node tries to evict enough workload to obtain the minimum required free space
  • S3 is not universally available on-premises.

    • While services such as MinIO or NooBaa Multi Cloud Gateway can be deployed on the cluster, they will ultimately be consuming from PVCs themselves.

As an alternative, this repository documents the process for using a PVC to store a model, avoiding the need for S3 and for local emptyDir volume mounts.

Caveats

  • At this stage the repository provides an example of doing so, not a customized pipeline.

    • Minimal templating has been used.
  • The example was built on a Red Hat OpenShift on AWS cluster. There are references to AWS which may not apply elsewhere (such as the gp3 storage classes)

  • This has only been tested with a "Single Model Server" such as vLLM.

Ambitions

  • Convert the process into a templated workflow.

Process

Once per cluster

In order to serve a model from a PVC the storage class must have a RECLAIMPOLICY of Retain.

In the case of this cluster, none of the storage classes had the appropriate reclaim policy (see below):

oc get sc

NAME             PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2              kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   43h
gp2-csi          ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   42h
gp3 (default)    ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   43h
gp3-csi          ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   42h

However, the reclaim policy is controlled on the cluster side (it is not a feature fixed by the storage provider):

  • Pull down the closest storage class to the one you want to use: oc get sc gp3-csi -o yaml > ./sampleManifests/original-gp3-storage-class.yaml
  • Copy it to a clean file, remove the uid and other server-generated metadata fields, and change reclaimPolicy: Delete to reclaimPolicy: Retain (see the sketch after this list)
  • Apply the new manifest: oc apply -f ./sampleManifests/gp3-sc-with-retain.yaml
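
For reference, a minimal sketch of the resulting storage class, assuming the gp3 parameters carry over unchanged from the original gp3-csi class:

  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: gp3-csi-retain
  provisioner: ebs.csi.aws.com
  parameters:
    type: gp3                          # assumed to match the original gp3-csi class
  reclaimPolicy: Retain                # changed from Delete
  volumeBindingMode: WaitForFirstConsumer
  allowVolumeExpansion: true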

The result should be an additional storage class:

oc get sc
NAME             PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2              kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   43h
gp2-csi          ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   43h
gp3 (default)    ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   43h
gp3-csi          ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   43h
gp3-csi-retain   ebs.csi.aws.com         Retain          WaitForFirstConsumer   true                   159m

Approximately once per model

This section is assumed to be run from your notebook server on the cluster, with the model already downloaded locally.

Preload a PVC with your model

  • oc login to the cluster from your notebook, e.g. !oc login --token ..

  • Create a PVC named vllm-model-cache and apply it (sketches of both manifests are shown after this list)

    • oc apply -f ./sampleManifest/model-pvc.yaml
  • Create a pod which does nothing but contains tar, which oc cp requires for the copy

    • oc apply -f ./sampleManifest/model-storage-pod.yaml
  • Use oc cp to copy the data up to the PVC

    • oc cp granite-7b-lab model-store-pod-ubi:/pv/granite-7b-lab -c model-store
  • In the case above, granite-7b-lab is the relative path to the model in the notebook

  • Delete the pod but not the PVC

    • oc delete -f ./sampleManifest/model-storage-pod.yaml
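
For reference, minimal sketches of the two manifests follow. The names (vllm-model-cache, model-store-pod-ubi, the model-store container and the /pv mount path) match the commands above; the PVC size, access mode and pod image are assumptions to adjust for your model and cluster.

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: vllm-model-cache
  spec:
    accessModes:
      - ReadWriteOnce                  # assumed; a single predictor pod mounts the volume
    storageClassName: gp3-csi-retain   # the Retain-policy class created earlier
    resources:
      requests:
        storage: 50Gi                  # assumed; size it to fit your model
  ---
  apiVersion: v1
  kind: Pod
  metadata:
    name: model-store-pod-ubi
  spec:
    containers:
      - name: model-store
        image: registry.access.redhat.com/ubi9/ubi   # assumed; any image containing tar works
        command: ["sleep", "infinity"]               # do nothing; just hold the volume for oc cp
        volumeMounts:
          - name: model-cache
            mountPath: /pv
    volumes:
      - name: model-cache
        persistentVolumeClaim:
          claimName: vllm-model-cache

While the pod is still running, the upload can be sanity-checked with, for example, oc exec model-store-pod-ubi -c model-store -- ls -l /pv/granite-7b-lab.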

The model is now in a PVC, ready to serve!

Deploying the customized model server

When deploying a single model server in ODH / Red Hat OpenShift AI, two objects are created:

  1. A ServingRuntime
  2. An InferenceService

The easiest way to get valid objects is to attempt to deploy a model using the UI first. This example deployed a vLLM runtime called granite to the llm namespace.

With this, the objects are retrievable:

oc get servingruntimes.serving.kserve.io -n llm granite -o yaml > ./sampleManifests/granite-servingRuntime.yaml

oc get inferenceservices.serving.kserve.io -n llm granite -o yaml > ./sampleManifests/granite-instanceServices.yaml

Changes required

  • The ServingRuntime only needs its name references changed, e.g. in this case find granite and replace it with granite-pvc

  • The InferenceService needs:

    • A name change from granite to granite-pvc. The runtime: element MUST match the name of the ServingRuntime
    • The storage block replaced by a storageUri. Note this path is hard-coded with respect to what you uploaded and the PVC name

Original storage

  ...
      runtime: granite
      storage:
        key: aws-connection-minio
        path: granite-7b-lab
  ...

New storage

  ...
      runtime: granite-pvc
      storageUri: pvc://vllm-model-cache/granite-7b-lab
  ...

The full manifest is in the sampleManifests directory (applied below).

The PVC specified must be in the same namespace as the InferenceService.
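
Putting the pieces together, a minimal sketch of the modified InferenceService, trimmed to the predictor fields discussed here (your UI-generated manifest will carry additional annotations, labels and resource settings), could look like:

  apiVersion: serving.kserve.io/v1beta1
  kind: InferenceService
  metadata:
    name: granite-pvc
    namespace: llm
  spec:
    predictor:
      model:
        modelFormat:
          name: vLLM                   # assumed; copy from the UI-generated manifest for your runtime
        runtime: granite-pvc           # must match the ServingRuntime name
        storageUri: pvc://vllm-model-cache/granite-7b-lab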

Once all of this is complete you can apply the manifests to your cluster:

oc apply -f ./sampleManifests/new-granite-pvc-servingRuntime.yaml
oc apply -f ./sampleManifests/new-granite-pvc-instanceServices.yaml
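
While the new InferenceService comes up, its status can be watched with, for example:

oc get inferenceservices.serving.kserve.io -n llm granite-pvc -w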

Validation

Logging into the node and using du to inspect /var/lib/kubelet/pods should show that pods do not use local emptyDir storage.
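
One way to do this, sketched below with a placeholder node name, is to open a debug shell on the node running the model server:

oc debug node/<node-name>         # open a debug pod on the node
chroot /host                      # inside the debug shell, switch to the host filesystem
du -sh /var/lib/kubelet/pods/*    # per-pod ephemeral storage usage; the model should not appear here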
