Can RamaLama support Kubernetes-based inference clustering on a Mac Mini M4? #742
Replies: 7 comments
-
It can be done. The first step is to install podman machine with krunkit (see https://podman.io/). There is also a Kubernetes YAML generator via `ramalama serve --generate kube` (see below).
Most of the bits are there; someone just needs to tie it all together.
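As a rough sketch of that first step, assuming Homebrew and a recent Podman; the krunkit install method and the provider setting may differ by version, so check the krunkit README:

```shell
# krunkit provides libkrun-backed VMs with GPU acceleration on
# Apple silicon; it may require a tap, see the krunkit README.
brew install podman krunkit

# Select libkrun as the machine provider instead of the default
# applehv, then create and start the VM.
export CONTAINERS_MACHINE_PROVIDER=libkrun
podman machine init
podman machine start
```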
-
Would make a great blog post!
-
`ramalama serve --generate kube MODEL` will generate a Kubernetes deployment for the containerized AI model.
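For anyone trying this end to end, a minimal sketch; the model name is just an example, and the output file name is an assumption (RamaLama prints the path it actually writes):

```shell
# Generate Kubernetes YAML for a containerized model service.
ramalama serve --generate kube tinyllama

# Apply it to a cluster...
kubectl apply -f tinyllama.yaml

# ...or run the same YAML directly on the podman machine,
# no cluster required.
podman kube play tinyllama.yaml
```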
-
Can Docker Desktop serve as a replacement for Podman in this setup?
-
Docker Desktop should be relatively compatible, though I haven't tried it personally. If you want accelerated GPU support on Docker Desktop, I recommend trying the Docker VMM beta (Podman Desktop is better tested with RamaLama, though). The Docker VMM beta uses krunkit: https://github.com/containers/krunkit
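If you do try it, a minimal sketch of pointing RamaLama at Docker, assuming the top-level --engine option is available in your RamaLama version (Podman is the default engine):

```shell
# Run the model server through Docker instead of Podman.
ramalama --engine docker serve tinyllama
```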
-
I am going to move this to a discussion.
-
Maybe Podman is a better choice.
-
I’m really impressed with the work being done on RamaLama and its ability to handle various models for inference. I have been exploring the possibility of leveraging Kubernetes (K8s) for distributed model inference in my local environment, specifically on a Mac Mini.
Is there currently support within RamaLama to deploy and manage model inference workloads using Kubernetes on a Mac Mini?
If not natively supported, are there any recommendations or best practices for integrating RamaLama with Kubernetes for local development/testing purposes? (One local-cluster approach is sketched below.)
Are there any known limitations or considerations when running Kubernetes-based inference clusters on ARM-based hardware (like the M4 chip) using RamaLama?
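On the local development/testing question, a minimal sketch using kind, which ships arm64 node images and so runs natively on the M4; the model and file names are illustrative, as above:

```shell
# Create a single-node local cluster on Apple silicon.
kind create cluster --name ramalama-test

# Confirm the node reports arm64 before scheduling inference pods.
kubectl get nodes -o wide

# Generate and deploy a RamaLama manifest (file name assumed;
# RamaLama prints the actual path on generation).
ramalama serve --generate kube tinyllama
kubectl apply -f tinyllama.yaml
```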