Ollama makes it easy to run large language models (LLMs) locally. It supports Llama 3, Mistral, Gemma, and many other models.
> ❄️ You can now perform LLM inference with Ollama in services-flake! <https://t.co/rtHIYdnPfb>
>
> — NixOS Asia (@nixos_asia), June 12, 2024
To enable the Ollama service:

```nix
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1".enable = true;
}
```
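For context, the snippet above sits inside a `process-compose` module provided by `process-compose-flake`. Below is a minimal sketch of the surrounding flake-parts wiring, assuming `process-compose-flake`'s flake module is imported and `inputs` (with a `services-flake` input) is in scope; the name `ollama-demo` is only an example:

```nix
# Sketch of the flake-parts `perSystem` wiring (names are examples).
perSystem = { ... }: {
  process-compose."ollama-demo" = {
    # Pull in the service modules shipped by services-flake.
    imports = [ inputs.services-flake.processComposeModules.default ];
    # The Ollama service configured on this page.
    services.ollama."ollama1".enable = true;
  };
};
```

With that in place, `nix run .#ollama-demo` should start process-compose with the Ollama service defined above.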
By default, Ollama uses the CPU for inference. To enable GPU acceleration, follow the steps for your GPU vendor below.
> **Note**
> NixOS provides documentation for configuring both Nvidia and AMD GPU drivers. If you are using any other distribution, refer to its documentation instead.
For NVIDIA GPUs, first allow unfree packages:
```nix
# Inside `perSystem = { system, ... }: { ... }`
{
  imports = [
    "${inputs.nixpkgs}/nixos/modules/misc/nixpkgs.nix"
  ];
  nixpkgs = {
    hostPlatform = system;
    # Required for CUDA
    config.allowUnfree = true;
  };
}
```
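If you would rather not allow all unfree packages globally, nixpkgs also accepts an `allowUnfreePredicate`. Here is a sketch, assuming `lib` is available as a module argument (e.g. `perSystem = { system, lib, ... }:`); the package names listed are examples only and depend on your nixpkgs revision:

```nix
# Sketch: whitelist specific unfree packages instead of allowing everything.
# Adjust the names to whatever CUDA packages your build actually needs.
{
  nixpkgs.config.allowUnfreePredicate = pkg:
    builtins.elem (lib.getName pkg) [
      "cuda_cudart"
      "libcublas"
    ];
}
```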
And then enable CUDA acceleration:
```nix
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "cuda";
  };
}
```
For Radeon (AMD) GPUs, enable ROCm acceleration:
```nix
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "rocm";
  };
}
```
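Finally, you will usually want at least one model pulled before running inference. Depending on your services-flake revision, the Ollama service may expose a `models` option to pull models at startup; treat the option name and the `llama3` model below as assumptions to verify against your revision:

```nix
# Sketch: pull a model automatically when the service starts.
# The `models` option is assumed here; verify it against the Ollama
# service options in your services-flake revision.
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "rocm";
    models = [ "llama3" ];
  };
}
```

Unless configured otherwise, Ollama serves its API on its stock default port, 11434, which is what the `ollama` CLI and most clients expect.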