Ollama makes it easy to run large language models (LLMs) locally. It supports Llama 3, Mistral, Gemma, and many other models.
> ❄️ You can now perform LLM inference with Ollama in services-flake! <https://t.co/rtHIYdnPfb>
>
> — NixOS Asia (@nixos_asia), June 12, 2024
To enable the Ollama service:

```nix
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1".enable = true;
}
```
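For context, the snippet above sits inside a `process-compose` module provided by `process-compose-flake`. Below is a minimal sketch of the surrounding flake-parts wiring, assuming `process-compose-flake`'s flake module is imported and `inputs` (with a `services-flake` input) is in scope; the name `ollama-demo` is only an example:

```nix
# Sketch of the flake-parts `perSystem` wiring (names are examples).
perSystem = { ... }: {
  process-compose."ollama-demo" = {
    # Pull in the service modules shipped by services-flake.
    imports = [ inputs.services-flake.processComposeModules.default ];
    # The Ollama service configured on this page.
    services.ollama."ollama1".enable = true;
  };
};
```

With that in place, `nix run .#ollama-demo` should start process-compose with the Ollama service defined above.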
By default, Ollama uses the CPU for inference. To enable GPU acceleration, follow the steps for your GPU vendor below.
> **Note**
> NixOS provides documentation for configuring both Nvidia and AMD GPU drivers. If you are using any other distribution, refer to its documentation instead.
For NVIDIA GPUs, first allow unfree packages:
```nix
# Inside `perSystem = { system, ... }: { ... }`
{
  imports = [
    "${inputs.nixpkgs}/nixos/modules/misc/nixpkgs.nix"
  ];
  nixpkgs = {
    hostPlatform = system;
    # Required for CUDA
    config.allowUnfree = true;
  };
}
```
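If you would rather not allow all unfree packages globally, nixpkgs also accepts an `allowUnfreePredicate`. Here is a sketch, assuming `lib` is available as a module argument (e.g. `perSystem = { system, lib, ... }:`); the package names listed are examples only and depend on your nixpkgs revision:

```nix
# Sketch: whitelist specific unfree packages instead of allowing everything.
# Adjust the names to whatever CUDA packages your build actually needs.
{
  nixpkgs.config.allowUnfreePredicate = pkg:
    builtins.elem (lib.getName pkg) [
      "cuda_cudart"
      "libcublas"
    ];
}
```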
And then enable CUDA acceleration:
```nix
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "cuda";
  };
}
```
For Radeon (AMD) GPUs, enable ROCm acceleration:
```nix
# In `perSystem.process-compose.<name>`
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "rocm";
  };
}
```
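Finally, you will usually want at least one model pulled before running inference. Depending on your services-flake revision, the Ollama service may expose a `models` option to pull models at startup; treat the option name and the `llama3` model below as assumptions to verify against your revision:

```nix
# Sketch: pull a model automatically when the service starts.
# The `models` option is assumed here; verify it against the Ollama
# service options in your services-flake revision.
{
  services.ollama."ollama1" = {
    enable = true;
    acceleration = "rocm";
    models = [ "llama3" ];
  };
}
```

Unless configured otherwise, Ollama serves its API on its stock default port, 11434, which is what the `ollama` CLI and most clients expect.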