Your Proxmox server is probably sitting idle most of the day. It's got RAM you're not using, CPU headroom to spare, and if you have a dGPU in it, a graphics card that's doing absolutely nothing. Running Ollama inside an LXC container is one of the highest-value things you can do with that spare capacity — a private, always-on LLM that responds in under a second, never leaks your prompts to a third party, and costs $0/month beyond existing electricity.
This guide walks through the full setup: LXC creation, Ollama install, GPU passthrough if you have it, Open WebUI for a ChatGPT-style interface, and how to wire it into Home Assistant for voice-activated local AI.
Why LXC Instead of a VM
LXC shares the host kernel for near-native GPU performance and near-zero overhead vs a full VM.
You could run Ollama in a full VM, but LXC containers on Proxmox share the host kernel. That gives you:
- Lower overhead — no hypervisor CPU/RAM tax per container
- Near-native GPU performance — device passthrough is direct, not emulated
- Faster startup — an LXC container starts in under a second vs 15–30s for a VM
The tradeoff: LXC containers require nesting=1 for Docker (if you want it), and root in an LXC container has more host access than in a VM. For a home network, this is fine. For production multi-tenant setups, use a VM.
Prerequisites
You need Proxmox 8.x, an Ubuntu LXC template, and optionally an NVIDIA or AMD GPU.
- Proxmox 8.x (7.x works with minor differences)
- An LXC-compatible base template (Ubuntu 22.04 or 24.04 recommended)
- Optional: NVIDIA or AMD GPU in your Proxmox host for hardware acceleration
Step 1: Create the LXC Container
Create a container with 4+ cores, 16GB RAM, and 40GB disk — models are large and need room.
In the Proxmox web UI:
- Datacenter → Storage — make sure you have a CT template downloaded. If not, go to your storage → CT Templates → Templates and download `ubuntu-24.04-standard`.
- Create CT with these settings:
  - Hostname: `ollama` (or whatever)
  - Template: ubuntu-24.04-standard
  - Disk: 40GB minimum (models are large — Llama 3.1 70B is ~40GB alone)
  - CPU: 4+ cores
  - RAM: 8GB minimum, 16GB recommended
  - Network: Bridge to your main VLAN
- In Options → Features, enable:
  - `nesting=1` (required if you want Docker inside the container)
  - `keyctl=1` (needed for some container workloads)
Or via CLI on the Proxmox host:
pct create 110 local:vztmpl/ubuntu-24.04-standard_24.04-2_amd64.tar.zst \
--hostname ollama \
--cores 4 \
--memory 16384 \
--rootfs local-lvm:40 \
--net0 name=eth0,bridge=vmbr0,ip=dhcp \
--features nesting=1,keyctl=1 \
--unprivileged 1 \
--start 1
Step 2: Install Ollama
One curl command installs Ollama as a systemd service — it's running in under 60 seconds.
Start the container and SSH in (or use the Proxmox shell):
pct exec 110 -- bash
Once inside:
# Update and install curl
apt update && apt install -y curl
# Install Ollama (official installer)
curl -fsSL https://ollama.com/install.sh | sh
# Verify it's running
ollama --version
systemctl status ollama
Ollama installs as a systemd service and starts automatically. The service listens on 127.0.0.1:11434 by default.
Step 3: Pull Your First Model
ollama pull llama3.2 gets you a capable model in minutes; test it immediately with ollama run.
# Small, fast — great for testing and Home Assistant
ollama pull llama3.2
# Better quality, needs 8GB RAM
ollama pull llama3.1:8b
# Coding focused
ollama pull qwen2.5-coder:7b
# Embedding model for vector search (for RAG later)
ollama pull nomic-embed-text
Test it immediately:
ollama run llama3.2 "What's the capital of France? Answer in one word."
You should get a response in 1–3 seconds on a modern CPU. With a GPU, it's nearly instant.
Step 4: Make Ollama Accessible on Your Network
Set OLLAMA_HOST=0.0.0.0 via a systemd override to expose Ollama to your local network.
By default, Ollama only binds to 127.0.0.1. To access it from other machines (Open WebUI, Home Assistant, your laptop), you need to change the bind address.
# Edit the systemd service override
mkdir -p /etc/systemd/system/ollama.service.d
cat > /etc/systemd/system/ollama.service.d/override.conf << 'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
EOF
systemctl daemon-reload
systemctl restart ollama
Now Ollama is accessible at http://<container-ip>:11434 from any machine on your network.
Test from another machine:
curl http://192.168.7.XXX:11434/api/generate \
-d '{"model":"llama3.2","prompt":"Hello!","stream":false}'
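Hand-quoting JSON inside curl commands breaks as soon as a prompt contains quotes or newlines. A safer pattern, assuming `jq` is installed, builds the body programmatically (the prompt below is just an example):

```shell
# Build the request body with jq so characters in the prompt can't break the JSON
prompt='Summarize this in "one" sentence: the heater ran 6 hours today.'
body=$(jq -n --arg m llama3.2 --arg p "$prompt" \
  '{model: $m, prompt: $p, stream: false}')

echo "$body"
# Then send it and extract just the text:
#   curl -s http://192.168.7.XXX:11434/api/generate -d "$body" | jq -r '.response'
```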
Step 5: Add Open WebUI (Optional but Recommended)
One Docker command gives you a full ChatGPT-style UI with multi-user and conversation history.
Open WebUI gives you a full ChatGPT-style interface connected to your local Ollama. It also handles multi-user sessions, model management, and conversation history.
Install with Docker (requires nesting=1 on the LXC):
# Install Docker inside the container
curl -fsSL https://get.docker.com | sh
# Run Open WebUI connected to Ollama
# Run Open WebUI; note that "localhost" inside the Docker container is not the
# LXC, so map the host gateway and point OLLAMA_BASE_URL at it
docker run -d \
  --name open-webui \
  --restart always \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
Open WebUI will be at http://<container-ip>:3000. First visit creates an admin account.
If you prefer not to run Docker inside this LXC at all, you can run Open WebUI in a separate LXC and point it at the Ollama container's IP.
Step 6: NVIDIA GPU Passthrough (If You Have a GPU)
A mid-range gaming GPU gives 5–10x inference speedup; requires a privileged LXC and device passthrough.
This is where it gets significantly faster. LLM inference is memory-bandwidth bound — a mid-range gaming GPU from 5 years ago will outrun a modern CPU by 5–10x on most models.
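Back-of-envelope, since generating each token requires streaming roughly the whole model through memory (numbers assume the ~4.7GB quantized llama3.1:8b from Step 3 and typical published bandwidth figures):

```
tokens/sec ≈ memory bandwidth ÷ bytes read per token ≈ bandwidth ÷ model size

Dual-channel DDR4-3200 (~51 GB/s):   51 ÷ 4.7 ≈ 11 tok/s
RTX 3060 GDDR6 (~360 GB/s):         360 ÷ 4.7 ≈ 76 tok/s   (~7x)
```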
Note: GPU passthrough to LXC containers requires a privileged container. Redo the container setup with `--unprivileged 0` if you want GPU access.
On the Proxmox host, identify your GPU:
lspci | grep -i nvidia
# Example: 01:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060]
Add device passthrough to your LXC config (/etc/pve/lxc/110.conf). The major numbers vary by driver; check with `ls -l /dev/nvidia*` on the host (195 is typical for the nvidia devices, but nvidia-uvm's major is assigned dynamically and may not be 235):
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 235:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
Inside the container, install the matching NVIDIA userspace components. The kernel module itself comes from the host, so the container only needs the libraries and tools, at the same version as the host driver:
# Check the host driver version first
ssh root@proxmox-host "nvidia-smi | grep 'Driver Version'"
# Install the matching userspace packages inside the container (example: 535.x)
apt install -y nvidia-utils-535 libnvidia-compute-535
# Verify GPU is visible
nvidia-smi
Restart Ollama and it will automatically use the GPU:
systemctl restart ollama
ollama run llama3.1:8b "What's 2+2?"
# Should show GPU utilization in nvidia-smi
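A quicker check than watching nvidia-smi: `ollama ps` reports where the loaded model is running in its PROCESSOR column. The snippet below greps that column; the sample output is illustrative, hard-coded so the pattern is clear:

```shell
# On the live system:  ollama ps
# Illustrative sample of its output, piped through a GPU-offload check:
sample='NAME         ID              SIZE      PROCESSOR    UNTIL
llama3.1:8b  a80c4f17acd5    6.7 GB    100% GPU     4 minutes from now'

if echo "$sample" | grep -q '% GPU'; then
  echo "model offloaded to GPU"
else
  echo "model running on CPU"   # e.g. PROCESSOR would show "100% CPU"
fi
```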
AMD GPU Passthrough
AMD is actually easier on Linux — use ROCm.
First, add device passthrough to your LXC config (/etc/pve/lxc/110.conf):
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file
Then restart the container and install the ROCm HIP libraries inside it:
apt install -y rocm-hip-libraries
Ollama auto-detects ROCm. If you installed Ollama before the GPU was visible, re-run the installer so it picks up the ROCm variant:
curl -fsSL https://ollama.com/install.sh | sh
Step 7: Wire Into Home Assistant
Point HA's Ollama integration at your container IP for a fully local, private voice AI assistant.
This is where it gets genuinely useful for a smart home. You can use Ollama as the AI backend for Home Assistant's conversation agent — completely local, no monthly subscription, no API keys.
Via the Ollama Integration (HA 2024.3+)
In Home Assistant:
- Settings → Integrations → Add Integration → Ollama
- Enter your Ollama URL: `http://192.168.7.XXX:11434`
- Select your preferred model (llama3.2 works well for HA)
- Set as your default conversation agent
Now your voice assistant uses local AI. "Hey Google, talk to Home Assistant" → your Proxmox server handles the NLU, nothing leaves your network.
For Automation — Using the REST API
First define the command in configuration.yaml. The payload is a template, so the prompt can be passed in from the calling automation:
rest_command:
  ask_ollama:
    url: "http://192.168.7.XXX:11434/api/generate"
    method: POST
    content_type: "application/json"
    payload: '{"model":"llama3.2","prompt":"{{ prompt }}","stream":false}'
Then call it from an automation action:
action: rest_command.ask_ollama
data:
  prompt: "Draft a short notification: the washing machine finished 10 minutes ago."
You can now build automations that use an LLM to interpret sensor data, draft notifications, or make context-aware decisions — no cloud required.
Model Selection Guide
Use llama3.2 for general/HA, qwen2.5-coder for code, and nomic-embed-text for RAG pipelines.
Different tasks need different models. Here's a quick reference for home/lab use:
| Use Case | Recommended Model | Size | Notes |
|---|---|---|---|
| General chat / HA assistant | llama3.2 | 2.0GB | Fast, good reasoning |
| Home Assistant (low RAM) | llama3.2:1b | 1.3GB | Tiny, still useful |
| Code generation | qwen2.5-coder:7b | 4.7GB | Best free coding model |
| Long context (big docs) | llama3.1:8b | 4.7GB | 128k context window |
| Privacy-first chat | mistral:7b | 4.1GB | Clean Apache license |
| Vector embeddings | nomic-embed-text | 274MB | For RAG pipelines |
| Vision tasks | llava:7b | 4.5GB | Describe images locally |
Disk tip: models are stored under the Ollama service user's home: /usr/share/ollama/.ollama/models for the default systemd install, or /root/.ollama/models if you run Ollama as root. If your LXC root is on limited-size storage, symlink or bind-mount the models directory to a larger volume:
# If /mnt/nas is mounted inside the container
mkdir -p /mnt/nas/ollama-models
ln -s /mnt/nas/ollama-models /root/.ollama/models
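If you'd rather configure the location explicitly than symlink, Ollama also honors an OLLAMA_MODELS environment variable. A sketch using the same systemd-override pattern as Step 4 (stop Ollama and move any existing models over first):

```shell
mkdir -p /etc/systemd/system/ollama.service.d
cat > /etc/systemd/system/ollama.service.d/models.conf << 'EOF'
[Service]
Environment="OLLAMA_MODELS=/mnt/nas/ollama-models"
EOF
# The directory must be writable by the service user (often "ollama")
chown -R ollama:ollama /mnt/nas/ollama-models
systemctl daemon-reload
systemctl restart ollama
```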
Keeping Ollama Updated
Re-run the install script to update in-place — it handles everything without losing your models.
Ollama updates frequently. The Proxmox community script handles this automatically if you used it to create the container, but for manual setups:
# Re-run the installer — it updates in-place
curl -fsSL https://ollama.com/install.sh | sh
systemctl restart ollama
Or pin a specific version: the install script honors an OLLAMA_VERSION environment variable (e.g. curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.3.14 sh), or download the binary directly from github.com/ollama/ollama/releases.
What's Next
Open WebUI, AnythingLLM for RAG, and Whisper+Piper for a fully local voice assistant are the natural next steps.
Once Ollama is running, the usual next steps are:
- Open WebUI — Full ChatGPT-style UI with multi-user support
- AnythingLLM — RAG pipeline for chatting with your documents
- LocalAI — Drop-in OpenAI-compatible API, useful for apps that hardcode OpenAI endpoints
- Home Assistant AI Voice — Combine Whisper (local STT) + Ollama (LLM) + Piper (local TTS) for a fully local voice assistant
The Ollama ecosystem is expanding fast. A Proxmox LXC is the right foundation: isolated, low-overhead, easy to snapshot before experiments, and trivial to clone if you want to test multiple model setups.
Questions or improvements? The source for this post is on GitHub. PRs welcome.