GPU Node — LLM Inference Server

The GPU Node runs the local LLM that Logster consults to reach a verdict on each window of endpoint activity. It is shipped as a Docker image tarball that you load and run on a VM with GPU access.

Provisioning the VM with working GPU access is your responsibility. This page covers the prerequisites, then walks you through loading and running the image.

[!NOTE] Bring the GPU Node up before the App Node. You will need this node's endpoint URL when you configure the App Node.

Hardware

Resource	Recommended	Minimum
CPU	8 vCPU	8 vCPU
RAM	64 GB	64 GB
GPU	2 × NVIDIA H100 80 GB	2 × NVIDIA RTX A6000

Step 1 — Provision the VM with GPU access

The model runs inside Docker and needs direct access to the GPU. Provision a Linux VM (Ubuntu Server 22.04 is a good default) with the GPU(s) passed through, then install the prerequisites so Docker can use them:

GPU passthrough — configure PCIe passthrough for the GPU(s) in your hypervisor so the VM sees the physical cards. This is hypervisor-specific (Proxmox, ESXi, KVM, Hyper-V, etc.) — follow your platform's documentation.
NVIDIA driver — install the NVIDIA driver inside the VM and confirm the GPUs are visible:
```
nvidia-smi
```
You should see a table listing your GPU(s). If this command fails, stop here and fix the driver/passthrough before continuing.
Docker Engine + Docker Compose — install Docker following the official guide.
NVIDIA Container Toolkit — this is what lets Docker containers reach the GPU. Install it following the NVIDIA Container Toolkit guide, then verify a container can see the GPU:
```
sudo docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```
If this prints the same GPU table from inside a container, the GPU Node host is ready.
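The checks above can be gathered into a small preflight helper. This is a minimal sketch, not part of the shipped tooling: the function names `missing_tools` and `gpu_count` and the `NVIDIA_SMI` override variable are hypothetical, and it assumes `nvidia-smi` supports the `--query-gpu` CSV output mode.

```shell
# Command used to query the driver; override only if nvidia-smi lives elsewhere.
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Count the GPUs the driver reports (one CSV line per GPU).
gpu_count() {
  $NVIDIA_SMI --query-gpu=name --format=csv,noheader | grep -c .
}

# Example usage:
#   missing_tools nvidia-smi docker   # prints nothing when both are installed
#   gpu_count                         # compare against the Hardware table
```

This only confirms that the tools exist and how many GPUs the driver sees. The container test with `nvidia/cuda` above remains the real proof that Docker can reach the GPU.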

Step 2 — Load the model image

The model ships as a single Docker image tarball. The model weights and every serving parameter are baked into the image, so there is nothing to download and no configuration to supply at runtime.

Load the tarball into the VM's local Docker image store:

sudo docker load -i /path/to/logster-26b-a4b-it-v1.0.2.tar

When the load finishes, Docker prints the image name and tag. Confirm it is present:

sudo docker images | grep logster-26b-a4b-it

You should see a row for the eunomatix/logster-26b-a4b-it repository with the tag v1.0.2.
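If you want a check that a script can act on, rather than reading `grep` output, `docker image inspect` exits non-zero when the image is absent. A minimal sketch follows; the `image_present` function and the `DOCKER` override variable are hypothetical names introduced for illustration.

```shell
# Docker command to use; set DOCKER=docker if your user is in the docker group.
DOCKER="${DOCKER:-sudo docker}"

# Succeed only if the given image reference exists in the local image store.
image_present() {
  $DOCKER image inspect "$1" >/dev/null 2>&1
}

# Example usage:
#   image_present eunomatix/logster-26b-a4b-it:v1.0.2 && echo "image loaded"
```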

Step 3 — Run the model server

Everything the server needs is baked into the image, so you do not pass any application arguments — just run the image. It exposes an OpenAI-compatible API on port 8000:

sudo docker run -d --name logster-classifier --restart unless-stopped \
  --gpus '"device=0"' \
  --ipc=host \
  --shm-size 16g \
  -p 8000:8000 \
  eunomatix/logster-26b-a4b-it:v1.0.2

Notes:

The flags above are host/runtime concerns handled by Docker: GPU access (--gpus), shared memory (--ipc=host, --shm-size), and the published port (-p 8000:8000).
--gpus '"device=0"' selects the first GPU. Adjust the device selection to match the GPU(s) you want the server to use; for example, '"device=0,1"' exposes the first two GPUs to the container.
The container is set to --restart unless-stopped, so it comes back automatically after a reboot.
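To confirm that the container started and picked up the restart policy, you can query its state with `docker inspect`. This is a sketch under the assumption that the container is named logster-classifier as in the command above; `container_summary` and the `DOCKER` variable are hypothetical helper names.

```shell
# Docker command to use; set DOCKER=docker if your user is in the docker group.
DOCKER="${DOCKER:-sudo docker}"

# Print the container's state and its restart policy name on one line.
container_summary() {
  $DOCKER inspect --format '{{.State.Status}} {{.HostConfig.RestartPolicy.Name}}' "$1"
}

# Example usage:
#   container_summary logster-classifier
# A healthy container started as shown above should report:
#   running unless-stopped
```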

Because the weights are already in the image, there is no download step. Startup still takes a short while as the model loads onto the GPU — follow the logs until the server reports it is ready to serve:

sudo docker logs -f logster-classifier
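If you would rather script the wait than watch the logs, polling the API works too. The sketch below assumes the server only starts answering `/v1/models` once the model is loaded, which this page does not state explicitly; the function name `wait_for_server` and its default attempt count and delay are illustrative choices.

```shell
# Poll URL until curl gets a non-error HTTP response, or give up.
# Usage: wait_for_server URL [MAX_ATTEMPTS] [DELAY_SECONDS]
wait_for_server() {
  url=$1
  max_attempts=${2:-60}
  delay=${3:-5}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; -s silences progress output.
    if curl -fs -o /dev/null "$url"; then
      echo "server ready after $attempt attempt(s)"
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "server not ready after $max_attempts attempts" >&2
  return 1
}

# Example usage:
#   wait_for_server http://localhost:8000/v1/models
```

With the defaults this waits up to roughly five minutes. If the function returns non-zero, check the container logs for the reason.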

Step 4 — Verify the endpoint

From the GPU Node itself, confirm the server is answering:

curl http://localhost:8000/v1/models

You should get a JSON response listing the loaded model.

Step 5 — Note the endpoint URL for the App Node

The App Node talks to this server through its Chat Completions URL. Build the URL from the GPU Node's network address (one the App Node can route to):

http://<gpu-node>:8000/v1/chat/completions

You will set this as LOCAL_LLM_ENDPOINT on the App Node — see App Node → Step 2.
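Before handing the URL to the App Node, it can help to send one request through it by hand. The sketch below assumes the standard OpenAI Chat Completions request shape (`model`, `messages`, `max_tokens`) and that you pass a model id exactly as returned by `/v1/models` in Step 4. The helper names `chat_endpoint` and `chat_smoke_test` are hypothetical.

```shell
# Build the Chat Completions URL for a GPU Node host (port 8000 as published in Step 3).
chat_endpoint() {
  printf 'http://%s:%s/v1/chat/completions\n' "$1" "${2:-8000}"
}

# Send a single short user message and print the raw JSON response.
# Usage: chat_smoke_test HOST MODEL_ID
chat_smoke_test() {
  host=$1
  model=$2
  curl -fsS "$(chat_endpoint "$host")" \
    -H 'Content-Type: application/json' \
    -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}], \"max_tokens\": 8}"
}

# Example usage (replace both placeholders with your own values):
#   chat_smoke_test <gpu-node> <model-id>
```

A JSON body containing a `choices` array indicates the endpoint works end to end. The reply text itself is not meaningful for this check.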

[!IMPORTANT] Make sure the App Node can reach the GPU Node on port 8000. If a firewall sits between the two nodes, open that port.
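One way to test reachability is to run a request from the App Node itself. This is a minimal sketch: `can_reach_gpu_node` is a hypothetical helper, and the firewall command in the comments assumes the GPU Node uses ufw, which may not match your environment.

```shell
# Return 0 if the GPU Node accepts a connection within TIMEOUT seconds
# and answers /v1/models without an HTTP error.
# Usage: can_reach_gpu_node HOST [TIMEOUT_SECONDS]
can_reach_gpu_node() {
  host=$1
  timeout=${2:-5}
  curl -fsS -o /dev/null --connect-timeout "$timeout" "http://$host:8000/v1/models"
}

# Example usage, run on the App Node:
#   can_reach_gpu_node <gpu-node> || echo "port 8000 is not reachable"
#
# If the GPU Node's firewall is ufw (an assumption), the port can be opened with:
#   sudo ufw allow 8000/tcp
```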