
Model Deployment

Logster's detection quality comes from a pair of pre-trained graph neural network models — one for Windows, one for Linux. This guide explains how the models are packaged, how they are loaded at inference-service startup, how to swap them, and how to verify integrity.

This guide plays the same role for Logster that a "Llama Guard Deployment" guide plays for that model: it covers the deployment path for the detection models that back the pipeline.


Model artifacts

Logster ships two models in the default Compose stack:

| Platform | Host path | Container path |
|---|---|---|
| Windows | models/models/balanced_run_20260114_143653/best_model.pt | /app/models/balanced_run_20260114_143653/best_model.pt |
| Linux | models/models/balanced_run_20260222_142924/best_model.pt | /app/models/balanced_run_20260222_142924/best_model.pt |

Both are PyTorch .pt files containing the weights of a 3-layer Heterogeneous Graph Attention Network (GAT) with 128 hidden dimensions, 4 attention heads, and a 2-class softmax output (benign / attack).

The host directory models/models/ is mounted read-only into the inference container at /app/models. The container can never modify the weights.


How the models are loaded

On startup, the inference service:

  1. Reads model.path and model.linux_model_path from deploy/service-config.yaml.
  2. Calls torch.load() on each file to load the model state dict.
  3. Instantiates the HeteroGNNClassifier (Windows) and HeteroGNNClassifierLinux architectures and loads the weights into each.
  4. Moves each model to the device specified by model.device (cpu or cuda).
  5. Sets both models to evaluation mode.

If any of these steps fails, the container exits at startup rather than running silently on a broken model. Check docker compose logs inference for the exact traceback.
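The fail-fast idea can be illustrated with a small preflight check that runs before any weights are loaded. This is a minimal sketch, not Logster's actual startup code: it assumes the YAML has already been parsed into a dictionary, and `validate_model_config` and `ALLOWED_DEVICES` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Device values named in this guide; anything else is rejected.
ALLOWED_DEVICES = {"cpu", "cuda"}


def validate_model_config(config: dict) -> dict:
    """Return the resolved model settings, or raise before anything is loaded."""
    model = config.get("model")
    if not isinstance(model, dict):
        raise ValueError("config has no 'model' section")

    resolved = {}
    for key in ("path", "linux_model_path"):
        value = model.get(key)
        if not value:
            raise ValueError(f"model.{key} is missing")
        path = Path(value)
        if not path.is_file():
            raise FileNotFoundError(f"model.{key} does not exist: {path}")
        resolved[key] = path

    device = model.get("device")
    if device not in ALLOWED_DEVICES:
        raise ValueError(
            f"model.device must be one of {sorted(ALLOWED_DEVICES)}, got {device!r}"
        )
    resolved["device"] = device
    return resolved
```

Raising an exception here, instead of logging and continuing, is what turns a bad path or device value into a visible startup failure.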


Swapping a model

Step 1 — Place the new model file

Copy the new best_model.pt into a new subdirectory under models/models/. Use a clear, dated directory name so you have a version history on disk:

cd models/models/
mkdir my_run_2026_04_12
cp /path/to/new_best_model.pt my_run_2026_04_12/best_model.pt

[!IMPORTANT] Do not overwrite existing model directories. Always add new versions side-by-side so that a rollback is a single config edit rather than a file restore.
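If you script this step, the no-overwrite rule can be enforced in code. The following is a sketch only; `stage_model` is a hypothetical helper, not part of Logster.

```python
import shutil
from pathlib import Path


def stage_model(src: Path, models_root: Path, run_name: str) -> Path:
    """Copy src to models_root/run_name/best_model.pt without overwriting."""
    target_dir = models_root / run_name
    # exist_ok=False raises FileExistsError if this version directory already exists,
    # so an earlier model can never be replaced by accident.
    target_dir.mkdir(parents=True, exist_ok=False)
    target = target_dir / "best_model.pt"
    shutil.copy2(src, target)
    return target
```

A call such as `stage_model(Path("/path/to/new_best_model.pt"), Path("models/models"), "my_run_2026_04_12")` mirrors the shell commands above and fails loudly if the directory name is reused.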

Step 2 — Update the config

Edit deploy/service-config.yaml:

model:
  path: "/app/models/my_run_2026_04_12/best_model.pt"       # Windows
  linux_model_path: "/app/models/balanced_run_20260222_142924/best_model.pt"
  device: "cpu"

Remember the path is the container path. The mount root is /app/models, so the container sees models/models/my_run_2026_04_12/best_model.pt as /app/models/my_run_2026_04_12/best_model.pt.
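The host-to-container mapping is a plain prefix substitution, which the sketch below makes explicit. `to_container_path` is a hypothetical helper; the two root paths are the ones given in this guide.

```python
from pathlib import PurePosixPath

HOST_ROOT = PurePosixPath("models/models")     # host directory from this guide
CONTAINER_ROOT = PurePosixPath("/app/models")  # mount point inside the container


def to_container_path(host_path: str) -> str:
    """Translate a path under models/models/ to the path the container sees."""
    # relative_to raises ValueError if host_path is not under HOST_ROOT.
    relative = PurePosixPath(host_path).relative_to(HOST_ROOT)
    return str(CONTAINER_ROOT / relative)


print(to_container_path("models/models/my_run_2026_04_12/best_model.pt"))
# prints /app/models/my_run_2026_04_12/best_model.pt
```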

Step 3 — Restart the inference service

cd deploy/
docker compose --profile services restart inference

Watch the startup logs to confirm the new model loads cleanly:

docker compose logs inference | tail -50

You should see a log line indicating the model path and a successful load.

Step 4 — Smoke-test the new model

Compare a few recent inferences against the previous model's behavior:

# How many inferences in the last 30 minutes?
curl 'http://localhost:9200/logster-inferences/_count' \
    -H 'Content-Type: application/json' \
    -d '{"query":{"range":{"@timestamp":{"gte":"now-30m"}}}}'

# What's the prediction distribution?
curl 'http://localhost:5001/api/distribution'

A healthy swap looks like:

  • inferences_run metric resumes climbing at the same rate.
  • Prediction distribution is similar to pre-swap (big benign majority, small attack tail, low error rate).
  • inference_time_ms is comparable to pre-swap.

If any of these look very different from the previous model, you may have deployed a model with different expectations about input shape or distribution. Roll back by reverting model.path to the previous directory and restarting.
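One way to make "very different" measurable is to compare the two prediction distributions numerically. The sketch below computes the total variation distance between pre-swap and post-swap label counts. It assumes you have already reduced each distribution to a mapping of label to count; the response format of /api/distribution is not specified here, so the extraction step is left out, and the labels and counts in the usage note are purely illustrative.

```python
def normalize(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw label counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no predictions to compare")
    return {label: n / total for label, n in counts.items()}


def total_variation(before: dict[str, int], after: dict[str, int]) -> float:
    """Return a value in [0, 1]: 0 means identical distributions, 1 means disjoint."""
    p, q = normalize(before), normalize(after)
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)
```

For example, `total_variation({"benign": 90, "attack": 10}, {"benign": 50, "attack": 50})` is 0.4. What distance counts as acceptable is a judgment call for your environment; the function only gives you a number to track across swaps.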


Rolling back

Because you kept the previous model directory under models/models/, rollback is a one-line config edit and a restart:

model:
  path: "/app/models/balanced_run_20260114_143653/best_model.pt"  # previous version

docker compose --profile services restart inference

Integrity verification

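From PyTorch's perspective, a model file can carry executable code: torch.load() deserializes with pickle, and unpickling can run arbitrary code at load time unless the call restricts it (for example with weights_only=True, which is the default in recent PyTorch releases). Treat tampering with a model file as equivalent to code injection into the inference service.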
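The underlying mechanism is easy to demonstrate with Python's standard pickle module, without involving PyTorch at all. In this deliberately harmless sketch, the hypothetical `Payload` class instructs pickle to call `eval` on load; a real attacker would substitute something damaging.

```python
import pickle


class Payload:
    """Stand-in for a malicious object hidden inside a pickled file."""

    def __reduce__(self):
        # Instead of describing object state, tell pickle to call eval("6 * 7")
        # when the data is loaded.
        return (eval, ("6 * 7",))


def load_untrusted(data: bytes):
    # pickle.loads invokes whatever callable the byte stream names.
    return pickle.loads(data)


blob = pickle.dumps(Payload())
result = load_untrusted(blob)  # result is 42, not a Payload instance
```

The loader never asked to evaluate anything; the byte stream decided what ran. That is why an unverified model file has to be treated like unverified code.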

Record a checksum on deploy

Every time you deploy a new model, record its SHA-256:

sha256sum models/models/my_run_2026_04_12/best_model.pt

Store the output alongside your deployment notes (in your change management system, config management repo, or however your team tracks infrastructure changes).

Verify on restart

Before each restart of the inference service, compare the current checksum against the recorded value:

# Expected
echo "abc123...  models/models/my_run_2026_04_12/best_model.pt" > expected.sum

# Verify
sha256sum -c expected.sum

If the checksum does not match, do not start the service. Investigate the mismatch first.
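The same check can be scripted for use in a deployment pipeline. This is a minimal sketch using Python's hashlib; `sha256_of` and `verify_model` are hypothetical helper names, and the expected digest is whatever you recorded at deploy time.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path: Path, expected_hex: str) -> None:
    """Raise if the file's SHA-256 differs from the recorded value."""
    actual = sha256_of(path)
    if actual != expected_hex.strip().lower():
        raise RuntimeError(
            f"checksum mismatch for {path}: expected {expected_hex}, got {actual}"
        )
```

Calling `verify_model` before the restart command, and aborting on the exception, gives the pipeline the same stop-on-mismatch behavior as the manual procedure.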

[!WARNING] If a tampered file still loads without error, the container starts normally and will serve attacker-controlled predictions. There is no automatic integrity check at runtime; you must verify checksums manually or wire the check into your deployment pipeline.

Use signed containers

For the strongest supply chain posture, distribute Logster container images signed with cosign or Notary v2, and gate deployment on signature verification. This defends against a compromised container registry in addition to a compromised model file.


CPU vs GPU deployment

CPU

Set model.device: cpu in deploy/service-config.yaml. No container changes required. Works on any Docker host.

Right for: small deployments (≤ 100 endpoints), development environments, cost-sensitive small teams.

GPU

Set model.device: cuda. The inference container needs the NVIDIA container runtime to access a GPU on the host:

inference:
    runtime: nvidia
    environment:
        NVIDIA_VISIBLE_DEVICES: "0"

The host must have:

  • An NVIDIA driver recent enough to support the CUDA version that the PyTorch build targets.
  • The nvidia-container-toolkit package installed.

Right for: production deployments with hundreds to thousands of endpoints. The Logster SaaS stack runs on NVIDIA RTX 4090 GPUs — see Licensing Guide: Hardware.

Sizing

TBD — real benchmarks required.

The inference service's per-replica throughput depends on:

  • Hardware (CPU core count / GPU model)
  • inference.window and inference.interval
  • Average events per endpoint per window
  • Graph size (nodes and edges) per window

Populate this section with measured numbers from your own deployment, or from Logster Support's published benchmarks once available. Do not guess — undersized hardware will silently degrade detection quality.

Metrics to capture when benchmarking:

  • inferences_run per second (Prometheus)
  • inference_time_ms p50 / p95 / p99 (Prometheus)
  • active_endpoints at which inference_time_ms starts rising
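If you export raw inference_time_ms samples for offline analysis instead of querying percentiles from Prometheus, the p50 / p95 / p99 summary can be computed as follows. This is a sketch using the nearest-rank definition of a percentile; `percentile` and `summarize` are hypothetical helpers, and other percentile definitions will give slightly different values on small samples.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(samples: list[float]) -> dict[str, float]:
    """Return the three latency percentiles named in this guide."""
    return {name: percentile(samples, pct) for name, pct in (("p50", 50), ("p95", 95), ("p99", 99))}
```

Recording these three values at several active_endpoints levels shows where latency begins to climb.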

Where to go next