Appliance Troubleshooting Script
The OVA ships with a self-diagnostic script that checks the two most common "nothing works" situations on the appliance:
- Are Sysmon events flowing into Kafka? An empty dashboard almost always traces back to here.
- Is the local LLM (the GPU Node) reachable and answering? Verdicts that all show error, or nothing being scored, almost always trace back to here.
Run it whenever the appliance is up but data isn't appearing where you expect.
Running it
The script is installed on the App Node at
/opt/logster/troubleshoot/logster-troubleshoot.sh. Run it with sudo:
sudo /opt/logster/troubleshoot/logster-troubleshoot.sh
It prints a timestamped PASS/FAIL line for each check, followed by a
summary. It exits non-zero if any check fails, so it is also safe to call
from other scripts.
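
Because the exit status is the only contract a caller needs, a wrapper can simply branch on it. The sketch below shows one way to do that. `run_diagnostics` is a hypothetical helper name; the only facts taken from this page are the script path and its non-zero exit on failure.

```shell
# Run a diagnostic command and turn its exit status into a one-line message.
# Hypothetical helper: it knows nothing about the checks themselves and
# relies only on "non-zero means at least one check failed".
run_diagnostics() {
    if "$@"; then
        echo "diagnostics: all checks passed"
    else
        # In the else branch, $? still holds the status of the command above.
        local status=$?
        echo "diagnostics: at least one check failed (exit status ${status})" >&2
        return "${status}"
    fi
}

# Example invocation on the App Node (path as documented above):
# run_diagnostics sudo /opt/logster/troubleshoot/logster-troubleshoot.sh
```

The helper passes the original exit status through, so a cron job or CI step that calls it will still see the failure.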
[!NOTE] Start the stack first (App Node → Step 4). The script talks to the running containers; if the stack is down it will report the failure and tell you how to bring it up.
What it checks
Check 1 — Sysmon events flowing into Kafka
The script live-tails the sysmon-logs topic for up to 60 seconds, waiting for
a new event to arrive.
- PASS — events are flowing. The endpoint side of the pipeline is healthy.
- FAIL — no event was seen within the window. The Windows endpoint agent is likely not shipping, or the LAN listener / Winlogbeat output is misconfigured. See App Node → Point your endpoints at the appliance, and confirm the endpoint can reach <EXTERNAL_KAFKA_LAN_HOST>:29092.
The script also prints the manual command it uses, so you can keep watching the topic yourself:
cd /opt/logster/deploy && sudo docker compose exec kafka \
kafka-console-consumer --bootstrap-server localhost:9092 \
--topic sysmon-logs
If nothing prints for about a minute, no events are flowing in.
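
To reproduce the bounded wait instead of watching indefinitely, the manual command can be wrapped in a time limit. This is a minimal sketch, not the script's actual implementation. `first_event` is a hypothetical helper, `timeout` is assumed to be the GNU coreutils command, and the `--max-messages` and `-T` options are assumed to be supported by the consumer and Docker Compose versions on the appliance.

```shell
# Succeed if the given command prints at least one line within WINDOW seconds.
# Hypothetical helper; the real script's logic is not shown on this page.
first_event() {
    local window=$1
    shift
    local event
    # timeout stops the command once the window has elapsed; head keeps only
    # the first line. "|| true" tolerates timeout's non-zero status when the
    # caller has enabled pipefail.
    event=$(timeout "$window" "$@" 2>/dev/null | head -n 1) || true
    [ -n "$event" ]
}

# Example on the App Node (assumes the documented deploy directory and topic):
# cd /opt/logster/deploy && first_event 60 sudo docker compose exec -T kafka \
#     kafka-console-consumer --bootstrap-server localhost:9092 \
#     --topic sysmon-logs --max-messages 1 && echo PASS || echo FAIL
```

Passing `--max-messages 1` lets the consumer exit as soon as one event arrives, so a healthy pipeline returns well before the window expires.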
Check 2 — Local LLM reachability
The script discovers the LLM endpoint the inference service is actually using
(it reads LOCAL_LLM_ENDPOINT from the running inference container, falling
back to /etc/logster/logster.env), then:
- Calls GET /models on the GPU node to confirm it is reachable and to read the served model id.
- Sends a tiny dummy chat completion to confirm the model actually answers.

- PASS — the GPU node is reachable and the model answered (HTTP 200). The inference path can reach the model.
- FAIL — the message tells you which stage failed and the likely cause:
| Symptom reported | Likely cause |
|---|---|
| LOCAL_LLM_ENDPOINT is not set | The value is missing from logster.env — see App Node → Step 2. |
| Cannot resolve the LLM host (DNS) | Wrong hostname in LOCAL_LLM_ENDPOINT. |
| Connection refused | The GPU node is down or the port is wrong. |
| Connection timed out | Firewall, wrong host, or the node is unreachable. |
| HTTP 400 | Model name mismatch. |
| HTTP 401 | Bad or missing API key. |
| HTTP 5xx | Error on the GPU node itself. |
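
The symptoms in the table above map naturally onto curl's exit status plus the HTTP status code. The sketch below is an assumption about how such a mapping could look, not the script's real logic. `classify_llm_result` and `probe_models` are hypothetical names, and the sketch assumes LOCAL_LLM_ENDPOINT is a base URL to which /models can be appended. The curl exit codes used (6 for a DNS failure, 7 for a failed connection, 28 for a timeout) are curl's documented values.

```shell
# Map a curl exit status and an HTTP status code to a symptom from the table.
# Hypothetical helper, written for illustration only.
classify_llm_result() {
    local curl_exit=$1 http_code=$2
    case "$curl_exit" in
        0)  ;;  # transport succeeded; fall through to the HTTP status
        6)  echo "Cannot resolve the LLM host (DNS)"; return 1 ;;
        7)  echo "Connection refused"; return 1 ;;
        28) echo "Connection timed out"; return 1 ;;
        *)  echo "curl failed with exit status ${curl_exit}"; return 1 ;;
    esac
    case "$http_code" in
        200) echo "PASS"; return 0 ;;
        400) echo "HTTP 400"; return 1 ;;
        401) echo "HTTP 401"; return 1 ;;
        5??) echo "HTTP 5xx"; return 1 ;;
        *)   echo "HTTP ${http_code}"; return 1 ;;
    esac
}

# Probe GET /models and classify the outcome. The 10-second limit is an
# arbitrary choice for this sketch, not a documented default.
probe_models() {
    local endpoint=$1 code rc=0
    code=$(curl -sS -o /dev/null --max-time 10 -w '%{http_code}' \
        "${endpoint}/models") || rc=$?
    classify_llm_result "$rc" "$code"
}

# Example: probe_models "$LOCAL_LLM_ENDPOINT"
```

Keeping the classification separate from the request means the same function can be reused for the chat-completion probe.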
[!IMPORTANT] If the local LLM is unreachable the stack still runs, but every window is reported as an error verdict (not benign). Make sure the GPU Node is up and reachable from the App Node before starting — see GPU Node.
Advanced — overriding the defaults
The script targets the appliance layout by default. For a non-standard layout you can override these
environment variables:
| Variable | Default | Purpose |
|---|---|---|
| LOGSTER_DEPLOY_DIR | /opt/logster/deploy | Directory holding docker-compose.yml. |
| LOGSTER_ENV_FILE | /etc/logster/logster.env | Env file the LLM endpoint/key are read from when the container isn't running. |
If docker-compose.yml is not found at LOGSTER_DEPLOY_DIR, the script exits
with status 2 and prints which directory it expected.
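
The override behaviour described above can be sketched with ordinary shell parameter defaults. This is an illustration under stated assumptions, not the script's source: `resolve_deploy_dir` and `read_env_value` are hypothetical names, and the env file is assumed to hold plain KEY=value lines without quoting.

```shell
# Print the deploy directory, honouring LOGSTER_DEPLOY_DIR when it is set.
# Returns 2 if docker-compose.yml is missing, mirroring the documented status.
resolve_deploy_dir() {
    local dir=${LOGSTER_DEPLOY_DIR:-/opt/logster/deploy}
    if [ ! -f "${dir}/docker-compose.yml" ]; then
        echo "docker-compose.yml not found in ${dir}" >&2
        return 2
    fi
    echo "$dir"
}

# Print the value of KEY from the env file (the last assignment wins).
# Fails if the file is unreadable or the key is absent or empty.
read_env_value() {
    local key=$1
    local file=${2:-${LOGSTER_ENV_FILE:-/etc/logster/logster.env}}
    local value
    [ -r "$file" ] || return 1
    value=$(sed -n "s/^${key}=//p" "$file" | tail -n 1)
    [ -n "$value" ] && printf '%s\n' "$value"
}

# Example: point both helpers at a non-standard layout.
# LOGSTER_DEPLOY_DIR=/srv/logster/deploy resolve_deploy_dir
# LOGSTER_ENV_FILE=/srv/logster/logster.env read_env_value LOCAL_LLM_ENDPOINT
```

The `/srv/logster` paths in the example comments are placeholders for whatever layout is in use.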