Enterprise Architecture
This page explains the end-to-end architecture of a Logster deployment — every component, how they communicate, and where data lives at each stage. It is written for platform engineers, security architects, and anyone evaluating Logster for a production rollout.
For the exact schemas, GNN layer details, and Kafka partition counts, see the contributor-facing LOGSTER_ARCHITECTURE.md.
High-level view
Logster's pipeline is a chain of Kafka-coupled microservices. No service talks directly to another — every stage communicates through Kafka topics, which makes every component independently scalable and failure-tolerant.
flowchart TB
subgraph Endpoints["MONITORED ENDPOINTS"]
WIN["Windows<br/>Winlogbeat / Sysmon"]
LNX["Linux<br/>auditd + eBPF"]
end
subgraph Kafka["KAFKA (KRaft mode)"]
direction TB
subgraph RawTopics["Raw topics"]
T1[sysmon-logs]
T2[linux-auditd-logs]
T3[linux-ebpf-process-logs]
T4[linux-ebpf-file-logs]
T5[linux-ebpf-network-logs]
end
subgraph ProcTopics["Processed topics"]
P1[normalized-endpoint-events]
P2[logster-inference-results]
P3[logster-alerts]
end
end
NORM[NORMALIZER]
INF[INFERENCE]
ALR[ALERTS]
REDIS[(Redis<br/>window state)]
API[/"REST API<br/>:8080"/]
subgraph Serving["LOGSTASH → ELASTICSEARCH → DASHBOARD"]
direction TB
LS[Logstash]
ES[("Elasticsearch<br/>logster-events<br/>logster-inferences")]
DASH["Dashboard<br/>React + Express :5001"]
KIBANA["Kibana :5601"]
GRAF["Grafana :3000"]
PROM["Prometheus :9090"]
end
WIN --> T1
LNX --> T2 & T3 & T4 & T5
RawTopics --> NORM
NORM --> P1
P1 --> INF
INF --> P2
P2 --> ALR
ALR --> P3
INF <--> REDIS
ALR --> API
P1 --> LS
P2 --> LS
LS --> ES
ES --> DASH
ES --> KIBANA
classDef topic fill:#2b3a55,stroke:#8fa8d8,color:#e6ecff;
classDef svc fill:#2f5148,stroke:#7bc9a8,color:#e6fff3;
classDef store fill:#55432b,stroke:#d8b98f,color:#fff3e6;
class T1,T2,T3,T4,T5,P1,P2,P3 topic;
class NORM,INF,ALR,LS,DASH,KIBANA,GRAF,PROM,API svc;
class REDIS,ES store;
The Docker Compose file that wires this together for local development lives at deploy/docker-compose.yml.
Pipeline services
Each service is a thin Python package (or, for the dashboard, a Node/React project). They share two libraries under libs/ and communicate exclusively via Kafka and Elasticsearch.
Normalizer
Location: services/normalizer/
- Consumes: the five raw topics (
sysmon-logs,linux-auditd-logs,linux-ebpf-{process,file,network}-logs). - Produces:
normalized-endpoint-events. - State: none — fully stateless and horizontally scalable.
The normalizer routes each raw event to a platform-specific parser (Sysmon,
auditd, or eBPF), transforms it into the unified NormalizedEvent schema,
and publishes it with a Kafka key of tenant_id:endpoint_id so all events
for a given host land on the same partition. Downstream services never have
to know whether an event originated from Sysmon, auditd, or eBPF — the
normalizer is the one place in the system that cares.
Inference
Location: services/inference/
- Consumes:
normalized-endpoint-events. - Produces:
logster-inference-results. - State: per-endpoint sliding windows, optionally persisted in Redis.
The inference service buffers normalized events into per-endpoint windows
keyed by (tenant_id, endpoint_id, platform). Every inference.interval
seconds (default 30s), for every endpoint with activity, it:
- Collects events from the last
inference.windowminutes (default 3 minutes). - On Linux, runs a cheap idle-daemon pre-filter — if every process in the
window is on an allow-list (
systemd,cron,sshd, ...), the window is labeledbenignwithout invoking the model. - Otherwise, builds a heterogeneous temporal graph (NetworkX → PyTorch Geometric) with platform-specific node and edge types.
- Runs the platform's pre-trained GNN forward pass (Windows or Linux model).
- Publishes an
InferenceResultwithprediction,attack_prob,confidence, and graph statistics.
The model architecture is a 3-layer Heterogeneous Graph Attention Network (GAT) with 128 hidden dimensions, 4 attention heads, and a 2-class softmax output (benign / attack).
Alerts
Location: services/alerts/
- Consumes:
logster-inference-results. - Produces:
logster-alerts, plus an in-memory or Postgres-backed alert store. - State: dedup windows, per-tenant correlation windows.
The alerts service turns raw attack predictions into actionable alerts. Four things happen on every inference result:
- Threshold filter. Results below
alerts.min_threshold(default0.7) are dropped immediately. - Deduplication. If the same
(tenant, endpoint, platform)has fired withinalerts.dedup_window_seconds(default 300s), the new result is merged into the existing open alert instead of opening a new one. - Lateral-movement correlation. If multiple endpoints in the same
tenant alert within
alerts.correlation_window_seconds(default 60s), they are cross-linked via therelated_endpointsfield. - Severity and enrichment. Severity is bucketed from
attack_prob(>=0.95CRITICAL,>=0.85HIGH,>=0.7MEDIUM). If a TTP analyzer is configured, an async call enriches the alert with MITRE ATT&CK technique IDs.
API
Location: services/api/
- Port:
8080 - Framework: FastAPI
The API exposes /alerts, /alerts/{id}, /feedback, /endpoints, and
/health. Analysts and integrations call it to list alerts, transition
alert states, and record true/false-positive verdicts. See the
API User Guide for full endpoint documentation.
[!NOTE] The default build wires the API to an in-memory alert store, which means alerts are lost on API restart. For production use, swap in the Postgres-backed store — see Admin Guide: Important Considerations.
Dashboard
Location: services/dashboard/
- Port:
5001(published from container port5000) - Stack: React SPA + Express backend
The dashboard reads directly from Elasticsearch (not via the REST API) and serves aggregated views for SOC analysts: host summaries, attack timelines, inference detail, process trees, endpoint insights, and MITRE TTP distributions. See the Dashboard User Guide for the analyst walkthrough.
Message topology — Kafka
All inter-service communication flows through Kafka. The topic layout:
| Topic | Partitions | Producer | Consumers |
|---|---|---|---|
sysmon-logs |
6 | Winlogbeat on Windows endpoints | normalizer |
linux-auditd-logs |
6 | auditd shipper | normalizer |
linux-ebpf-process-logs |
6 | eBPF collector | normalizer |
linux-ebpf-file-logs |
6 | eBPF collector | normalizer |
linux-ebpf-network-logs |
6 | eBPF collector | normalizer |
normalized-endpoint-events |
12 | normalizer | inference, logstash |
logster-inference-results |
6 | inference | alerts, logstash |
logster-alerts |
3 | alerts | (reserved for downstream sinks) |
Every event is keyed by tenant_id:endpoint_id, which guarantees
per-endpoint message ordering through the entire pipeline and enables
partition-aware scaling: add more inference replicas, and Kafka
automatically rebalances partitions across them without breaking ordering.
[!IMPORTANT] Do not increase partition counts on a running topic once real traffic is flowing. Kafka allows it, but existing endpoint keys will rehash to different partitions and break per-endpoint ordering during the transition. Plan partition counts up front.
Data stores
| Store | Purpose | Volatility |
|---|---|---|
| Kafka | Event transport and durable buffer for every topic. | Persistent (configurable retention). |
| Redis | Per-endpoint inference state (sliding windows). | Volatile — losing it means the first inference.window of events after recovery is cold. |
| Elasticsearch | Long-term query store for logster-events and logster-inferences. Powers the dashboard. |
Persistent (configure ILM for retention). |
| Alert store | Backing store for the REST API. In-memory by default; Postgres optional. | In-memory loses alerts on restart; Postgres is durable. |
Logstash is the bridge between Kafka and Elasticsearch. Two pipelines
defined under deploy/logstash/ ship
normalized-endpoint-events to the logster-events index and
logster-inference-results to the logster-inferences index, using the
event / inference id as the ES document id.
Infrastructure components
| Component | Image | Default port | Purpose |
|---|---|---|---|
| Kafka | confluentinc/cp-kafka:7.7.1 |
9092 internal, 29092 external | Event streaming (KRaft mode, no Zookeeper) |
| Elasticsearch | elasticsearch:8.13.0 |
9200 | Long-term event + inference storage |
| Redis | redis:7-alpine |
6379 | Per-endpoint sliding-window state |
| Logstash | logstash:8.13.0 |
— | Kafka → Elasticsearch ingestion |
| Kibana | kibana:8.13.0 |
5601 | Raw log exploration |
| Prometheus | prom/prometheus:v2.51.0 |
9090 | Metrics scraping |
| Grafana | grafana/grafana:10.4.0 |
3000 | Metrics visualization (admin / logster) |
| Tempo | grafana/tempo:2.4.1 |
3200 / 4317 | Trace backend |
| OpenTelemetry Collector | otel/opentelemetry-collector-contrib:0.96.0 |
4317 / 4318 | Trace ingestion for every service |
End-to-end example
Walk through a single PowerShell attack on a Windows host to see every component in action:
- Endpoint. Winlogbeat on
DESKTOP-1NNIMRRcaptures Sysmon Event ID 1 (Process Create) forpowershell.exe -enc <base64>and publishes the raw JSON to thesysmon-logsKafka topic. - Normalizer. Consumes the event, extracts
image,command_line,parent_image,user, andhashesinto aNormalizedEvent, and publishes tonormalized-endpoint-eventswith keydefault:desktop-1nnimrr. - Logstash. Tails
normalized-endpoint-eventsand indexes the document intologster-events(document id =event_id). - Inference. Appends the event to the
desktop-1nnimrrsliding window. At the next 30-second tick: - Collects the last 3 minutes of events for that endpoint.
- Builds a heterogeneous graph (process → file, process → network, ...).
- Runs the Windows GNN →
attack_prob=0.91, prediction=attack. - Publishes an
InferenceResulttologster-inference-results. - Logstash (again). Indexes the inference result into
logster-inferences. - Alerts. Consumes the attack prediction:
- Creates a new alert with severity
HIGH(since0.91 >= 0.85). - Checks whether any other endpoints in
defaultalerted in the last 60 seconds. - Calls the TTP analyzer (if configured) →
T1059.001(PowerShell). - Writes to its store and publishes to
logster-alerts. - Dashboard. Sees the new inference record in
logster-inferencesand surfaces it on the host card, attack timeline, inference detail, process tree, and TTP distribution views. - Analyst. Opens the alert, confirms it as a true positive via
POST /feedback, and the alert transitions toresolved.
Design rationale
Three design decisions shape almost everything downstream:
Kafka at the center. Every service is a Kafka consumer/producer, never a direct dependency on another service. Inference can crash and restart, and events pile up harmlessly in its input topic. Logstash and the dashboard are fed from the same topics the services consume, so the analyst view and the ES indices never disagree about what the pipeline saw.
Per-endpoint windows, not per-event inference. The GNN needs
behavioral context — a single powershell.exe invocation is not
interesting on its own. Running the model on 3-minute sliding windows
gives it enough events to build a graph with real structure. This also
makes inference.window and inference.interval the main tuning knobs
for the latency-vs-accuracy trade-off.
Dedup first, correlate second. Without dedup, a noisy endpoint would spam analysts with identical alerts every 30 seconds. Without correlation, cross-host lateral movement would look like unrelated incidents. The alerts service handles both before anything reaches the API or the dashboard.
Where to go next
- Admin Guide: Installation — stand up a local Logster stack with Docker Compose.
- Admin Guide: Installation Parameters — tune every pipeline parameter.
- Key Features — feature-by-feature summary of what Logster provides.