
Security Guide: Security Validations

This page documents how Logster validates data as it flows through the pipeline — at ingestion, during normalization, during inference, and at the API layer. It is written so that security engineers can reason about attack surface: "what happens if an attacker injects malformed JSON into Kafka?", "what happens if the model produces a nonsense prediction?", "what happens if the API receives a crafted request?"


Stage 1 — Raw Kafka ingestion

Trust boundary: the five raw topics (sysmon-logs, linux-auditd-logs, linux-ebpf-process-logs, linux-ebpf-file-logs, linux-ebpf-network-logs). Everything published to these topics is treated as untrusted input and must be validated by the normalizer before being re-published.

What is validated

The normalizer accepts any JSON payload on these topics, but parses it through platform-specific parser functions:

  • Windows events go through normalize_sysmon_event().
  • Linux auditd events go through normalize_auditd_event().
  • Linux eBPF events go through normalize_ebpf_event().

Each parser:

  1. Checks that the event type is known. Unknown event types are dropped with a parse_errors counter increment.
  2. Extracts only the fields it understands into the typed data field of the NormalizedEvent. Unrecognized fields are preserved under raw_event for forensic replay but are not routed into downstream logic.
  3. Normalizes identifiers. Hostnames are lowercased (DESKTOP-X → desktop-x), timestamps are converted to Unix epoch floats, and tenant_id defaults to "default" if not present.
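  4. Assigns a fresh UUID as event_id. Logstash uses it as the Elasticsearch document id, so a normalized event that is delivered more than once is implicitly deduplicated at the ES layer. Because the UUID is generated at normalization time, the same raw event normalized twice yields two distinct documents.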
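
The four parser steps can be sketched as follows. This is a minimal illustration, not Logster's parser code: the function name normalize_event_sketch, the KNOWN_EVENT_TYPES set, the extracted field names, the ISO 8601 timestamp format, and the choice to count an unparseable timestamp as a parse error are all assumptions made for the example.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical set of known event types; the real parsers define their own.
KNOWN_EVENT_TYPES = {"process", "file", "network"}
# Hypothetical fields the parser "understands" and copies into `data`.
UNDERSTOOD_FIELDS = ("pid", "image")


def normalize_event_sketch(raw: dict, platform: str, counters: dict) -> Optional[dict]:
    """Illustrative only: mirrors the four parser steps described above."""
    # Step 1: drop unknown event types and count the drop.
    event_type = raw.get("event_type")
    if event_type not in KNOWN_EVENT_TYPES:
        counters["parse_errors"] = counters.get("parse_errors", 0) + 1
        return None

    # Step 3 (timestamps): assumes an ISO 8601 string in the raw payload.
    # Treating a bad timestamp as a parse error is an assumption of this sketch.
    try:
        ts = datetime.fromisoformat(raw["timestamp"])
    except (KeyError, TypeError, ValueError):
        counters["parse_errors"] = counters.get("parse_errors", 0) + 1
        return None
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)

    return {
        # Step 4: fresh UUID per normalized event.
        "event_id": str(uuid.uuid4()),
        "platform": platform,
        "event_type": event_type,
        # Step 3: lowercase hostname, epoch float, default tenant.
        "hostname": str(raw.get("hostname", "")).lower(),
        "timestamp": ts.timestamp(),
        "tenant_id": raw.get("tenant_id") or "default",
        # Step 2: only understood fields reach `data`; the rest stays in raw_event.
        "data": {k: raw[k] for k in UNDERSTOOD_FIELDS if k in raw},
        "raw_event": raw,
    }
```

Note that nothing in this shape authenticates the sender: any publisher can set hostname and tenant_id to whatever it likes.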

What is not validated

  • Kafka write access. In PLAINTEXT mode (the default), anyone who can reach the broker can publish to any topic. This is the single most important thing to lock down before production. See Configuration: Kafka authentication.
  • Payload schema strictness. The parser is forgiving: fields that don't match expected types are often silently coerced or skipped rather than raising an error. This is an intentional trade-off that accommodates real-world Sysmon/auditd payloads (which routinely include unexpected fields), not a security feature.
  • Event authenticity. There is no per-host signature on normalized events. A malicious collector that can write to Kafka can impersonate any host.

[!WARNING] If you cannot fully control Kafka write access, Logster's detections become untrustworthy. An attacker who can write arbitrary events into the raw topics can craft an endpoint history that looks entirely normal, burying real attack evidence.


Stage 2 — Normalized events

Trust boundary: events on normalized-endpoint-events are trusted by the inference service to be well-formed.

What is validated

The inference service validates that normalized events:

  1. Have a non-empty endpoint_id, tenant_id, and platform.
  2. Carry a timestamp that falls within the current sliding window (inference.window minutes).
  3. Have a recognized event_type for the platform (process, file, network, script, syscall).

Events that fail these checks are dropped from the window, with a log message but no alert.
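
A minimal sketch of these three checks, assuming events are plain dicts with an epoch-float timestamp. The function name passes_window_checks is hypothetical, and the per-platform mapping of event types is flattened into a single set for brevity.

```python
import time
from typing import Optional

# Event types listed above; the real service scopes them per platform,
# which this sketch does not model.
RECOGNIZED_TYPES = {"process", "file", "network", "script", "syscall"}


def passes_window_checks(event: dict, window_minutes: float,
                         now: Optional[float] = None) -> bool:
    """Return True if the event may enter the current sliding window."""
    now = time.time() if now is None else now

    # Check 1: identifiers must be present and non-empty.
    for key in ("endpoint_id", "tenant_id", "platform"):
        if not event.get(key):
            return False

    # Check 2: timestamp must fall inside [now - window, now].
    ts = event.get("timestamp")
    if not isinstance(ts, (int, float)):
        return False
    if not (now - window_minutes * 60 <= ts <= now):
        return False

    # Check 3: event type must be recognized.
    return event.get("event_type") in RECOGNIZED_TYPES
```

A caller would log and skip any event for which this returns False, without raising an alert.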

Graph construction safety

When building the heterogeneous graph for a window, the graph builder:

  1. Caps the number of nodes and edges per window. Abnormally large windows (tens of thousands of events) are truncated to protect the GNN from OOM.
  2. Deduplicates nodes by deterministic key (e.g. process_guid for Windows processes, pid for Linux). This prevents a malicious event from inflating the graph with thousands of distinct "processes" that are actually the same one.
  3. Timestamps every edge, so the model can reason about temporal ordering rather than just topology.

If graph construction fails — missing required fields, malformed relationships, window too small to form any edges — the inference service emits an InferenceResult with prediction="error" and moves on. An error prediction never becomes an alert.
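
The node cap, deterministic deduplication, and error fallback might look roughly like this. It is a sketch under stated assumptions: the function names, the "windows" platform string, the max_nodes default, and the dict-based result are illustrative, and edges are omitted entirely.

```python
def node_key(ev: dict) -> tuple:
    """Deterministic process key: process_guid on Windows, pid on Linux."""
    if ev["platform"] == "windows":  # platform string is an assumption
        return ("process", ev["endpoint_id"], ev["process_guid"])
    return ("process", ev["endpoint_id"], ev["pid"])


def build_nodes_sketch(events, max_nodes=10_000):
    """Deduplicate process nodes and stop once the cap is reached."""
    nodes = {}
    truncated = False
    for ev in events:
        key = node_key(ev)
        if key in nodes:
            continue  # same process seen again: no new node
        if len(nodes) >= max_nodes:
            truncated = True  # cap reached: drop the rest of the window
            break
        nodes[key] = ev
    return nodes, truncated


def graph_or_error(events, max_nodes=10_000):
    """Missing fields or an empty graph yield an error result, never an alert."""
    try:
        nodes, truncated = build_nodes_sketch(events, max_nodes)
    except KeyError:
        return {"prediction": "error"}
    if not nodes:
        return {"prediction": "error"}
    return {"nodes": nodes, "truncated": truncated}
```

With this shape, a flood of events that all carry the same pid collapses into one node, while a flood of distinct keys is bounded by max_nodes.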


Stage 3 — GNN inference

Trust boundary: the pre-trained .pt model file is assumed trustworthy. If an attacker has swapped the model file, Logster's detection pipeline is compromised regardless of any other control.

What is validated

  1. Model file exists and loads cleanly. A load failure crashes the service at startup. This is deliberate: a hard crash is preferable to silently running on a broken model.
  2. Input tensor shape matches the model's expected input. Shape mismatches are caught and logged as inference errors rather than crashes.
  3. Output is a valid 2-class softmax. attack_prob is clamped to [0.0, 1.0]. NaN / Inf outputs are treated as errors.
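
The output check in item 3 can be illustrated without any ML dependency. This sketch assumes the model output has already been converted to a plain list of two floats, that index 1 is the attack class, and that a sum tolerance of 1e-3 is acceptable; none of these details come from the Logster source.

```python
import math
from typing import Optional, Sequence


def validate_output_sketch(probs: Sequence[float]) -> Optional[float]:
    """Return a clamped attack_prob, or None if the output is an error."""
    # Must be exactly two classes, and neither may be NaN or Inf.
    if len(probs) != 2 or not all(math.isfinite(p) for p in probs):
        return None
    # A softmax output sums to 1 (tolerance is an assumption of this sketch).
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-3):
        return None
    # Clamp to [0.0, 1.0]; index 1 as the attack class is an assumption.
    return min(1.0, max(0.0, probs[1]))
```

A None return corresponds to an inference error, which is logged and never turned into an alert.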

Threshold enforcement

The threshold between benign and attack is set by inference.threshold (default 0.7). Below that value, the inference is labeled benign regardless of how close it came. The alerts service re-checks this via alerts.min_threshold — there are two independent thresholds, and raising either one is a valid way to suppress noise.
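
The interaction of the two thresholds can be written down in a few lines. The function names are hypothetical, and min_threshold is a required argument because the default value of alerts.min_threshold is not stated here.

```python
def label_inference(attack_prob: float, threshold: float = 0.7) -> str:
    """Inference side: anything below inference.threshold is benign."""
    return "attack" if attack_prob >= threshold else "benign"


def should_alert(prediction: str, attack_prob: float, min_threshold: float) -> bool:
    """Alerts side: re-check the probability against alerts.min_threshold."""
    return prediction == "attack" and attack_prob >= min_threshold


# Example: an inference at 0.75 is labeled "attack" with the default
# inference threshold, but is suppressed if the alerts service requires 0.8.
prediction = label_inference(0.75)
suppressed = not should_alert(prediction, 0.75, min_threshold=0.8)
```

Because both checks must pass, the effective alerting threshold is the higher of the two values.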

Model integrity

To verify the model file hasn't been tampered with, compute and record a checksum whenever you deploy a new model:

sha256sum models/models/balanced_run_20260114_143653/best_model.pt

Then compare it on every restart. See Model Deployment.
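
If you would rather do the comparison from a startup script than by hand, the following sketch shows one way using only the Python standard library. The function names are hypothetical, and where the recorded checksum is stored is left to your deployment process.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path: str, expected_hex: str) -> bool:
    """Compare the current checksum with the one recorded at deploy time."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A startup wrapper could refuse to launch the inference service when verify_model returns False. The recorded checksum should live somewhere an attacker who can replace the model file cannot also modify.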


Stage 4 — Alerts

Trust boundary: the alerts service trusts its own in-memory state and the inference results it consumes from Kafka.

What is validated

  1. Minimum threshold. Inferences below alerts.min_threshold never become alerts, even if prediction == "attack".
  2. Dedup key well-formedness. If tenant_id, endpoint_id, or platform is missing, the result is dropped rather than producing a headless alert.
  3. Severity derivation. The service maps attack_prob to the severity enum deterministically. There is no way for an inference to "claim" a severity outside the derivation.
  4. State machine bounds. Analyst state transitions (open → ...) are checked against the AlertStatus enum; any other value is rejected. See API User Guide: Reference.
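
Items 2 and 3 are easiest to see in code. In this sketch the Severity members and the probability bands are invented for illustration; the actual enum values and cut-offs are defined by the alerts service and are not documented on this page.

```python
from enum import Enum
from typing import Optional


class Severity(str, Enum):
    # Hypothetical members; the real enum may differ.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Hypothetical lower bounds, checked from highest to lowest.
_BANDS = [(0.95, Severity.CRITICAL), (0.85, Severity.HIGH), (0.75, Severity.MEDIUM)]


def derive_severity(attack_prob: float) -> Severity:
    """Deterministic mapping: the inference result cannot choose its own severity."""
    for lower_bound, severity in _BANDS:
        if attack_prob >= lower_bound:
            return severity
    return Severity.LOW


def dedup_key(result: dict) -> Optional[tuple]:
    """Return the dedup key, or None if any component is missing or empty."""
    parts = tuple(result.get(k) for k in ("tenant_id", "endpoint_id", "platform"))
    return parts if all(parts) else None
```

A result for which dedup_key returns None would be dropped rather than turned into an alert with no owner.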

What is not validated

  • Identity of the analyst performing a state transition. The API accepts any string as resolved_by and analyst. Without a reverse proxy that injects a trusted identity header, an attacker with API access can attribute any verdict to anyone.

Stage 5 — REST API

Trust boundary: in the default build, everything. See the Threat model.

What is validated

  1. Request schema. FastAPI + Pydantic automatically validate every request body against its schema. Missing required fields or wrong types return 422.
  2. Path parameters. Alert IDs are passed through to the store unchanged but are not directly executed against any query language (the store is a dict in the default build, not SQL).
  3. Status enum. PATCH /alerts/{id} validates the status value against the AlertStatus enum. Invalid values return 400.
  4. Pagination bounds. limit is enforced between 1 and 1000; offset must be >= 0.
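
The pagination rule in item 4 amounts to the following check, shown here as a dependency-free sketch. The function name and error messages are hypothetical.

```python
from typing import List

MAX_LIMIT = 1000  # upper bound stated above


def pagination_errors(limit: int, offset: int) -> List[str]:
    """Return a list of violations; an empty list means the request is acceptable."""
    errors = []
    if not 1 <= limit <= MAX_LIMIT:
        errors.append(f"limit must be between 1 and {MAX_LIMIT}")
    if offset < 0:
        errors.append("offset must be >= 0")
    return errors
```

In a FastAPI application this kind of bound is typically declared on the parameter itself, for example with Query(ge=1, le=1000), so that violations are rejected before the handler runs. Whether Logster declares it exactly that way is not stated here.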

What is not validated

  • Authentication. There is none — see Configuration.
  • Authorization. Everyone who reaches the API sees every tenant and every alert.
  • Rate limiting. The API does not rate-limit callers. A broken integration can easily DoS it.

CORS

The CORS middleware is configured from api.cors_origins in deploy/service-config.yaml. The default ["*"] accepts any origin. In production, restrict this to your dashboard origin(s) only. See Configuration: CORS.


Stage 6 — Dashboard

Trust boundary: The dashboard reads directly from Elasticsearch. Its backend has no write access to the alert store.

What is validated

The dashboard's Express backend validates query parameters on every route (time ranges, hostnames, platform filters). Invalid inputs return 400 or 500 and are logged.

What is not validated

  • Authentication. DISABLE_AUTH=true by default.
  • Authorization. Same problem as the API — every authenticated user sees every tenant's data.
  • ES injection via query parameters. The Express backend uses parameterized Elasticsearch queries via the official ES client, so direct injection is not a concern. However, a malicious user could submit extremely expensive queries (very wide time ranges, huge aggregation sizes) and overload the cluster. Mitigate by enforcing an ES query cost budget at the reverse proxy layer.
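
A query cost budget can start as a simple pre-flight check on the request parameters. The sketch below is one possible shape, not a Logster feature: the function name, the seven-day range limit, and the aggregation size limit are all assumptions you would tune for your cluster.

```python
# Hypothetical budget values; tune them for your cluster.
MAX_RANGE_SECONDS = 7 * 24 * 3600
MAX_AGG_SIZE = 1000


def within_query_budget(start_ts: float, end_ts: float, agg_size: int) -> bool:
    """Reject queries whose time range or aggregation size exceeds the budget."""
    if end_ts <= start_ts:
        return False  # empty or inverted range
    if end_ts - start_ts > MAX_RANGE_SECONDS:
        return False  # range too wide
    return 1 <= agg_size <= MAX_AGG_SIZE
```

A proxy or middleware would return an error for requests that fail this check instead of forwarding them to Elasticsearch.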

Audit logging

Logster does not currently produce an explicit audit log of analyst actions. The closest equivalents are:

  • Alert update history. Every PATCH and /feedback call updates updated_at and resolved_by, and appends to analyst_notes. The alert store keeps the latest state but not the full history.
  • Service-level tracing. Every API call is captured by OpenTelemetry and shipped to Tempo (via the in-stack OpenTelemetry Collector). Traces contain timestamps, route paths, and status codes — but not request bodies, so you cannot reconstruct "who changed alert X to false_positive" from traces alone.
  • Docker container logs. FastAPI logs every request at INFO level. Ship these to your log aggregator if you need a durable audit trail.

For a real audit log of analyst actions, the recommended pattern is to record every /feedback and /alerts/{id} PATCH at the reverse proxy layer with the authenticated identity from your identity provider.
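
As a rough illustration of what such a proxy-side record could contain, the sketch below builds one JSON line per audited request. Everything here is an assumption: the function names, the record fields, the route matching (PATCH on /alerts/{id}, plus any path ending in /feedback), and the idea that the proxy passes in an identity it has already authenticated.

```python
import json
import re
import time
from typing import Optional

_ALERT_PATH = re.compile(r"/alerts/[^/]+")


def is_audited(method: str, path: str) -> bool:
    """Match the two analyst actions named above (route shapes are assumptions)."""
    if method == "PATCH" and _ALERT_PATH.fullmatch(path):
        return True
    return path.rstrip("/").endswith("/feedback")


def audit_record(method: str, path: str, identity: str, status_code: int,
                 now: Optional[float] = None) -> Optional[str]:
    """Return one JSON line for an audited request, or None for other routes."""
    if not is_audited(method, path):
        return None
    record = {
        "ts": time.time() if now is None else now,
        "actor": identity,  # identity asserted by the identity provider, not the client
        "method": method,
        "path": path,
        "status": status_code,
    }
    return json.dumps(record, sort_keys=True)
```

Appending these lines to write-once storage gives you the "who changed alert X" trail that traces and container logs cannot provide on their own.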


Where to go next