Admin Guide: Log Sources

This page documents exactly where Logster expects real-time log data to come from on each supported platform, which Kafka topic each source feeds, and the exact format the normalizer will accept on that topic.

Every format and sample on this page is verified against the normalizer parsers in services/normalizer/src/logster_normalizer/parsers.py and against the canonical sample messages kept at stress-tests/sample-kafka-messages.md. If you extend Logster to accept a new log source, update both files and then update this page.


Architecture — how logs get in

Logster does not collect logs from endpoints directly. Every raw event reaches Logster the same way: an endpoint-side collector produces events, an endpoint-side shipper publishes them onto a Kafka topic, and the Logster normalizer consumes from that topic.

┌───────────────┐   ┌─────────────┐   ┌────────┐   ┌────────────┐
│  Collector    │──▶│   Shipper   │──▶│ Kafka  │──▶│ Normalizer │
│  (endpoint)   │   │ (endpoint)  │   │ topic  │   │ (Logster)  │
└───────────────┘   └─────────────┘   └────────┘   └────────────┘
  • Collector = the thing on the endpoint that observes system activity: Sysmon, auditd, or an eBPF tracer.
  • Shipper = the thing on the endpoint that forwards collected events to Kafka: Winlogbeat (Windows) or a custom agent (Linux, per stress-tests/sample-kafka-messages.md).
  • Kafka topic = the trust boundary. Everything written to a raw topic is treated as untrusted input until the normalizer validates it.
  • Normalizer = the Logster service that parses each raw format into a unified NormalizedEvent. Source: services/normalizer/src/logster_normalizer/parsers.py.

[!IMPORTANT] Logster does not bundle the endpoint collector or shipper. You must deploy them separately. The Compose stack only brings up Kafka, the normalizer, and everything downstream of the normalizer.


Supported platforms

The Platform enum in libs/logster-common/logster_common/schemas/events.py defines the platforms the current codebase handles:

| Value | Purpose |
|---|---|
| windows | Windows endpoints: Sysmon via Winlogbeat |
| linux | Linux endpoints: auditd and eBPF |

Note. Logster Support's product materials list macOS as a supported endpoint platform. The current Platform enum in the codebase does not include it. Confirm macOS collection support with Logster Support before planning a macOS rollout, and treat this section as reflecting only what is implemented in the present build.


Windows — Sysmon via Winlogbeat

What collects the events

Sysmon (System Monitor) is a Windows system service and device driver from Microsoft's Sysinternals suite. It writes detailed process, file, registry, and network events to the Windows event log channel Microsoft-Windows-Sysmon/Operational.

Logster does not ship a Sysmon configuration. Use a vetted public Sysmon config for your fleet.

What ships the events

Winlogbeat (from Elastic) reads the Microsoft-Windows-Sysmon/Operational channel and publishes JSON-encoded events to Kafka via its Kafka output.

The sample messages in stress-tests/sample-kafka-messages.md confirm this path: every Windows sample in the repo comes from a Winlogbeat agent ("type": "winlogbeat").

Kafka topic

sysmon-logs

Partitions: 6 (verified in deploy/docker-compose.yml under the kafka-init container).

Sysmon event IDs the normalizer handles

Source: parsers.py parse_sysmon_event.

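| Event ID | Meaning | Normalized event_type |
|---|---|---|
| 1 | Process Create | process |
| 3 | Network Connection | network |
| 11 | File Create | file |
| 4104 | PowerShell Script Block Logging | script |
| (any other) | Fallback | process, with sysmon_event_id and raw event_data preserved under data |

Note. Event ID 4104 is not a Sysmon event ID. It is PowerShell Script Block Logging, which Windows writes to the Microsoft-Windows-PowerShell/Operational channel. For the parser to see these events, Winlogbeat presumably has to read that channel in addition to the Sysmon channel; confirm this against your Winlogbeat configuration.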

Unknown event IDs are not dropped. They are passed through as a generic process event with the raw event_data retained. Custom Sysmon configurations that emit less-common IDs (e.g. 7, image loaded, or 22, DNS query) will therefore still reach Elasticsearch; they just won't enrich the inference graph with typed fields.
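
The dispatch and fallback described above can be sketched as follows. This is a minimal illustration, not the code in parse_sysmon_event: the function name classify_sysmon_event, the plain-dict return value, and the event_data key under data are assumptions made for the example.

```python
# Hypothetical sketch of the event-ID dispatch; not the real parser.
EVENT_TYPE_BY_ID = {1: "process", 3: "network", 11: "file", 4104: "script"}


def classify_sysmon_event(raw: dict) -> dict:
    """Map a raw Winlogbeat event to an event_type, keeping unknown IDs."""
    winlog = raw.get("winlog", {})
    event_data = winlog.get("event_data", {})
    try:
        # Winlogbeat ships event_id as a string in the sample payload.
        event_id = int(winlog.get("event_id", 0))
    except (TypeError, ValueError):
        event_id = 0

    event_type = EVENT_TYPE_BY_ID.get(event_id)
    if event_type is None:
        # Fallback: generic process event with the raw payload retained.
        # The key names under "data" are assumptions for this sketch.
        return {
            "event_type": "process",
            "data": {"sysmon_event_id": event_id, "event_data": event_data},
        }
    # Typed field extraction per event ID is omitted here.
    return {"event_type": event_type, "data": {}}
```

An event with ID 22 (DNS query) would come back as a process event whose data still carries the original event_data, which is the pass-through behaviour the paragraph above describes.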

Field extraction per event ID

Source: parsers.py. Field names on the left are the keys the parser looks up under winlog.event_data.* in the raw Winlogbeat JSON.

Event ID 1 — Process Create

| Raw field (event_data.*) | Normalized data.* |
|---|---|
| ProcessGuid | process_guid |
| ProcessId | process_id (int) |
| Image | image |
| CommandLine | command_line |
| User | user |
| ParentProcessGuid | parent_guid |
| ParentImage | parent_image |
| ParentCommandLine | parent_command_line |
| IntegrityLevel | integrity_level |
| CurrentDirectory | current_directory |
| Hashes | hashes |

Event ID 3 — Network Connection

| Raw field | Normalized data.* |
|---|---|
| SourceIp | source_ip |
| SourcePort | source_port |
| DestinationIp | dest_ip |
| DestinationPort | dest_port |
| Protocol | protocol |
| ProcessGuid | process_guid |
| Image | image |

Event ID 11 — File Create

| Raw field | Normalized data.* |
|---|---|
| TargetFilename | target_filename |
| ProcessGuid | creator_guid |
| Image | creator_image |

Event ID 4104 — PowerShell Script Block

| Raw field | Normalized data.* |
|---|---|
| ScriptBlockText | script_block_text |
| ScriptBlockId | script_block_id |
| Path | path |

Hostname resolution

parse_sysmon_event resolves the endpoint hostname in this order:

  1. agent.hostname
  2. host.name
  3. computer_name
  4. Literal "unknown" if none are present.
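
A minimal sketch of that lookup order, for readers who want to reproduce it in a test harness. The helper names are hypothetical. The list above gives computer_name without a parent path; this sketch assumes it sits under winlog, which is where the sample payload carries it.

```python
def _dig(obj, *keys):
    """Walk nested dicts; return None as soon as a level is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def resolve_sysmon_hostname(raw: dict) -> str:
    """Return the first non-empty candidate, in the documented order."""
    candidates = (
        _dig(raw, "agent", "hostname"),
        _dig(raw, "host", "name"),
        # Assumption: computer_name is read from the winlog object.
        _dig(raw, "winlog", "computer_name"),
    )
    for value in candidates:
        if value:
            return value
    return "unknown"
```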

Timestamp resolution

The parser reads winlog.event_data.UtcTime first, then falls back to the top-level @timestamp field, then to wall-clock time if neither is parseable. Both ISO-8601 with T and YYYY-MM-DD HH:MM:SS forms are accepted.
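
The fallback chain can be sketched like this. It is an illustration under stated assumptions rather than the parser's code: the function names are hypothetical, and treating timestamps without an offset as UTC is an assumption (UtcTime is UTC by definition, but the parser's handling is not documented here).

```python
from datetime import datetime, timezone


def _parse_ts(text):
    """Parse ISO-8601 with 'T' or 'YYYY-MM-DD HH:MM:SS'; None on failure."""
    if not isinstance(text, str) or not text:
        return None
    try:
        # Normalise a trailing "Z" so older Python versions accept it.
        parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: values without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def resolve_sysmon_timestamp(raw: dict, now=None) -> datetime:
    """UtcTime first, then @timestamp, then wall-clock time."""
    event_data = raw.get("winlog", {}).get("event_data", {})
    for candidate in (event_data.get("UtcTime"), raw.get("@timestamp")):
        parsed = _parse_ts(candidate)
        if parsed is not None:
            return parsed
    return now or datetime.now(timezone.utc)
```

For the sample payload in the next section, this returns the UtcTime value (19:07:13.512 UTC), not the slightly later @timestamp that Winlogbeat attached.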

Sample payload

A real Winlogbeat Sysmon event as captured in stress-tests/sample-kafka-messages.md:

{
  "@timestamp": "2026-03-18T19:07:15.560Z",
  "winlog": {
    "computer_name": "DESKTOP-1NNIMRR",
    "provider_name": "Microsoft-Windows-Sysmon",
    "event_id": "3",
    "channel": "Microsoft-Windows-Sysmon/Operational",
    "event_data": {
      "SourceIp": "10.0.0.104",
      "SourcePort": "51775",
      "DestinationIp": "104.208.16.88",
      "DestinationPort": "443",
      "Protocol": "tcp",
      "Image": "C:\\Users\\abdullah\\AppData\\Local\\Microsoft\\OneDrive\\...\\OneDrive.Sync.Service.exe",
      "ProcessGuid": "{96d8290c-d08c-69b9-2a07-000000000700}",
      "ProcessId": "5940",
      "User": "DESKTOP-1NNIMRR\\abdullah",
      "UtcTime": "2026-03-18 19:07:13.512"
    }
  },
  "agent": {
    "name": "DESKTOP-1NNIMRR",
    "type": "winlogbeat",
    "version": "8.17.0"
  },
  "host": {
    "name": "desktop-1nnimrr",
    "os": { "family": "windows", "name": "Windows 10 Pro" }
  },
  "event": {
    "code": "3",
    "provider": "Microsoft-Windows-Sysmon"
  }
}

The full sample (including every field Winlogbeat ships) is at stress-tests/sample-kafka-messages.md.


Linux — auditd

What collects the events

The Linux Audit daemon (auditd) records kernel and userspace audit events according to rules loaded from /etc/audit/audit.rules. Records are plain-text and follow the familiar type=SYSCALL msg=audit(<epoch>:<serial>): <fields> form.

Logster does not ship an auditd rule set. Your Linux security baseline already has one — reuse it.

What ships the events

stress-tests/sample-kafka-messages.md labels the Linux auditd source as "custom agent (auditd)". There is no specific commercial shipper bundled with Logster for this path; the endpoint agent is customer-provided or Logster Support-provided.

TBD — canonical Linux shipper. Confirm with Logster Support which agent ships auditd events to Kafka in production deployments (rsyslog with omkafka, syslog-ng, a dedicated Go/Rust agent, or something else), and populate this section with the verified choice.

Kafka topic

linux-auditd-logs

Partitions: 6.

Expected format

The normalizer accepts a simple JSON envelope with a single message field carrying the raw auditd text record. Source: parsers.py parse_auditd_event.

Fields read from the Kafka message (only message is required):

| Field | Type | Notes |
|---|---|---|
| message | string | Required. The raw auditd record, verbatim. |
| host.name, host (string), or hostname | string | The endpoint identifier. |
| timestamp | float or string | Optional. Falls back to wall-clock time if missing or unparseable. |
| _tenant_id | string | Optional. Defaults to "default". |

The normalizer does not parse the auditd text further in this stage — the entire record is stored as-is on NormalizedEvent.data.message. Downstream graph-building code re-parses the text for specific fields.

Every auditd event is emitted with event_type = "syscall" regardless of the underlying auditd record type. The distinction between type=SYSCALL, type=PATH, type=EXECVE, etc. is preserved inside data.message but not lifted to a separate event type.
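
As a rough sketch of what this stage does to an envelope, assuming a plain dict in place of NormalizedEvent: the function names are hypothetical, the "unknown" host fallback is borrowed from the Sysmon parser and is an assumption for Linux, and timestamp handling is left out.

```python
def resolve_linux_host(raw: dict) -> str:
    """host.name, then host as a bare string, then hostname."""
    host = raw.get("host")
    if isinstance(host, dict) and host.get("name"):
        return host["name"]
    if isinstance(host, str) and host:
        return host
    # Assumption: fall back to "unknown" as the Sysmon parser does.
    return raw.get("hostname") or "unknown"


def normalize_auditd(raw: dict) -> dict:
    """Wrap the raw record without parsing the auditd text itself."""
    return {
        "event_type": "syscall",  # same for every auditd record type
        "endpoint_id": resolve_linux_host(raw).lower(),
        "tenant_id": raw.get("_tenant_id", "default"),
        "data": {"message": raw.get("message", "")},
    }
```

Note that a type=PATH record and a type=EXECVE record produce the same event_type here; only data.message tells them apart.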

Sample payload

Verbatim from stress-tests/sample-kafka-messages.md:

{
  "@timestamp": "2026-03-20T01:58:10.344285+05:00",
  "host": {
    "name": "agent-linux-01"
  },
  "log_source": "audit",
  "message": "type=PATH msg=audit(1773953890.204:50364): item=0 name=\"/run/systemd/ask-password-block/\" inode=5375 dev=00:1b mode=040700 ouid=0 ogid=0 rdev=00:00 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0\u001dOUID=\"root\" OGID=\"root\""
}

Auditd record types you'll commonly see in the message field:

  • type=SYSCALL: the syscall invocation itself (pid, ppid, uid, euid, exe, comm, key)
  • type=PATH: a path referenced by a syscall (filename, inode, mode, ouid/ogid)
  • type=EXECVE: arguments passed to execve
  • type=CWD: current working directory at the time of the syscall
  • type=PROCTITLE: the process command line as read from /proc/<pid>/cmdline (typically hex-encoded)

These are standard auditd record types. The list is not exhaustive — anything auditd produces is accepted.
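
Because the normalizer leaves the record unparsed, anything downstream that cares about the record type has to pull it out of data.message. The sketch below shows one way to read the standard record header. It is a hypothetical helper, not Logster's graph-building code, and the regex is written for this example.

```python
import re

# Matches the standard header: type=<TYPE> msg=audit(<epoch>:<serial>):
AUDIT_HEADER = re.compile(r"type=(\w+)\s+msg=audit\((\d+(?:\.\d+)?):(\d+)\):")


def parse_audit_header(message: str):
    """Return record type, epoch seconds and serial, or None if absent."""
    match = AUDIT_HEADER.search(message)
    if match is None:
        return None
    return {
        "record_type": match.group(1),
        "epoch": float(match.group(2)),
        "serial": int(match.group(3)),
    }
```

Records that belong to the same kernel event share the same epoch and serial, so the pair can serve as a grouping key when correlating SYSCALL, PATH, and EXECVE records.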


Linux — eBPF

What collects the events

An eBPF tracer running on the endpoint observes three families of events directly in the Linux kernel: process execution, file access, and network activity. Each family goes to its own Kafka topic.

Logster does not bundle the eBPF tracer. The message format expected on the Kafka side (see below) is a line-oriented text protocol, so any eBPF tooling that can be wired to emit those lines will work.

What ships the events

Per stress-tests/sample-kafka-messages.md, the eBPF events are labeled as coming from a "custom agent (eBPF process tracer / file tracer / network tracer)". No specific commercial tool is bundled.

TBD — canonical eBPF tracer. Confirm with Logster Support which eBPF tracer produces these events in production deployments (Falco, Tetragon, a custom bcc/libbpf tool, etc.) and update this section.

Kafka topics

Three separate topics, one per event family. Each has 6 partitions.

| Topic | Family | event_type | Parser |
|---|---|---|---|
| linux-ebpf-process-logs | Process execution | process | _parse_ebpf_process_data |
| linux-ebpf-file-logs | File access | file | _parse_ebpf_file_data |
| linux-ebpf-network-logs | Network connections | network | _parse_ebpf_network_data |

All three parsers live in services/normalizer/src/logster_normalizer/parsers.py.

Kafka message envelope

All three eBPF topics use the same JSON envelope as auditd — the interesting content is in the message field, and that field carries a line-oriented text record with a family-specific prefix.

| Field | Type | Notes |
|---|---|---|
| message | string | The text record in one of the [EXEC] / [FILE] / [CONNECT] / [ACCEPT] / [SOCKET] forms below. |
| host.name, host (string), or hostname | string | Endpoint identifier. |
| timestamp | float or string | Optional. |
| _tenant_id | string | Optional. |

Message formats the normalizer recognises

The normalizer parses each family with a regex. These regexes are the contract — any eBPF tooling you use must produce lines that match them.

Source: parsers.py lines 163–193.

linux-ebpf-process-logs: [EXEC]

[EXEC] pid=<PID> ppid=<PPID> uid=<UID> <COMM> [<ARGS...>]

Regex:

\[EXEC\]\s+pid=(\d+)\s+ppid=(\d+)\s+uid=(\d+)\s+(\S+)(?:\s+(.+))?$

Fields extracted into the normalized data.*:

| Regex group | Normalized field | Notes |
|---|---|---|
| pid | process_id (int) | |
| ppid | parent_image | Stored as a placeholder /usr/bin/ppid:<ppid> because ppid→image resolution is not performed at normalization time. |
| uid | user | Stored as uid:<uid>. Also used to set integrity_level (root if uid=0, else user). |
| comm | image (fallback) | If ARGS is present and begins with /, that path becomes image. Otherwise image = /usr/bin/<comm>. |
| args | command_line | Falls back to comm if ARGS is empty. |

Sample (verified from stress-tests/sample-kafka-messages.md):

{
  "@timestamp": "2026-03-20T02:24:50.451492+05:00",
  "host": { "name": "agent-linux-01" },
  "log_source": "process",
  "message": "2026-03-20 02:24:50 [EXEC]    pid=35747   ppid=33785   uid=1000  bash             /usr/bin/ls"
}
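
To check whether a tracer's output satisfies the [EXEC] contract before wiring it to Kafka, the regex and field rules above can be exercised with a sketch like the following. The regex is the one quoted above. The function name is hypothetical, and taking the first whitespace-separated token of ARGS as the image path is an assumption about how "that path" is derived.

```python
import re

EXEC_RE = re.compile(
    r"\[EXEC\]\s+pid=(\d+)\s+ppid=(\d+)\s+uid=(\d+)\s+(\S+)(?:\s+(.+))?$"
)


def parse_exec_line(message: str):
    """Apply the documented [EXEC] rules; None if the line does not match."""
    # search() rather than match(): the samples carry a timestamp prefix.
    match = EXEC_RE.search(message)
    if match is None:
        return None
    pid, ppid, uid, comm, args = match.groups()
    if args and args.startswith("/"):
        image = args.split()[0]  # assumption: first token is the path
    else:
        image = f"/usr/bin/{comm}"
    return {
        "process_id": int(pid),
        "parent_image": f"/usr/bin/ppid:{ppid}",  # placeholder, not resolved
        "user": f"uid:{uid}",
        "integrity_level": "root" if int(uid) == 0 else "user",
        "image": image,
        "command_line": args or comm,
    }
```

Run against the sample message above, this yields process_id 35747, image /usr/bin/ls, and integrity_level user.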

linux-ebpf-file-logs: [FILE]

[FILE] pid=<PID> <COMM> flags=<FLAGS> <PATH>

Regex:

\[FILE!?\]\s+pid=(\d+)\s+(\S+)\s+flags=(\S+)\s+(.+)$

The ! after FILE is tolerated by the regex — an agent may emit [FILE!] to flag a suspicious access.

Fields extracted:

| Regex group | Normalized field |
|---|---|
| pid | process_id (int) |
| comm | creator_image, stored as /usr/bin/<comm> |
| flags | flags, the open()-style flags verbatim, e.g. O_RDONLY |
| path | target_filename |

Sample:

{
  "@timestamp": "2026-03-20T02:24:27.254181+05:00",
  "host": { "name": "agent-linux-01" },
  "log_source": "file",
  "message": "2026-03-20 02:24:27 [FILE]    pid=23256   snapd            flags=O_RDONLY   /var/lib/snapd/assertions/asserts-v0/model/16/generic/generic-classic/active"
}
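
The same kind of check works for [FILE] lines. This sketch reuses the regex quoted above; the function name and the dict return type are assumptions for illustration.

```python
import re

FILE_RE = re.compile(r"\[FILE!?\]\s+pid=(\d+)\s+(\S+)\s+flags=(\S+)\s+(.+)$")


def parse_file_line(message: str):
    """Apply the documented [FILE] rules; None if the line does not match."""
    match = FILE_RE.search(message)
    if match is None:
        return None
    pid, comm, flags, path = match.groups()
    return {
        "process_id": int(pid),
        "creator_image": f"/usr/bin/{comm}",
        "flags": flags,            # kept verbatim, e.g. O_RDONLY
        "target_filename": path,
    }
```

Both [FILE] and [FILE!] lines parse to the same fields; nothing in the regex captures the ! marker itself.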

linux-ebpf-network-logs: [CONNECT] / [ACCEPT] / [SOCKET]

The network parser recognises two shapes.

Connection form (outbound or accepted inbound):

[CONNECT] pid=<PID> <COMM> -> <DADDR>:<DPORT>
[ACCEPT]  pid=<PID> <COMM> -> <DADDR>:<DPORT>

Regex:

\[(CONNECT|ACCEPT)\]\s+pid=(\d+)\s+(\S+)(?:\s+->\s+(\S+):(\d+))?

The destination group is optional in this regex, so a [CONNECT] or [ACCEPT] line without the -> <DADDR>:<DPORT> suffix still matches, with no destination captured.

Fields:

| Regex group | Normalized field |
|---|---|
| CONNECT/ACCEPT | connection_type |
| pid | process_id |
| comm | image, stored as /usr/bin/<comm> |
| daddr | dest_ip |
| dport | dest_port |

Socket-open form (bare socket creation, no destination yet):

[SOCKET] pid=<PID> <COMM> <AF_FAMILY>

Regex:

\[SOCKET\]\s+pid=(\d+)\s+(\S+)\s+(\S+)

Fields:

| Regex group | Normalized field |
|---|---|
| pid | process_id |
| comm | image, stored as /usr/bin/<comm> |
| af_family | socket_type, e.g. AF_INET, AF_INET6, AF_UNIX |

Sample ([CONNECT] form):

{
  "@timestamp": "2026-03-20T02:24:31.785786+05:00",
  "host": { "name": "agent-linux-01" },
  "log_source": "network",
  "message": "2026-03-20 02:24:31 [CONNECT]  pid=35113   rdk:broker-1     -> 10.0.0.106:29092"
}

[!NOTE] A message that doesn't match any of the eBPF regexes is not dropped. The parser stores the original message on NormalizedEvent.data.message and emits the event with the topic-derived event_type. Graph-building downstream may treat these as low-information and ignore them.
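
Putting the two network shapes and the no-match fallback together, a sketch might look like this. The regexes are the ones quoted above. The function name, the integer conversion of process_id and dest_port, and storing connection_type in upper case are assumptions, since the tables do not specify them.

```python
import re

CONNECT_RE = re.compile(
    r"\[(CONNECT|ACCEPT)\]\s+pid=(\d+)\s+(\S+)(?:\s+->\s+(\S+):(\d+))?"
)
SOCKET_RE = re.compile(r"\[SOCKET\]\s+pid=(\d+)\s+(\S+)\s+(\S+)")


def parse_network_line(message: str) -> dict:
    """Try the connection form, then the socket form, then fall back."""
    match = CONNECT_RE.search(message)
    if match:
        kind, pid, comm, daddr, dport = match.groups()
        data = {
            "connection_type": kind,
            "process_id": int(pid),
            "image": f"/usr/bin/{comm}",
        }
        if daddr is not None:  # the destination part is optional
            data["dest_ip"] = daddr
            data["dest_port"] = int(dport)
        return data

    match = SOCKET_RE.search(message)
    if match:
        pid, comm, family = match.groups()
        return {
            "process_id": int(pid),
            "image": f"/usr/bin/{comm}",
            "socket_type": family,
        }

    # No regex matched: keep the original text instead of dropping it.
    return {"message": message}
```

The comm value in the sample, rdk:broker-1, contains a colon. That is harmless because comm is captured as a run of non-whitespace characters, and only the destination is split on its final colon.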


Topic-to-source summary

A single-table reference for operators:

| Kafka topic | Platform | Collector | Shipper (per repo) | Event format | Partitions |
|---|---|---|---|---|---|
| sysmon-logs | Windows | Sysmon | Winlogbeat | Winlogbeat JSON | 6 |
| linux-auditd-logs | Linux | auditd daemon | custom agent | JSON envelope + auditd text | 6 |
| linux-ebpf-process-logs | Linux | eBPF process tracer | custom agent | JSON envelope + [EXEC] text | 6 |
| linux-ebpf-file-logs | Linux | eBPF file tracer | custom agent | JSON envelope + [FILE] text | 6 |
| linux-ebpf-network-logs | Linux | eBPF network tracer | custom agent | JSON envelope + [CONNECT] / [ACCEPT] / [SOCKET] text | 6 |

Topic names and partition counts verified against the kafka-init container in deploy/docker-compose.yml.


Envelope fields common to every topic

Every raw Kafka message may include these top-level fields. None are required by the parser except message (for Linux sources) or winlog.* (for Sysmon); the rest are best-effort lookups.

| Field | Used by | Notes |
|---|---|---|
| @timestamp | All | Fallback timestamp if the inner record doesn't carry one. |
| host.name / host (string) / hostname | Linux parsers | Resolves the endpoint identifier. |
| agent.hostname / host.name / computer_name | Sysmon parser | Resolves the endpoint identifier. |
| timestamp | Linux parsers | Numeric epoch; tried before @timestamp. |
| _tenant_id | All parsers | Defaults to "default" if missing. Used to populate the tenant_id field on every NormalizedEvent. |

The endpoint hostname is lowercased on the way into NormalizedEvent.endpoint_id, which is why you will see desktop-1nnimrr in the dashboard even though Winlogbeat reports DESKTOP-1NNIMRR.


Production ingestion agents

Logster Support's product materials list the following shippers as natively supported for feeding Logster in production:

  • Windows: Elastic Winlogbeat, Splunk Universal Forwarder, Windows Subscription Logging.
  • Linux: rsyslog (native), Splunk Universal Forwarder, syslog-ng.

The only shipper observed in the repo's sample messages is Winlogbeat for Sysmon. The Linux samples are labeled "custom agent" — meaning the endpoint publisher is customer-provided or Logster Support-provided, not one of the named commercial shippers by default. If you are integrating one of the named commercial shippers (for example, the Splunk Universal Forwarder), the integration recipe is not yet documented — see Splunk Integration Guide for the stub.


Verifying events end-to-end

Once your collectors and shipper are in place, verify each topic is receiving traffic:

# Peek at each raw topic (each command exits after 5 messages; press Ctrl-C if fewer arrive)
docker compose exec kafka kafka-console-consumer \
    --bootstrap-server localhost:9092 \
    --topic sysmon-logs --max-messages 5

docker compose exec kafka kafka-console-consumer \
    --bootstrap-server localhost:9092 \
    --topic linux-auditd-logs --max-messages 5

docker compose exec kafka kafka-console-consumer \
    --bootstrap-server localhost:9092 \
    --topic linux-ebpf-process-logs --max-messages 5

Check that the normalizer is successfully parsing them:

docker compose logs normalizer | grep -i parse

No parse errors in the log and a climbing normalized_events_total Prometheus counter mean the pipeline is healthy.

Confirm normalized events are landing in Elasticsearch:

curl 'http://localhost:9200/logster-events/_count'

The count should grow as events flow.
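
If you would rather script that check than eyeball two curl runs, a small sketch along these lines compares two readings of the same endpoint. The function names and the 10-second default wait are assumptions. The only thing it relies on from Elasticsearch is that the _count response is a JSON object with a count field.

```python
import json
import time
from urllib.request import urlopen

COUNT_URL = "http://localhost:9200/logster-events/_count"


def fetch_count(url: str = COUNT_URL) -> int:
    """Read the document count from the _count endpoint."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)["count"]


def count_is_growing(fetch=fetch_count, wait_seconds=10.0, sleep=time.sleep) -> bool:
    """Take two readings wait_seconds apart and report whether it grew."""
    before = fetch()
    sleep(wait_seconds)
    return fetch() > before
```

The fetch and sleep parameters exist so the comparison can be tested without a running cluster. On a quiet fleet a single flat reading does not prove the pipeline is broken, so lengthen the wait before drawing conclusions.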


Where to go next