Admin Guide: Log Sources

This page documents where Logster expects real-time log data to come from on each supported platform, which Kafka topic each source feeds, and the exact format the normalizer accepts on that topic.

Every format and sample on this page is verified against the normalizer parsers in services/normalizer/src/logster_normalizer/parsers.py and against the canonical sample messages kept at stress-tests/sample-kafka-messages.md. If you extend Logster to accept a new log source, update both files and then update this page.

Architecture — how logs get in

Logster does not collect logs from endpoints directly. Every raw event reaches Logster the same way: an endpoint-side collector produces events, an endpoint-side shipper publishes them onto a Kafka topic, and the Logster normalizer consumes from that topic.

┌───────────────┐   ┌─────────────┐   ┌────────┐   ┌────────────┐
│  Collector    │──▶│   Shipper   │──▶│ Kafka  │──▶│ Normalizer │
│  (endpoint)   │   │ (endpoint)  │   │ topic  │   │ (Logster)  │
└───────────────┘   └─────────────┘   └────────┘   └────────────┘

Collector = the thing on the endpoint that observes system activity: Sysmon, auditd, or an eBPF tracer.
Shipper = the thing on the endpoint that forwards collected events to Kafka: Winlogbeat (Windows) or a custom agent (Linux, per stress-tests/sample-kafka-messages.md).
Kafka topic = the trust boundary. Everything written to a raw topic is treated as untrusted input until the normalizer validates it.
Normalizer = the Logster service that parses each raw format into a unified NormalizedEvent. Source: services/normalizer/src/logster_normalizer/parsers.py.

[!IMPORTANT] Logster does not bundle the endpoint collector or shipper. You must deploy them separately. The Compose stack only brings up Kafka, the normalizer, and everything downstream of the normalizer.

Supported platforms

The Platform enum in libs/logster-common/logster_common/schemas/events.py defines the platforms the current codebase handles:

Value	Purpose
`windows`	Windows endpoints — Sysmon via Winlogbeat
`linux`	Linux endpoints — auditd and eBPF

Note. Logster Support's product materials list macOS as a supported endpoint platform. The current Platform enum in the codebase does not include it. Confirm macOS collection support with Logster Support before planning a macOS rollout, and treat this section as reflecting only what is implemented in the present build.

Windows — Sysmon via Winlogbeat

What collects the events

Sysmon (System Monitor) is a Windows system service and device driver from Microsoft's Sysinternals suite. It writes detailed process, file, registry, and network events to the Windows event log channel Microsoft-Windows-Sysmon/Operational.

Logster does not ship a Sysmon configuration. Use a vetted public Sysmon config for your fleet.

What ships the events

Winlogbeat (from Elastic) reads the Microsoft-Windows-Sysmon/Operational channel and publishes JSON-encoded events to Kafka via its Kafka output.

The sample message at stress-tests/sample-kafka-messages.md confirms this path — every Windows sample in the repo comes from a Winlogbeat agent ("type": "winlogbeat").

Kafka topic

sysmon-logs

Partitions: 6 (verified in deploy/docker-compose.yml under the kafka-init container).

Sysmon event IDs the normalizer handles

Source: parsers.py parse_sysmon_event.

Sysmon event ID	Meaning	Normalized `event_type`
1	Process Create	`process`
3	Network Connection	`network`
11	File Create	`file`
4104	PowerShell Script Block Logging	`script`
(any other)	Fallback	`process` with `sysmon_event_id` and raw `event_data` preserved under `data`

Unknown event IDs are not dropped: they are passed through as a generic process event with the raw event_data retained. Custom Sysmon configurations that emit less-common IDs (e.g. 7 for image loaded, 22 for DNS query) will therefore still reach Elasticsearch; they just won't enrich the inference graph with typed fields.
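
The routing in the table above can be sketched in a few lines of Python. This is an illustration only, not the code in parsers.py: the names SYSMON_EVENT_TYPES and classify_sysmon_event are hypothetical, and the exact key names used for the preserved payload are assumed from the table.

```python
# Hypothetical mapping that mirrors the event ID table; not the real parsers.py code.
SYSMON_EVENT_TYPES = {
    1: "process",    # Process Create
    3: "network",    # Network Connection
    11: "file",      # File Create
    4104: "script",  # PowerShell Script Block Logging
}


def classify_sysmon_event(event_id, event_data):
    """Return (event_type, fallback_data) for a raw event ID.

    fallback_data is None for the four typed IDs, because their typed field
    extraction is a separate step that this sketch does not cover.
    """
    event_id = int(event_id)  # Winlogbeat ships the ID as a string, e.g. "3"
    event_type = SYSMON_EVENT_TYPES.get(event_id)
    if event_type is not None:
        return event_type, None
    # Fallback: generic process event that keeps the ID and the raw payload.
    # The key names below are assumed from the table, not read from the source.
    return "process", {"sysmon_event_id": event_id, "event_data": event_data}
```

With this sketch, an event ID of "22" comes back as a process event whose data still carries the original DNS query fields.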

Field extraction per event ID

Source: parsers.py. Field names on the left are the keys the parser looks up under winlog.event_data.* in the raw Winlogbeat JSON.

Event ID 1 — Process Create

Raw field (`event_data.*`)	Normalized `data.*`
`ProcessGuid`	`process_guid`
`ProcessId`	`process_id` (int)
`Image`	`image`
`CommandLine`	`command_line`
`User`	`user`
`ParentProcessGuid`	`parent_guid`
`ParentImage`	`parent_image`
`ParentCommandLine`	`parent_command_line`
`IntegrityLevel`	`integrity_level`
`CurrentDirectory`	`current_directory`
`Hashes`	`hashes`

Event ID 3 — Network Connection

Raw field	Normalized `data.*`
`SourceIp`	`source_ip`
`SourcePort`	`source_port`
`DestinationIp`	`dest_ip`
`DestinationPort`	`dest_port`
`Protocol`	`protocol`
`ProcessGuid`	`process_guid`
`Image`	`image`

Event ID 11 — File Create

Raw field	Normalized `data.*`
`TargetFilename`	`target_filename`
`ProcessGuid`	`creator_guid`
`Image`	`creator_image`

Event ID 4104 — PowerShell Script Block

Raw field	Normalized `data.*`
`ScriptBlockText`	`script_block_text`
`ScriptBlockId`	`script_block_id`
`Path`	`path`
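
As a rough illustration of how one of these tables translates into code, the sketch below applies the Event ID 3 mapping to a raw Winlogbeat event. NETWORK_FIELD_MAP and extract_fields are hypothetical names. The sketch also assumes that missing raw keys are simply omitted and that values are kept as the strings Winlogbeat ships; neither behaviour is confirmed by this page.

```python
# Raw-to-normalized key pairs copied from the Event ID 3 table.
NETWORK_FIELD_MAP = {
    "SourceIp": "source_ip",
    "SourcePort": "source_port",
    "DestinationIp": "dest_ip",
    "DestinationPort": "dest_port",
    "Protocol": "protocol",
    "ProcessGuid": "process_guid",
    "Image": "image",
}


def extract_fields(raw_event, field_map):
    """Rename winlog.event_data.* keys according to field_map.

    Assumption: keys absent from the raw event are left out of the result,
    and values are passed through without type conversion.
    """
    event_data = raw_event.get("winlog", {}).get("event_data", {})
    return {
        normalized: event_data[raw]
        for raw, normalized in field_map.items()
        if raw in event_data
    }
```

The same helper would work for the other event IDs by swapping in a map built from the corresponding table.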

Hostname resolution

parse_sysmon_event resolves the endpoint hostname in this order:

agent.hostname
host.name
computer_name
Literal "unknown" if none are present.
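
The lookup order above can be sketched as follows. resolve_sysmon_hostname is a hypothetical name, and the sketch looks for computer_name under winlog because that is where the sample payload carries it; the list above does not say which parent object the parser actually reads.

```python
def resolve_sysmon_hostname(raw_event):
    """Return the first non-empty hostname candidate, or "unknown"."""
    candidates = (
        raw_event.get("agent", {}).get("hostname"),
        raw_event.get("host", {}).get("name"),
        # Assumption: computer_name is read from the winlog object.
        raw_event.get("winlog", {}).get("computer_name"),
    )
    for value in candidates:
        if value:
            return value
    return "unknown"
```

For an event that has no agent.hostname but does have host.name, such as the sample payload on this page, the sketch returns the host.name value.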

Timestamp resolution

The parser reads winlog.event_data.UtcTime first, then falls back to the top-level @timestamp field, then to wall-clock time if neither is parseable. Both ISO-8601 with T and YYYY-MM-DD HH:MM:SS forms are accepted.
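
A minimal sketch of that fallback chain, using only the standard library. The function names are hypothetical, and treating a timestamp without an offset as UTC is an assumption made here because the field is called UtcTime.

```python
from datetime import datetime, timezone


def _parse_ts(value):
    """Parse ISO-8601 with T, or 'YYYY-MM-DD HH:MM:SS'; return None on failure."""
    if not isinstance(value, str):
        return None
    text = value.strip()
    if text.endswith("Z"):
        # Older Python versions do not accept a trailing Z in fromisoformat.
        text = text[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: values without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def resolve_sysmon_timestamp(raw_event, now=None):
    """Try UtcTime, then @timestamp, then fall back to wall-clock time."""
    utc_time = raw_event.get("winlog", {}).get("event_data", {}).get("UtcTime")
    for candidate in (utc_time, raw_event.get("@timestamp")):
        parsed = _parse_ts(candidate)
        if parsed is not None:
            return parsed
    return now or datetime.now(timezone.utc)
```

The optional now argument exists only so the wall-clock branch can be exercised deterministically in a test.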

Sample payload

A real Winlogbeat Sysmon event as captured in stress-tests/sample-kafka-messages.md:

{
  "@timestamp": "2026-03-18T19:07:15.560Z",
  "winlog": {
    "computer_name": "DESKTOP-1NNIMRR",
    "provider_name": "Microsoft-Windows-Sysmon",
    "event_id": "3",
    "channel": "Microsoft-Windows-Sysmon/Operational",
    "event_data": {
      "SourceIp": "10.0.0.104",
      "SourcePort": "51775",
      "DestinationIp": "104.208.16.88",
      "DestinationPort": "443",
      "Protocol": "tcp",
      "Image": "C:\\Users\\abdullah\\AppData\\Local\\Microsoft\\OneDrive\\...\\OneDrive.Sync.Service.exe",
      "ProcessGuid": "{96d8290c-d08c-69b9-2a07-000000000700}",
      "ProcessId": "5940",
      "User": "DESKTOP-1NNIMRR\\abdullah",
      "UtcTime": "2026-03-18 19:07:13.512"
    }
  },
  "agent": {
    "name": "DESKTOP-1NNIMRR",
    "type": "winlogbeat",
    "version": "8.17.0"
  },
  "host": {
    "name": "desktop-1nnimrr",
    "os": { "family": "windows", "name": "Windows 10 Pro" }
  },
  "event": {
    "code": "3",
    "provider": "Microsoft-Windows-Sysmon"
  }
}

The full sample (including every field Winlogbeat ships) is at stress-tests/sample-kafka-messages.md.

Linux — auditd

What collects the events

The Linux Audit daemon (auditd) records kernel and userspace audit events according to rules loaded from /etc/audit/audit.rules. Records are plain-text and follow the familiar type=SYSCALL msg=audit(<epoch>:<serial>): <fields> form.

Logster does not ship an auditd rule set. Your Linux security baseline already has one — reuse it.

What ships the events

stress-tests/sample-kafka-messages.md labels the Linux auditd source as "custom agent (auditd)". There is no specific commercial shipper bundled with Logster for this path; the endpoint agent is customer-provided or Logster Support-provided.

TBD — canonical Linux shipper. Confirm with Logster Support which agent ships auditd events to Kafka in production deployments (rsyslog with omkafka, syslog-ng, a dedicated Go/Rust agent, or something else), and populate this section with the verified choice.

Kafka topic

linux-auditd-logs

Partitions: 6.

Expected format

The normalizer accepts a simple JSON envelope with a single message field carrying the raw auditd text record. Source: parsers.py parse_auditd_event.

Fields read from the Kafka message (only message is required; the rest are optional or best-effort lookups):

Field	Type	Notes
`message`	string	The raw auditd record, verbatim.
`host.name` or `host` (string) or `hostname`	string	The endpoint identifier.
`timestamp`	float | string	Optional. Falls back to wall-clock time if missing or unparseable.
`_tenant_id`	string	Optional — defaults to `"default"`.

The normalizer does not parse the auditd text further in this stage — the entire record is stored as-is on NormalizedEvent.data.message. Downstream graph-building code re-parses the text for specific fields.

Every auditd event is emitted with event_type = "syscall" regardless of the underlying auditd record type. The distinction between type=SYSCALL, type=PATH, type=EXECVE, etc. is preserved inside data.message but not lifted to a separate event type.
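
To make the envelope handling concrete, here is a hedged sketch of what this stage amounts to. normalize_auditd_envelope is a hypothetical name and the returned dict only approximates NormalizedEvent. The "unknown" fallback for a missing host is an assumption borrowed from the Sysmon parser, and timestamp handling is left out.

```python
def normalize_auditd_envelope(envelope):
    """Build a simplified normalized event from an auditd Kafka envelope."""
    host = envelope.get("host")
    if isinstance(host, dict):
        hostname = host.get("name")   # host.name
    elif isinstance(host, str):
        hostname = host               # host given as a plain string
    else:
        hostname = None
    # Assumption: "unknown" is used when no host field is present at all.
    hostname = hostname or envelope.get("hostname") or "unknown"

    return {
        "event_type": "syscall",                # same for every auditd record type
        "endpoint_id": hostname.lower(),        # hostnames are lowercased
        "tenant_id": envelope.get("_tenant_id", "default"),
        # The raw record is stored verbatim; no field parsing at this stage.
        "data": {"message": envelope["message"]},
    }
```

A type=PATH record and a type=EXECVE record therefore produce the same event_type and differ only in data.message.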

Sample payload

Verbatim from stress-tests/sample-kafka-messages.md:

{
  "@timestamp": "2026-03-20T01:58:10.344285+05:00",
  "host": {
    "name": "agent-linux-01"
  },
  "log_source": "audit",
  "message": "type=PATH msg=audit(1773953890.204:50364): item=0 name=\"/run/systemd/ask-password-block/\" inode=5375 dev=00:1b mode=040700 ouid=0 ogid=0 rdev=00:00 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0\u001dOUID=\"root\" OGID=\"root\""
}

Auditd record types you'll commonly see in the message field:

type=SYSCALL — the syscall invocation itself (pid, ppid, uid, euid, exe, comm, key)
type=PATH — a path referenced by a syscall (filename, inode, mode, ouid/ogid)
type=EXECVE — arguments passed to execve
type=CWD — current working directory at the time of the syscall
type=PROCTITLE — the process title as visible to /proc/<pid>/cmdline

These are standard auditd record types. The list is not exhaustive — anything auditd produces is accepted.
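
When inspecting messages by hand, it can help to split the record header from its fields. The sketch below does that for the type=... msg=audit(<epoch>:<serial>): form described earlier. It is an operator-side illustration with hypothetical names, not the re-parsing code used by Logster's graph builder.

```python
import re

# Matches the standard auditd header: type=<TYPE> msg=audit(<epoch>:<serial>): <fields>
AUDIT_HEADER_RE = re.compile(
    r"type=(\S+)\s+msg=audit\((\d+(?:\.\d+)?):(\d+)\):\s*(.*)",
    re.DOTALL,
)


def parse_audit_header(message):
    """Return the record type, epoch, serial and remaining field text, or None."""
    match = AUDIT_HEADER_RE.match(message)
    if match is None:
        return None
    record_type, epoch, serial, fields = match.groups()
    return {
        "type": record_type,
        "epoch": float(epoch),
        "serial": int(serial),
        "fields": fields,
    }
```

Records that share an epoch and serial belong to the same audit event, which is how a SYSCALL record can be paired with its PATH and EXECVE records.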

Linux — eBPF

What collects the events

An eBPF tracer running on the endpoint observes three families of kernel events directly in the Linux kernel: process execution, file access, and network activity. Each family goes to its own Kafka topic.

Logster does not bundle the eBPF tracer. The message format expected on the Kafka side (see below) is a line-oriented text protocol, so any eBPF tooling that can be wired to emit those lines will work.

What ships the events

Per stress-tests/sample-kafka-messages.md, the eBPF events are labeled as coming from a "custom agent (eBPF process tracer / file tracer / network tracer)". No specific commercial tool is bundled.

TBD — canonical eBPF tracer. Confirm with Logster Support which eBPF tracer produces these events in production deployments (Falco, Tetragon, a custom bcc/libbpf tool, etc.) and update this section.

Kafka topics

Three separate topics, one per event family. Each has 6 partitions.

Topic	Family	`event_type`	Parser
`linux-ebpf-process-logs`	Process execution	`process`	`_parse_ebpf_process_data`
`linux-ebpf-file-logs`	File access	`file`	`_parse_ebpf_file_data`
`linux-ebpf-network-logs`	Network connections	`network`	`_parse_ebpf_network_data`

All three parsers live in services/normalizer/src/logster_normalizer/parsers.py.

Kafka message envelope

All three eBPF topics use the same JSON envelope as auditd — the interesting content is in the message field, and that field carries a line-oriented text record with a family-specific prefix.

Field	Type	Notes
`message`	string	The text record in one of the `[EXEC]` / `[FILE]` / `[CONNECT]` / `[ACCEPT]` / `[SOCKET]` forms below.
`host.name` or `host` (string) or `hostname`	string	Endpoint identifier.
`timestamp`	float | string	Optional.
`_tenant_id`	string	Optional.

Message formats the normalizer recognises

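The normalizer parses each family with a regex. These regexes are the contract: any eBPF tooling you use must produce lines that match them. None of the patterns is anchored at the start of the line, and the repo samples carry a leading timestamp before the bracketed prefix.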

Source: parsers.py lines 163–193.

`linux-ebpf-process-logs` — `[EXEC]`

[EXEC] pid=<PID> ppid=<PPID> uid=<UID> <COMM> [<ARGS...>]

Regex:

\[EXEC\]\s+pid=(\d+)\s+ppid=(\d+)\s+uid=(\d+)\s+(\S+)(?:\s+(.+))?$

Fields extracted into the normalized data.*:

Regex group	Normalized field	Notes
`pid`	`process_id` (int)	—
`ppid`	`parent_image`	Stored as a placeholder `/usr/bin/ppid:<ppid>` because ppid→image resolution is not performed at normalization time.
`uid`	`user`	Stored as `uid:<uid>`. Also used to set `integrity_level` (`root` if `uid=0`, else `user`).
`comm`	`image` (fallback)	If `ARGS` is present and begins with `/`, that path becomes `image`. Otherwise `image = /usr/bin/<comm>`.
`args`	`command_line`	Falls back to `comm` if `ARGS` is empty.

Sample (verified from stress-tests/sample-kafka-messages.md):

{
  "@timestamp": "2026-03-20T02:24:50.451492+05:00",
  "host": { "name": "agent-linux-01" },
  "log_source": "process",
  "message": "2026-03-20 02:24:50 [EXEC]    pid=35747   ppid=33785   uid=1000  bash             /usr/bin/ls"
}
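
The sketch below applies the [EXEC] regex to a message and derives the fields listed in the table. It is not the body of _parse_ebpf_process_data: parse_exec_line is a hypothetical name, the use of search (rather than match) is inferred from the timestamp prefix in the sample, and taking only the first token of ARGS as the image path is an assumption.

```python
import re

EXEC_RE = re.compile(
    r"\[EXEC\]\s+pid=(\d+)\s+ppid=(\d+)\s+uid=(\d+)\s+(\S+)(?:\s+(.+))?$"
)


def parse_exec_line(message):
    """Return the normalized data fields for an [EXEC] line, or None."""
    match = EXEC_RE.search(message)  # search, so a leading timestamp is skipped
    if match is None:
        return None
    pid, ppid, uid, comm, args = match.groups()
    if args and args.startswith("/"):
        image = args.split()[0]  # assumption: first token of ARGS is the path
    else:
        image = f"/usr/bin/{comm}"
    return {
        "process_id": int(pid),
        "parent_image": f"/usr/bin/ppid:{ppid}",  # placeholder, no ppid lookup
        "user": f"uid:{uid}",
        "integrity_level": "root" if int(uid) == 0 else "user",
        "image": image,
        "command_line": args or comm,
    }
```

A quick way to vet a candidate tracer is to feed a handful of its output lines through a check like this before pointing it at the Kafka topic.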

`linux-ebpf-file-logs` — `[FILE]`

[FILE] pid=<PID> <COMM> flags=<FLAGS> <PATH>

Regex:

\[FILE!?\]\s+pid=(\d+)\s+(\S+)\s+flags=(\S+)\s+(.+)$

The ! after FILE is tolerated by the regex — an agent may emit [FILE!] to flag a suspicious access.

Fields extracted:

Regex group	Normalized field
`pid`	`process_id` (int)
`comm`	`creator_image` — stored as `/usr/bin/<comm>`
`flags`	`flags` — the `open()`-style flags verbatim, e.g. `O_RDONLY`
`path`	`target_filename`

Sample:

{
  "@timestamp": "2026-03-20T02:24:27.254181+05:00",
  "host": { "name": "agent-linux-01" },
  "log_source": "file",
  "message": "2026-03-20 02:24:27 [FILE]    pid=23256   snapd            flags=O_RDONLY   /var/lib/snapd/assertions/asserts-v0/model/16/generic/generic-classic/active"
}
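
The same approach works for the file family. This sketch uses the [FILE] regex above with a hypothetical parse_file_line helper; as before, the use of search is inferred from the timestamp prefix in the sample rather than read from parsers.py.

```python
import re

FILE_RE = re.compile(r"\[FILE!?\]\s+pid=(\d+)\s+(\S+)\s+flags=(\S+)\s+(.+)$")


def parse_file_line(message):
    """Return the normalized data fields for a [FILE] or [FILE!] line, or None."""
    match = FILE_RE.search(message)
    if match is None:
        return None
    pid, comm, flags, path = match.groups()
    return {
        "process_id": int(pid),
        "creator_image": f"/usr/bin/{comm}",
        "flags": flags,            # kept verbatim, e.g. O_RDONLY
        "target_filename": path,
    }
```

Note that the regex tolerates the ! marker but does not capture it, so in this sketch a [FILE!] line yields the same fields as a plain [FILE] line.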

`linux-ebpf-network-logs` — `[CONNECT]` / `[ACCEPT]` / `[SOCKET]`

The network parser recognises two shapes.

Connection form (outbound or accepted inbound):

[CONNECT] pid=<PID> <COMM> -> <DADDR>:<DPORT>
[ACCEPT]  pid=<PID> <COMM> -> <DADDR>:<DPORT>

Regex:

\[(CONNECT|ACCEPT)\]\s+pid=(\d+)\s+(\S+)(?:\s+->\s+(\S+):(\d+))?

Fields:

Regex group	Normalized field
`CONNECT`/`ACCEPT`	`connection_type`
`pid`	`process_id`
`comm`	`image` — stored as `/usr/bin/<comm>`
`daddr`	`dest_ip`
`dport`	`dest_port`

Socket-open form (bare socket creation, no destination yet):

[SOCKET] pid=<PID> <COMM> <AF_FAMILY>

Regex:

\[SOCKET\]\s+pid=(\d+)\s+(\S+)\s+(\S+)

Fields:

Regex group	Normalized field
`pid`	`process_id`
`comm`	`image` — `/usr/bin/<comm>`
`af_family`	`socket_type` — e.g. `AF_INET`, `AF_INET6`, `AF_UNIX`

Sample ([CONNECT] form):

{
  "@timestamp": "2026-03-20T02:24:31.785786+05:00",
  "host": { "name": "agent-linux-01" },
  "log_source": "network",
  "message": "2026-03-20 02:24:31 [CONNECT]  pid=35113   rdk:broker-1     -> 10.0.0.106:29092"
}
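
Both network shapes can be handled by trying the connection regex first and the socket regex second, as sketched below. parse_network_line is a hypothetical name and the order of the two attempts is an assumption. The sketch casts the PID to an integer, following the [EXEC] and [FILE] tables, and leaves the port as captured text because this page does not state its type.

```python
import re

CONN_RE = re.compile(
    r"\[(CONNECT|ACCEPT)\]\s+pid=(\d+)\s+(\S+)(?:\s+->\s+(\S+):(\d+))?"
)
SOCKET_RE = re.compile(r"\[SOCKET\]\s+pid=(\d+)\s+(\S+)\s+(\S+)")


def parse_network_line(message):
    """Return normalized data for a [CONNECT], [ACCEPT] or [SOCKET] line, or None."""
    match = CONN_RE.search(message)
    if match is not None:
        kind, pid, comm, daddr, dport = match.groups()
        data = {
            "connection_type": kind,
            "process_id": int(pid),
            "image": f"/usr/bin/{comm}",
        }
        if daddr is not None:  # the destination part of the regex is optional
            data["dest_ip"] = daddr
            data["dest_port"] = dport
        return data
    match = SOCKET_RE.search(message)
    if match is not None:
        pid, comm, family = match.groups()
        return {
            "process_id": int(pid),
            "image": f"/usr/bin/{comm}",
            "socket_type": family,
        }
    return None
```

Because comm is captured with \S+, process names that contain a colon, such as the rdk:broker-1 thread name in the sample, pass through unchanged.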

[!NOTE] A message that doesn't match any of the eBPF regexes is not dropped. The parser stores the original message on NormalizedEvent.data.message and emits the event with the topic-derived event_type. Graph-building downstream may treat these as low-information and ignore them.
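
The fallback described in the note can be pictured like this. TOPIC_EVENT_TYPES and normalize_ebpf are hypothetical names, and parse_line stands in for whichever family parser applies; the real dispatch in parsers.py may be organised differently.

```python
# Topic-derived event types, copied from the Kafka topics table.
TOPIC_EVENT_TYPES = {
    "linux-ebpf-process-logs": "process",
    "linux-ebpf-file-logs": "file",
    "linux-ebpf-network-logs": "network",
}


def normalize_ebpf(topic, envelope, parse_line):
    """Parse the message if possible; otherwise keep it verbatim.

    parse_line is any callable that returns a dict of fields, or None when
    the message does not match the family's regex.
    """
    message = envelope.get("message", "")
    data = parse_line(message)
    if data is None:
        # Unmatched lines are kept, not dropped.
        data = {"message": message}
    return {"event_type": TOPIC_EVENT_TYPES[topic], "data": data}
```

An unmatched line on linux-ebpf-file-logs therefore still produces a file event, just without typed fields.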

Topic-to-source summary

A single-table reference for operators:

Kafka topic	Platform	Collector	Shipper (per repo)	Event format	Partitions
`sysmon-logs`	Windows	Sysmon	Winlogbeat	Winlogbeat JSON	6
`linux-auditd-logs`	Linux	auditd daemon	custom agent	JSON envelope + auditd text	6
`linux-ebpf-process-logs`	Linux	eBPF process tracer	custom agent	JSON envelope + `[EXEC]` text	6
`linux-ebpf-file-logs`	Linux	eBPF file tracer	custom agent	JSON envelope + `[FILE]` text	6
`linux-ebpf-network-logs`	Linux	eBPF network tracer	custom agent	JSON envelope + `[CONNECT]` / `[ACCEPT]` / `[SOCKET]` text	6

Topic names and partition counts verified against the kafka-init container in deploy/docker-compose.yml.

Envelope fields common to every topic

Every raw Kafka message may include these top-level fields. None are required by the parser except message (for Linux sources) or winlog.* (for Sysmon); the rest are best-effort lookups.

Field	Used by	Notes
`@timestamp`	All	Fallback timestamp if the inner record doesn't carry one.
`host.name` / `host` (string) / `hostname`	Linux parsers	Resolves the endpoint identifier.
`agent.hostname` / `host.name` / `computer_name`	Sysmon parser	Resolves the endpoint identifier.
`timestamp`	Linux parsers	Numeric epoch; tried before `@timestamp`.
`_tenant_id`	All parsers	Defaults to `"default"` if missing. Used to populate the `tenant_id` field on every `NormalizedEvent`.

The endpoint hostname is lowercased on the way into NormalizedEvent.endpoint_id, which is why you will see desktop-1nnimrr in the dashboard even though Winlogbeat reports DESKTOP-1NNIMRR.

Production ingestion agents

Logster Support's product materials list the following shippers as natively supported for feeding Logster in production:

Windows: Elastic Winlogbeat, Splunk Universal Forwarder, Windows Subscription Logging.
Linux: rsyslog (native), Splunk Universal Forwarder, syslog-ng.

The only shipper observed in the repo's sample messages is Winlogbeat for Sysmon. The Linux samples are labeled "custom agent" — meaning the endpoint publisher is customer-provided or Logster Support-provided, not one of the named commercial shippers by default. If you are integrating one of the named commercial shippers (for example, the Splunk Universal Forwarder), the integration recipe is not yet documented — see Splunk Integration Guide for the stub.

Verifying events end-to-end

Once your collectors and shipper are in place, verify each topic is receiving traffic:

# Peek at each raw topic (each command exits after 5 messages; press Ctrl-C if fewer arrive)
docker compose exec kafka kafka-console-consumer \
    --bootstrap-server localhost:9092 \
    --topic sysmon-logs --max-messages 5

docker compose exec kafka kafka-console-consumer \
    --bootstrap-server localhost:9092 \
    --topic linux-auditd-logs --max-messages 5

docker compose exec kafka kafka-console-consumer \
    --bootstrap-server localhost:9092 \
    --topic linux-ebpf-process-logs --max-messages 5

Check that the normalizer is successfully parsing them:

docker compose logs normalizer | grep -i parse

No parse errors in the log and a climbing normalized_events_total Prometheus counter mean the pipeline is healthy.

Confirm normalized events are landing in Elasticsearch:

curl 'http://localhost:9200/logster-events/_count'

The count should grow as events flow.

Where to go next

Installation — stand up the stack before wiring a real endpoint.
Installation Parameters — kafka.* topic configuration.
Splunk Integration Guide — (stub) production Splunk UF pattern.
Troubleshooting Guide — what to do when events reach the normalizer but are too sparse to build useful graphs.
