Admin Guide: Log Sources
This page documents exactly where Logster expects real-time log data to come from on each supported platform, which Kafka topic each source feeds, and the exact format the normalizer will accept on that topic.
Every format and sample on this page is verified against the normalizer parsers in services/normalizer/src/logster_normalizer/parsers.py and against the canonical sample messages kept at stress-tests/sample-kafka-messages.md. If you extend Logster to accept a new log source, update both files and then update this page.
Architecture — how logs get in
Logster does not collect logs from endpoints directly. Every raw event reaches Logster the same way: an endpoint-side collector produces events, an endpoint-side shipper publishes them onto a Kafka topic, and the Logster normalizer consumes from that topic.
```
┌───────────────┐   ┌─────────────┐   ┌────────┐   ┌────────────┐
│ Collector     │──▶│ Shipper     │──▶│ Kafka  │──▶│ Normalizer │
│ (endpoint)    │   │ (endpoint)  │   │ topic  │   │ (Logster)  │
└───────────────┘   └─────────────┘   └────────┘   └────────────┘
```
- Collector = the thing on the endpoint that observes system activity: Sysmon, auditd, or an eBPF tracer.
- Shipper = the thing on the endpoint that forwards collected events to Kafka: Winlogbeat (Windows) or a custom agent (Linux, per stress-tests/sample-kafka-messages.md).
- Kafka topic = the trust boundary. Everything written to a raw topic is treated as untrusted input until the normalizer validates it.
- Normalizer = the Logster service that parses each raw format into a unified `NormalizedEvent`. Source: `services/normalizer/src/logster_normalizer/parsers.py`.
> [!IMPORTANT]
> Logster does not bundle the endpoint collector or shipper; you must deploy them separately. The Compose stack only brings up Kafka, the normalizer, and everything downstream of the normalizer.
Supported platforms
The `Platform` enum in `libs/logster-common/logster_common/schemas/events.py` defines the platforms the current codebase handles:
| Value | Purpose |
|---|---|
| `windows` | Windows endpoints: Sysmon via Winlogbeat |
| `linux` | Linux endpoints: auditd and eBPF |
Note. Logster Support's product materials list macOS as a supported endpoint platform, but the current `Platform` enum in the codebase does not include it. Confirm macOS collection support with Logster Support before planning a macOS rollout, and treat this section as reflecting only what is implemented in the present build.
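For orientation, the enum can be pictured roughly as below. This is a minimal sketch, assuming a string-valued `enum.Enum`; the member names `WINDOWS` and `LINUX` and the `platform_for_topic` helper are hypothetical, and the real definition in `events.py` may differ.

```python
from enum import Enum


class Platform(str, Enum):
    """Sketch of the platforms the current build handles (no macOS member)."""

    WINDOWS = "windows"  # Sysmon via Winlogbeat
    LINUX = "linux"      # auditd and eBPF


def platform_for_topic(topic: str) -> Platform:
    """Hypothetical helper: map a raw Kafka topic to its platform."""
    if topic == "sysmon-logs":
        return Platform.WINDOWS
    if topic.startswith("linux-"):
        return Platform.LINUX
    raise ValueError(f"no platform is defined for topic {topic!r}")


print(platform_for_topic("linux-ebpf-file-logs").value)  # linux
```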
Windows — Sysmon via Winlogbeat
What collects the events
Sysmon (System Monitor) is a Windows system service and device driver from Microsoft's Sysinternals suite. It writes detailed process, file, registry, and network events to the Windows event log channel `Microsoft-Windows-Sysmon/Operational`.
Logster does not ship a Sysmon configuration. Use a vetted public Sysmon config for your fleet.
What ships the events
Winlogbeat (from Elastic) reads the `Microsoft-Windows-Sysmon/Operational` channel and publishes JSON-encoded events to Kafka via its Kafka output.
The sample message at `stress-tests/sample-kafka-messages.md` confirms this path: every Windows sample in the repo comes from a Winlogbeat agent (`"type": "winlogbeat"`).
Kafka topic
- Topic: `sysmon-logs`
- Partitions: 6 (verified in `deploy/docker-compose.yml` under the `kafka-init` container).
Sysmon event IDs the normalizer handles
Source: `parse_sysmon_event` in `parsers.py`.
| Sysmon event ID | Meaning | Normalized event_type |
|---|---|---|
| 1 | Process Create | process |
| 3 | Network Connection | network |
| 11 | File Create | file |
| 4104 | PowerShell Script Block Logging (written by PowerShell to `Microsoft-Windows-PowerShell/Operational`, not by Sysmon) | script |
| (any other) | Fallback | process with sysmon_event_id and raw event_data preserved under data |
Unknown event IDs are not dropped: they pass through as a generic `process` event with the raw `event_data` retained. Custom Sysmon configurations that emit less-common IDs (e.g. 7, image loaded; 22, DNS query) therefore still reach Elasticsearch; they just don't enrich the inference graph with typed fields.
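The dispatch rule above, typed handling for four IDs and a pass-through for everything else, can be sketched as follows. This is an illustration, not the code in `parse_sysmon_event`: `classify_sysmon` is a hypothetical name, and nesting the raw fields under an `event_data` key inside `data` is an assumption about the exact layout.

```python
from typing import Any, Dict, Optional, Tuple

# Event IDs with typed handling, per the table above.
TYPED_EVENT_TYPES: Dict[int, str] = {1: "process", 3: "network", 11: "file", 4104: "script"}


def classify_sysmon(raw: Dict[str, Any]) -> Tuple[str, Optional[Dict[str, Any]]]:
    """Return (event_type, fallback_data) for a Winlogbeat Sysmon message.

    fallback_data is None for typed IDs; typed field extraction is a separate step.
    """
    winlog = raw.get("winlog", {})
    event_id = int(winlog.get("event_id", 0))  # Winlogbeat ships the ID as a string, e.g. "3"
    if event_id in TYPED_EVENT_TYPES:
        return TYPED_EVENT_TYPES[event_id], None
    # Unknown IDs are passed through, never dropped: generic process event with the
    # raw event_data kept (the key layout under data is an assumption).
    return "process", {"sysmon_event_id": event_id, "event_data": winlog.get("event_data", {})}
```

For example, a DNS query event (ID 22) comes back as `("process", {"sysmon_event_id": 22, "event_data": {...}})`.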
Field extraction per event ID
Source: `parsers.py`. Field names on the left are the keys the parser looks up under `winlog.event_data.*` in the raw Winlogbeat JSON.
Event ID 1 — Process Create
| Raw field (`event_data.*`) | Normalized `data.*` |
|---|---|
| `ProcessGuid` | `process_guid` |
| `ProcessId` | `process_id` (int) |
| `Image` | `image` |
| `CommandLine` | `command_line` |
| `User` | `user` |
| `ParentProcessGuid` | `parent_guid` |
| `ParentImage` | `parent_image` |
| `ParentCommandLine` | `parent_command_line` |
| `IntegrityLevel` | `integrity_level` |
| `CurrentDirectory` | `current_directory` |
| `Hashes` | `hashes` |
Event ID 3 — Network Connection
| Raw field | Normalized `data.*` |
|---|---|
| `SourceIp` | `source_ip` |
| `SourcePort` | `source_port` |
| `DestinationIp` | `dest_ip` |
| `DestinationPort` | `dest_port` |
| `Protocol` | `protocol` |
| `ProcessGuid` | `process_guid` |
| `Image` | `image` |
Event ID 11 — File Create
| Raw field | Normalized `data.*` |
|---|---|
| `TargetFilename` | `target_filename` |
| `ProcessGuid` | `creator_guid` |
| `Image` | `creator_image` |
Event ID 4104 — PowerShell Script Block
| Raw field | Normalized `data.*` |
|---|---|
| `ScriptBlockText` | `script_block_text` |
| `ScriptBlockId` | `script_block_id` |
| `Path` | `path` |
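Because the four tables are pure renames, they can be carried as data. The sketch below is a table-driven illustration, not the parser's code: `FIELD_MAPS` and `extract_typed_fields` are hypothetical names, and skipping raw keys that are absent (rather than emitting empty values) is an assumption. Only `ProcessId` is cast to `int`, because that is the only cast the tables call out; ports stay strings.

```python
from typing import Any, Dict

# Raw winlog.event_data key -> normalized data key, transcribed from the tables above.
FIELD_MAPS: Dict[int, Dict[str, str]] = {
    1: {
        "ProcessGuid": "process_guid", "ProcessId": "process_id", "Image": "image",
        "CommandLine": "command_line", "User": "user", "ParentProcessGuid": "parent_guid",
        "ParentImage": "parent_image", "ParentCommandLine": "parent_command_line",
        "IntegrityLevel": "integrity_level", "CurrentDirectory": "current_directory",
        "Hashes": "hashes",
    },
    3: {
        "SourceIp": "source_ip", "SourcePort": "source_port", "DestinationIp": "dest_ip",
        "DestinationPort": "dest_port", "Protocol": "protocol", "ProcessGuid": "process_guid",
        "Image": "image",
    },
    11: {"TargetFilename": "target_filename", "ProcessGuid": "creator_guid", "Image": "creator_image"},
    4104: {"ScriptBlockText": "script_block_text", "ScriptBlockId": "script_block_id", "Path": "path"},
}


def extract_typed_fields(event_id: int, event_data: Dict[str, Any]) -> Dict[str, Any]:
    """Rename the fields listed for event_id; absent raw keys are skipped (assumption)."""
    mapping = FIELD_MAPS.get(event_id, {})
    data = {new: event_data[old] for old, new in mapping.items() if old in event_data}
    if event_id == 1 and "process_id" in data:
        data["process_id"] = int(data["process_id"])  # Winlogbeat sends e.g. "5940"
    return data
```

Note that the same raw key can land in different places: `ProcessGuid` becomes `process_guid` for ID 3 but `creator_guid` for ID 11.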
Hostname resolution
`parse_sysmon_event` resolves the endpoint hostname in this order:
1. `agent.hostname`
2. `host.name`
3. `computer_name`
4. The literal `"unknown"` if none are present.
Timestamp resolution
The parser reads `winlog.event_data.UtcTime` first, then falls back to the top-level `@timestamp` field, then to wall-clock time if neither is parseable. Both ISO-8601 with `T` and `YYYY-MM-DD HH:MM:SS` forms are accepted.
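The hostname and timestamp fallbacks described above can be sketched together as two small functions. This is a minimal illustration, not the code in `parse_sysmon_event`: the function names are hypothetical, `computer_name` is assumed to be read from the `winlog` object (where the sample payload carries it), and naive timestamps such as `UtcTime` are assumed to be UTC.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def _get(obj: Any, key: str) -> Any:
    """dict.get that tolerates missing or non-dict parents."""
    return obj.get(key) if isinstance(obj, dict) else None


def resolve_sysmon_hostname(raw: dict) -> str:
    candidates = (
        _get(_get(raw, "agent"), "hostname"),
        _get(_get(raw, "host"), "name"),
        # Assumption: computer_name is read from winlog, as in the sample payload.
        _get(_get(raw, "winlog"), "computer_name"),
    )
    for value in candidates:
        if isinstance(value, str) and value:
            return value
    return "unknown"


def _parse_ts(value: Any) -> Optional[datetime]:
    """Accept 'YYYY-MM-DDTHH:MM:SS[.fff][Z]' and 'YYYY-MM-DD HH:MM:SS[.fff]'."""
    if not isinstance(value, str) or not value:
        return None
    text = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        dt = datetime.fromisoformat(text)  # either 'T' or ' ' works as the separator
    except ValueError:
        return None
    # Assumption: values without an offset (like Sysmon's UtcTime) are UTC.
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def resolve_sysmon_timestamp(raw: dict) -> datetime:
    utc_time = _get(_get(_get(raw, "winlog"), "event_data"), "UtcTime")
    for candidate in (utc_time, raw.get("@timestamp")):
        parsed = _parse_ts(candidate)
        if parsed is not None:
            return parsed
    return datetime.now(timezone.utc)  # wall-clock fallback
```

Applied to the sample payload below, the hostname resolves to `desktop-1nnimrr` (from `host.name`, since `agent` carries no `hostname`) and the timestamp resolves to `UtcTime`, 19:07:13.512 UTC.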
Sample payload
A real Winlogbeat Sysmon event as captured in stress-tests/sample-kafka-messages.md:
```json
{
  "@timestamp": "2026-03-18T19:07:15.560Z",
  "winlog": {
    "computer_name": "DESKTOP-1NNIMRR",
    "provider_name": "Microsoft-Windows-Sysmon",
    "event_id": "3",
    "channel": "Microsoft-Windows-Sysmon/Operational",
    "event_data": {
      "SourceIp": "10.0.0.104",
      "SourcePort": "51775",
      "DestinationIp": "104.208.16.88",
      "DestinationPort": "443",
      "Protocol": "tcp",
      "Image": "C:\\Users\\abdullah\\AppData\\Local\\Microsoft\\OneDrive\\...\\OneDrive.Sync.Service.exe",
      "ProcessGuid": "{96d8290c-d08c-69b9-2a07-000000000700}",
      "ProcessId": "5940",
      "User": "DESKTOP-1NNIMRR\\abdullah",
      "UtcTime": "2026-03-18 19:07:13.512"
    }
  },
  "agent": {
    "name": "DESKTOP-1NNIMRR",
    "type": "winlogbeat",
    "version": "8.17.0"
  },
  "host": {
    "name": "desktop-1nnimrr",
    "os": { "family": "windows", "name": "Windows 10 Pro" }
  },
  "event": {
    "code": "3",
    "provider": "Microsoft-Windows-Sysmon"
  }
}
```
The full sample (including every field Winlogbeat ships) is at stress-tests/sample-kafka-messages.md.
Linux — auditd
What collects the events
The Linux Audit daemon (auditd) records kernel and userspace audit events according to rules loaded from `/etc/audit/audit.rules`. Records are plain text and follow the familiar `type=SYSCALL msg=audit(<epoch>:<serial>): <fields>` form.
Logster does not ship an auditd rule set. If your Linux security baseline already defines one, reuse it.
What ships the events
stress-tests/sample-kafka-messages.md labels the Linux auditd source as "custom agent (auditd)". There is no specific commercial shipper bundled with Logster for this path; the endpoint agent is customer-provided or Logster Support-provided.
TBD: canonical Linux shipper. Confirm with Logster Support which agent ships auditd events to Kafka in production deployments (rsyslog with `omkafka`, syslog-ng, a dedicated Go/Rust agent, or something else), and populate this section with the verified choice.
Kafka topic
- Topic: `linux-auditd-logs`
- Partitions: 6.
Expected format
The normalizer accepts a simple JSON envelope with a single `message` field carrying the raw auditd text record. Source: `parse_auditd_event` in `parsers.py`.
Fields read from the Kafka message (only `message` is required):
| Field | Type | Notes |
|---|---|---|
| `message` | string | The raw auditd record, verbatim. |
| `host.name` or `host` (string) or `hostname` | string | The endpoint identifier. |
| `timestamp` | float or string | Optional; falls back to wall-clock time if missing or unparseable. |
| `_tenant_id` | string | Optional; defaults to `"default"`. |
The normalizer does not parse the auditd text further at this stage: the entire record is stored as-is on `NormalizedEvent.data.message`. Downstream graph-building code re-parses the text for specific fields.
Every auditd event is emitted with `event_type = "syscall"` regardless of the underlying auditd record type. The distinction between `type=SYSCALL`, `type=PATH`, `type=EXECVE`, etc. is preserved inside `data.message` but not lifted to a separate event type.
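The behavior of this stage, one fixed event type and the record kept verbatim, fits in a few lines. This is a sketch, not `parse_auditd_event` itself: `normalize_auditd` is a hypothetical name, host and timestamp handling are left out, and rejecting a missing `message` with `ValueError` is an assumption about how the real parser reacts.

```python
from typing import Any, Dict


def normalize_auditd(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Sketch of the auditd stage: the record text is not parsed here."""
    message = raw.get("message")
    if not isinstance(message, str):
        # Assumption: a missing or non-string message is rejected.
        raise ValueError("auditd envelope must carry a string 'message' field")
    return {
        "event_type": "syscall",                        # same for SYSCALL, PATH, EXECVE, CWD, ...
        "tenant_id": raw.get("_tenant_id", "default"),
        "data": {"message": message},                   # raw record stored verbatim
    }
```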
Sample payload
Verbatim from stress-tests/sample-kafka-messages.md:
```json
{
  "@timestamp": "2026-03-20T01:58:10.344285+05:00",
  "host": {
    "name": "agent-linux-01"
  },
  "log_source": "audit",
  "message": "type=PATH msg=audit(1773953890.204:50364): item=0 name=\"/run/systemd/ask-password-block/\" inode=5375 dev=00:1b mode=040700 ouid=0 ogid=0 rdev=00:00 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0\u001dOUID=\"root\" OGID=\"root\""
}
```
Auditd record types you'll commonly see in the message field:
- `type=SYSCALL`: the syscall invocation itself (`pid`, `ppid`, `uid`, `euid`, `exe`, `comm`, `key`)
- `type=PATH`: a path referenced by a syscall (filename, inode, mode, `ouid`/`ogid`)
- `type=EXECVE`: arguments passed to `execve`
- `type=CWD`: current working directory at the time of the syscall
- `type=PROCTITLE`: the process title as visible in `/proc/<pid>/cmdline`
These are standard auditd record types. The list is not exhaustive — anything auditd produces is accepted.
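Because the normalizer keeps the record as one string, anything downstream that needs individual fields has to split it apart. The sketch below shows one way to do that with the standard `re` module. `split_auditd_record` is a hypothetical helper, not Logster's graph-building code. It also separates out the text after the `\u001d` (0x1D) character seen in the sample, which is where auditd's enriched log format places interpreted values such as `OUID="root"`.

```python
import re
from typing import Dict, Optional

# Header: type=<TYPE> msg=audit(<epoch>:<serial>): <body>
_HEADER = re.compile(
    r"^type=(?P<type>\S+) msg=audit\((?P<epoch>\d+\.\d+):(?P<serial>\d+)\):\s*(?P<body>.*)$"
)
# key=value pairs; values are either "quoted" or a run of non-space characters.
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def _fields(text: str) -> Dict[str, str]:
    return {key: value.strip('"') for key, value in _FIELD.findall(text)}


def split_auditd_record(message: str) -> Optional[dict]:
    """Hypothetical downstream helper: split one auditd record into parts."""
    match = _HEADER.match(message)
    if match is None:
        return None
    raw_part, _, enriched_part = match.group("body").partition("\x1d")
    return {
        "type": match.group("type"),
        "epoch": float(match.group("epoch")),
        "serial": int(match.group("serial")),
        "fields": _fields(raw_part),          # values as logged
        "enriched": _fields(enriched_part),   # interpreted values, if any
    }
```

Applied to the sample record, this yields `type == "PATH"`, `serial == 50364`, `fields["name"] == "/run/systemd/ask-password-block/"`, and `enriched["OUID"] == "root"`.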
Linux — eBPF
What collects the events
An eBPF tracer running on the endpoint observes three families of kernel events directly in the Linux kernel: process execution, file access, and network activity. Each family goes to its own Kafka topic.
Logster does not bundle the eBPF tracer. The message format expected on the Kafka side (see below) is a line-oriented text protocol, so any eBPF tooling that can be wired to emit those lines will work.
What ships the events
Per stress-tests/sample-kafka-messages.md, the eBPF events are labeled as coming from a "custom agent (eBPF process tracer / file tracer / network tracer)". No specific commercial tool is bundled.
TBD — canonical eBPF tracer. Confirm with Logster Support which eBPF tracer produces these events in production deployments (Falco, Tetragon, a custom bcc/libbpf tool, etc.) and update this section.
Kafka topics
Three separate topics, one per event family. Each has 6 partitions.
| Topic | Family | `event_type` | Parser |
|---|---|---|---|
| `linux-ebpf-process-logs` | Process execution | `process` | `_parse_ebpf_process_data` |
| `linux-ebpf-file-logs` | File access | `file` | `_parse_ebpf_file_data` |
| `linux-ebpf-network-logs` | Network connections | `network` | `_parse_ebpf_network_data` |
All three parsers live in services/normalizer/src/logster_normalizer/parsers.py.
Kafka message envelope
All three eBPF topics use the same JSON envelope as auditd. The interesting content is in the `message` field, which carries a line-oriented text record with a family-specific prefix.
| Field | Type | Notes |
|---|---|---|
| `message` | string | The text record in one of the `[EXEC]` / `[FILE]` / `[CONNECT]` / `[ACCEPT]` / `[SOCKET]` forms below. |
| `host.name` or `host` (string) or `hostname` | string | Endpoint identifier. |
| `timestamp` | float or string | Optional. |
| `_tenant_id` | string | Optional. |
Message formats the normalizer recognises
The normalizer parses each family with a regex. These regexes are the contract — any eBPF tooling you use must produce lines that match them.
Source: parsers.py lines 163–193.
linux-ebpf-process-logs — [EXEC]
Regex:
Fields extracted into the normalized data.*:
| Regex group | Normalized field | Notes |
|---|---|---|
| `pid` | `process_id` (int) | |
| `ppid` | `parent_image` | Stored as a placeholder `/usr/bin/ppid:<ppid>` because ppid→image resolution is not performed at normalization time. |
| `uid` | `user` | Stored as `uid:<uid>`. Also used to set `integrity_level` (`root` if `uid=0`, else `user`). |
| `comm` | `image` (fallback) | If `ARGS` is present and begins with `/`, that path becomes `image`. Otherwise `image = /usr/bin/<comm>`. |
| `args` | `command_line` | Falls back to `comm` if `ARGS` is empty. |
Sample (verified from stress-tests/sample-kafka-messages.md):
```json
{
  "@timestamp": "2026-03-20T02:24:50.451492+05:00",
  "host": { "name": "agent-linux-01" },
  "log_source": "process",
  "message": "2026-03-20 02:24:50 [EXEC] pid=35747 ppid=33785 uid=1000 bash /usr/bin/ls"
}
```
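The field rules in the table can be sketched as follows. The regex is a hypothetical one reconstructed from the sample line. It is not the contract regex in `parsers.py`, so do not validate agent output against it. `EXEC_RE` and `map_exec` are illustrative names, and reading "that path" as the first whitespace-separated token of `ARGS` is an assumption.

```python
import re
from typing import Any, Dict, Optional

# Hypothetical pattern built from the sample; NOT the regex in parsers.py.
EXEC_RE = re.compile(
    r"\[EXEC\]\s+pid=(?P<pid>\d+)\s+ppid=(?P<ppid>\d+)\s+uid=(?P<uid>\d+)"
    r"\s+(?P<comm>\S+)\s*(?P<args>.*)$"
)


def map_exec(message: str) -> Optional[Dict[str, Any]]:
    match = EXEC_RE.search(message)
    if match is None:
        return None
    pid, ppid, uid, comm = match.group("pid", "ppid", "uid", "comm")
    args = match.group("args").strip()
    first = args.split()[0] if args else ""
    return {
        "process_id": int(pid),
        "parent_image": f"/usr/bin/ppid:{ppid}",  # placeholder; no ppid->image lookup
        "user": f"uid:{uid}",
        "integrity_level": "root" if uid == "0" else "user",
        # Assumption: the leading token of ARGS is the path used for image.
        "image": first if first.startswith("/") else f"/usr/bin/{comm}",
        "command_line": args or comm,             # falls back to comm when ARGS is empty
    }
```

For the sample line, this gives `image == "/usr/bin/ls"`, `command_line == "/usr/bin/ls"`, `user == "uid:1000"`, and `integrity_level == "user"`.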
linux-ebpf-file-logs — [FILE]
Regex:
The `!` after `FILE` is tolerated by the regex, so an agent may emit `[FILE!]` to flag a suspicious access.
Fields extracted:
| Regex group | Normalized field |
|---|---|
| `pid` | `process_id` (int) |
| `comm` | `creator_image`, stored as `/usr/bin/<comm>` |
| `flags` | `flags`: the `open()`-style flags verbatim, e.g. `O_RDONLY` |
| `path` | `target_filename` |
Sample:
```json
{
  "@timestamp": "2026-03-20T02:24:27.254181+05:00",
  "host": { "name": "agent-linux-01" },
  "log_source": "file",
  "message": "2026-03-20 02:24:27 [FILE] pid=23256 snapd flags=O_RDONLY /var/lib/snapd/assertions/asserts-v0/model/16/generic/generic-classic/active"
}
```
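The file mapping, including the optional `!` marker, can be sketched the same way. As before, `FILE_RE` is a hypothetical pattern reconstructed from the sample line, not the regex in `parsers.py`, and `map_file` is an illustrative name.

```python
import re
from typing import Any, Dict, Optional

# Hypothetical pattern built from the sample; "!?" accepts the [FILE!] variant.
FILE_RE = re.compile(
    r"\[FILE!?\]\s+pid=(?P<pid>\d+)\s+(?P<comm>\S+)\s+flags=(?P<flags>\S+)\s+(?P<path>.+)$"
)


def map_file(message: str) -> Optional[Dict[str, Any]]:
    match = FILE_RE.search(message)
    if match is None:
        return None
    return {
        "process_id": int(match.group("pid")),
        "creator_image": f"/usr/bin/{match.group('comm')}",
        "flags": match.group("flags"),            # verbatim, e.g. O_RDONLY
        "target_filename": match.group("path"),   # rest of the line, spaces allowed
    }
```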
linux-ebpf-network-logs — [CONNECT] / [ACCEPT] / [SOCKET]
The network parser recognises two shapes.
Connection form (outbound or accepted inbound):
Regex:
Fields:
| Regex group | Normalized field |
|---|---|
| `CONNECT`/`ACCEPT` | `connection_type` |
| `pid` | `process_id` |
| `comm` | `image`, stored as `/usr/bin/<comm>` |
| `daddr` | `dest_ip` |
| `dport` | `dest_port` |
Socket-open form (bare socket creation, no destination yet):
Regex:
Fields:
| Regex group | Normalized field |
|---|---|
| `pid` | `process_id` |
| `comm` | `image`, stored as `/usr/bin/<comm>` |
| `af_family` | `socket_type`, e.g. `AF_INET`, `AF_INET6`, `AF_UNIX` |
Sample ([CONNECT] form):
```json
{
  "@timestamp": "2026-03-20T02:24:31.785786+05:00",
  "host": { "name": "agent-linux-01" },
  "log_source": "network",
  "message": "2026-03-20 02:24:31 [CONNECT] pid=35113 rdk:broker-1 -> 10.0.0.106:29092"
}
```
> [!NOTE]
> A message that doesn't match any of the eBPF regexes is not dropped. The parser stores the original `message` on `NormalizedEvent.data.message` and emits the event with the topic-derived `event_type`. Graph-building downstream may treat these as low-information and ignore them.
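
The connection form and the no-match fallback can be sketched together. `CONN_RE` is a hypothetical pattern reconstructed from the `[CONNECT]` sample, not the regex in `parsers.py`. It assumes `[ACCEPT]` lines share the same `comm -> addr:port` layout. Converting `pid` and `dport` to `int` is also an assumption, since the network tables do not say. The `[SOCKET]` form is left out because this page has no sample line for it.

```python
import re
from typing import Any, Dict

# Hypothetical pattern built from the [CONNECT] sample; NOT the regex in parsers.py.
# Assumption: [ACCEPT] lines use the same "comm -> addr:port" layout.
CONN_RE = re.compile(
    r"\[(?P<kind>CONNECT|ACCEPT)\]\s+pid=(?P<pid>\d+)\s+(?P<comm>\S+)"
    r"\s+->\s+(?P<daddr>\S+):(?P<dport>\d+)\s*$"
)


def map_network(message: str) -> Dict[str, Any]:
    match = CONN_RE.search(message)
    if match is None:
        # No match: keep the original text; event_type still comes from the topic.
        return {"event_type": "network", "data": {"message": message}}
    return {
        "event_type": "network",
        "data": {
            "connection_type": match.group("kind"),
            "process_id": int(match.group("pid")),      # int conversion assumed
            "image": f"/usr/bin/{match.group('comm')}",
            "dest_ip": match.group("daddr"),            # greedy match stops at the last ':'
            "dest_port": int(match.group("dport")),     # int conversion assumed
        },
    }
```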
Topic-to-source summary
A single-table reference for operators:
| Kafka topic | Platform | Collector | Shipper (per repo) | Event format | Partitions |
|---|---|---|---|---|---|
| `sysmon-logs` | Windows | Sysmon | Winlogbeat | Winlogbeat JSON | 6 |
| `linux-auditd-logs` | Linux | auditd daemon | custom agent | JSON envelope + auditd text | 6 |
| `linux-ebpf-process-logs` | Linux | eBPF process tracer | custom agent | JSON envelope + `[EXEC]` text | 6 |
| `linux-ebpf-file-logs` | Linux | eBPF file tracer | custom agent | JSON envelope + `[FILE]` text | 6 |
| `linux-ebpf-network-logs` | Linux | eBPF network tracer | custom agent | JSON envelope + `[CONNECT]` / `[ACCEPT]` / `[SOCKET]` text | 6 |
Topic names and partition counts verified against the `kafka-init` container in `deploy/docker-compose.yml`.
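Teams that script shipper configuration or test fixtures may want the same table as data. The sketch below is illustrative only. `Route`, `TOPIC_ROUTES`, and `unrouted_topics` are hypothetical names, and the parser names are simply the ones cited on this page.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional


class Route(NamedTuple):
    parser: str                # parser function as named on this page
    event_type: Optional[str]  # None: Sysmon picks it per event ID
    partitions: int


TOPIC_ROUTES: Dict[str, Route] = {
    "sysmon-logs": Route("parse_sysmon_event", None, 6),
    "linux-auditd-logs": Route("parse_auditd_event", "syscall", 6),
    "linux-ebpf-process-logs": Route("_parse_ebpf_process_data", "process", 6),
    "linux-ebpf-file-logs": Route("_parse_ebpf_file_data", "file", 6),
    "linux-ebpf-network-logs": Route("_parse_ebpf_network_data", "network", 6),
}


def unrouted_topics(configured: Iterable[str]) -> List[str]:
    """Hypothetical check: topics a shipper targets that no parser on this page consumes."""
    return [topic for topic in configured if topic not in TOPIC_ROUTES]
```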
Envelope fields common to every topic
Every raw Kafka message may include these top-level fields. None are required by the parser except `message` (for Linux sources) or `winlog.*` (for Sysmon); the rest are best-effort lookups.
| Field | Used by | Notes |
|---|---|---|
| `@timestamp` | All | Fallback timestamp if the inner record doesn't carry one. |
| `host.name` / `host` (string) / `hostname` | Linux parsers | Resolves the endpoint identifier. |
| `agent.hostname` / `host.name` / `computer_name` | Sysmon parser | Resolves the endpoint identifier. |
| `timestamp` | Linux parsers | Numeric epoch; tried before `@timestamp`. |
| `_tenant_id` | All parsers | Defaults to `"default"` if missing. Used to populate the `tenant_id` field on every `NormalizedEvent`. |
The endpoint hostname is lowercased on the way into `NormalizedEvent.endpoint_id`, which is why you will see `desktop-1nnimrr` in the dashboard even though Winlogbeat reports `DESKTOP-1NNIMRR`.
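For the Linux topics, the envelope lookups in the table above can be sketched as one function. This is a minimal illustration, not Logster code. `resolve_linux_envelope` is a hypothetical name. It assumes the host order shown (`host.name`, then a string `host`, then `hostname`), and it assumes `"unknown"` as the last-resort endpoint, borrowing the Sysmon parser's fallback. It also treats ISO strings without an offset as UTC, another assumption.

```python
import time
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def _to_epoch(value: Any) -> Optional[float]:
    """Accept a numeric epoch, a numeric string, or an ISO-8601 string."""
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str) and value:
        try:
            return float(value)
        except ValueError:
            pass
        text = value[:-1] + "+00:00" if value.endswith("Z") else value
        try:
            dt = datetime.fromisoformat(text)
        except ValueError:
            return None
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        return dt.timestamp()
    return None


def _endpoint(raw: Dict[str, Any]) -> str:
    host = raw.get("host")
    if isinstance(host, dict) and isinstance(host.get("name"), str):
        return host["name"]
    if isinstance(host, str):
        return host
    hostname = raw.get("hostname")
    return hostname if isinstance(hostname, str) else "unknown"  # fallback value assumed


def resolve_linux_envelope(raw: Dict[str, Any]) -> Dict[str, Any]:
    ts = _to_epoch(raw.get("timestamp"))        # tried first
    if ts is None:
        ts = _to_epoch(raw.get("@timestamp"))   # then the envelope timestamp
    return {
        "endpoint_id": _endpoint(raw).lower(),  # lowercased, as described above
        "tenant_id": raw.get("_tenant_id", "default"),
        "timestamp": ts if ts is not None else time.time(),  # wall-clock fallback
    }
```

With the auditd sample envelope, this resolves `endpoint_id` to `agent-linux-01`, `tenant_id` to `default`, and the timestamp to the epoch of `2026-03-20T01:58:10.344285+05:00`.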
Production ingestion agents
Logster Support's product materials list the following shippers as natively supported for feeding Logster in production:
- Windows: Elastic Winlogbeat, Splunk Universal Forwarder, Windows Subscription Logging.
- Linux: rsyslog (native), Splunk Universal Forwarder, syslog-ng.
The only shipper observed in the repo's sample messages is Winlogbeat for Sysmon. The Linux samples are labeled "custom agent" — meaning the endpoint publisher is customer-provided or Logster Support-provided, not one of the named commercial shippers by default. If you are integrating one of the named commercial shippers (for example, the Splunk Universal Forwarder), the integration recipe is not yet documented — see Splunk Integration Guide for the stub.
Verifying events end-to-end
Once your collectors and shipper are in place, verify each topic is receiving traffic:
```bash
# Peek at up to 5 messages on each raw topic (press Ctrl-C if fewer arrive)
docker compose exec kafka kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic sysmon-logs --max-messages 5
docker compose exec kafka kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic linux-auditd-logs --max-messages 5
docker compose exec kafka kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic linux-ebpf-process-logs --max-messages 5
```
Check that the normalizer is successfully parsing them:
No parse errors in the log and a climbing `normalized_events_total` Prometheus counter mean the pipeline is healthy.
Confirm normalized events are landing in Elasticsearch:
The count should grow as events flow.
Where to go next
- Installation: stand up the stack before wiring a real endpoint.
- Installation Parameters: `kafka.*` topic configuration.
- Splunk Integration Guide: (stub) production Splunk UF pattern.
- Troubleshooting Guide: what to do when events reach the normalizer but are too sparse to build useful graphs.