Architecture¶

Fenrir is one process. Everything else is just glue.

The 60-second mental model¶

flowchart LR
    L[System logs<br/>auth.log, nginx, fail2ban,<br/>kern.log, dpkg.log, ...] -->|tail -f| M
    K[Periodic snapshots<br/>baseline, CVE feed] -->|cron| M
    M[9 Monitors] -->|RawEvent| Q[(Event queue<br/>asyncio)]
    Q --> RE[Rule Engine<br/>classify]
    RE -->|ThreatEvent| DB[(PostgreSQL<br/>or SQLite)]
    RE -->|HIGH/CRITICAL| DISP[Investigation<br/>Dispatcher]
    DISP -->|spawn| AA[AI Analyst<br/>+ playbook]
    AA -->|read| TOOLS[Tools<br/>shell, geoip, threat intel]
    AA -->|cloud LLM call| ANON[PII Anonymizer]
    ANON -->|tokenized| LLM[(OpenRouter<br/>or Ollama)]
    AA -->|persist| DB
    AA -->|low-risk auto| RESP[Responders<br/>fail2ban ban]
    AA -->|high-confidence| AUTO[Autonomous<br/>kill / quarantine /<br/>stop / isolate / rollback]
    AUTO -->|persist + revert button| DB
    AUTO -->|notify| TG[Telegram bot]
    RE -->|alert| TG
    AA -->|verdict report| TG
    DB --> WEB[Web dashboard<br/>aiohttp]

The 5 components¶

1. Monitors¶

Nine small classes, each watching one thing:

Monitor	Source	What it sees
`auth_monitor`	`/var/log/auth.log`	SSH login attempts (success, failure, brute force)
`honeypot_monitor`	`/var/log/nginx/honeypot.log`	Hits on fake admin pages (`/wp-admin`, `/.env`, ...)
`fail2ban_monitor`	`/var/log/fail2ban.log`	Bans, unbans, repeat offenders
`nginx_monitor`	`/var/log/nginx/access.log`	Suspicious patterns in real traffic
`ufw_monitor`	`/var/log/ufw.log`	Firewall drops, port scans
`kernel_monitor`	`/var/log/kern.log`	USB connect/disconnect, OOM, segfault, AppArmor denials
`package_monitor`	`/var/log/dpkg.log`	Package install/upgrade/remove
`baseline_monitor`	periodic (10 min)	Drift in listening ports, services, users, setuid bins
`cve_monitor`	periodic (6 h)	Pending security upgrades (`apt list --upgradable`)

Six are log tailers (tail-based, byte-streaming). Three are periodic snapshotters (run on a timer).

All produce RawEvent objects and push them onto a single asyncio.Queue.

2. Rule engine¶

A pure-Python classifier. Takes a RawEvent, returns a ThreatEvent:

@dataclass
class ThreatEvent:
    id: str
    timestamp: datetime
    source: str              # "auth", "honeypot", "package", ...
    severity: Severity       # INFO / LOW / MEDIUM / HIGH / CRITICAL
    category: str            # "ssh_brute_force", "honeypot_hit", ...
    ip: Optional[str]
    description: str
    confidence: int          # 0-100
    raw_line: str
    # ... (geoip, ai analysis, country, etc.)

The rule engine is deterministic and fast. It does not call an LLM. The LLM only enters the picture during AI investigations (next step).

The rule engine can also escalate based on history (e.g. an IP that was banned twice in 24h gets CRITICAL even on its third honeypot hit).

3. AI Investigation pipeline¶

When a ThreatEvent has severity >= HIGH, the dispatcher:

Checks dedupe cache (skip if the same (category, ip) was investigated recently).
Checks concurrency cap to keep LLM cost and CPU bounded.
Spawns an AnalystAgent task.

Dedupe window and concurrency cap are tunable defaults — production values live in the Hardening checklist.

The agent:

Loads the playbook for that event category from p3guardian/ai/playbooks/<category>.md.
Builds a system prompt = base analyst persona + playbook + event details.
Runs the LLM tool-loop (max 5 rounds): the LLM picks tools (shell, log search, geoip, threat intel), Fenrir executes them, results feed back into the prompt.
Parses a structured final_report JSON from the LLM's last response.
Persists three rows: InvestigationJob, InvestigationReport, zero or more InvestigationIOC.
Optionally executes low-risk auto-actions (ban_ip via fail2ban-client, when verdict=confirmed_threat and confidence >= 80).
Optionally dispatches autonomous response actions (kill_process, file_quarantine, service_stop, isolate_network, package_rollback) to the autonomous responder. Off by default (AUTO_ACTION_ENABLED=false); when enabled, defaults to AUTO_ACTION_DRY_RUN=true for at least one week. Every attempt is persisted with revert metadata; the operator reverts via Telegram inline button or web UI.
Sends a Telegram alert with the verdict + summary.

If OPENROUTER_API_KEY is set, the LLM is OpenRouter (cloud). Otherwise, fallback to local Ollama.

When OpenRouter is used, the PII anonymizer wraps each prompt — replaces real values with placeholder tokens before sending, restores them in the output. See PII anonymizer.

4. Database¶

PostgreSQL by default, SQLite for dev. SQLAlchemy async + Base.metadata.create_all (no Alembic — schema is small).

Main tables:

events — every classified ThreatEvent
investigation_jobs / investigation_reports / investigation_iocs — the AI pipeline output
autonomous_actions — every kill/quarantine/stop/isolate/rollback attempt with revert metadata
banned_ips — track bans, support multi-server
breach_notifications — GDPR Art. 33 with deadline + escalation
compliance_reports — daily audit history
threat_intel — IP reputation cache
memories — persistent state for the conversational AI bot
servers — multi-server registry (federated mode)

5. Surface (dashboard + Telegram)¶

The web dashboard is an aiohttp service on port 8443. It serves:

Live attack map (Leaflet + WebSocket)
Stats by source/category/country/time window
Recent events feed
AI investigations table
Compliance reports (with PDF export)
Network assessment scanner (LAN discovery, vuln scoring)
A /api/ingest endpoint so remote agents can stream events to a central Fenrir

The Telegram bot is two-way:

Push: Fenrir sends alerts (HIGH/CRITICAL events, breach 72h escalation, daily digest at 07:00 UTC).
Pull: you talk to Fenrir. It runs commands on your behalf via tools (check disk, ban IP, restart service, run backup, etc.) with safety gates.

Service & deployment shape¶

Fenrir runs as a single systemd unit:

systemctl status p3guardian

Defaults to running as the unprivileged p3guardian user. Reads logs via group adm membership. Sudo is never required at runtime (sudo paths in code are gated by os.geteuid() == 0 to avoid mail-spam from mail_badpass).

Disk footprint: ~150 MB for the venv, plus database growth (~10 MB/month typical), plus ~3 GB if you install the PII anonymizer model.

Network footprint: outbound HTTPS to OpenRouter (or none if you use Ollama only), Telegram API, and the Cloudflare Tunnel control plane. No inbound ports are required — Cloudflare Tunnel terminates traffic at the edge and forwards locally.

Next: Install →