Skip to content

Architecture

Fenrir is one process. Everything else is just glue.

The 60-second mental model

flowchart LR
    L[System logs<br/>auth.log, nginx, fail2ban,<br/>kern.log, dpkg.log, ...] -->|tail -f| M
    K[Periodic snapshots<br/>baseline, CVE feed] -->|cron| M
    M[9 Monitors] -->|RawEvent| Q[(Event queue<br/>asyncio)]
    Q --> RE[Rule Engine<br/>classify]
    RE -->|ThreatEvent| DB[(PostgreSQL<br/>or SQLite)]
    RE -->|HIGH/CRITICAL| DISP[Investigation<br/>Dispatcher]
    DISP -->|spawn| AA[AI Analyst<br/>+ playbook]
    AA -->|read| TOOLS[Tools<br/>shell, geoip, threat intel]
    AA -->|cloud LLM call| ANON[PII Anonymizer]
    ANON -->|tokenized| LLM[(OpenRouter<br/>or Ollama)]
    AA -->|persist| DB
    AA -->|low-risk auto| RESP[Responders<br/>fail2ban ban]
    AA -->|high-confidence| AUTO[Autonomous<br/>kill / quarantine /<br/>stop / isolate / rollback]
    AUTO -->|persist + revert button| DB
    AUTO -->|notify| TG[Telegram bot]
    RE -->|alert| TG
    AA -->|verdict report| TG
    DB --> WEB[Web dashboard<br/>aiohttp]

The 5 components

1. Monitors

Nine small classes, each watching one thing:

Monitor Source What it sees
auth_monitor /var/log/auth.log SSH login attempts (success, failure, brute force)
honeypot_monitor /var/log/nginx/honeypot.log Hits on fake admin pages (/wp-admin, /.env, ...)
fail2ban_monitor /var/log/fail2ban.log Bans, unbans, repeat offenders
nginx_monitor /var/log/nginx/access.log Suspicious patterns in real traffic
ufw_monitor /var/log/ufw.log Firewall drops, port scans
kernel_monitor /var/log/kern.log USB connect/disconnect, OOM, segfault, AppArmor denials
package_monitor /var/log/dpkg.log Package install/upgrade/remove
baseline_monitor periodic (10 min) Drift in listening ports, services, users, setuid bins
cve_monitor periodic (6 h) Pending security upgrades (apt list --upgradable)

Six are log tailers (tail-based, byte-streaming). Three are periodic snapshotters (run on a timer).

All produce RawEvent objects and push them onto a single asyncio.Queue.

2. Rule engine

A pure-Python classifier. Takes a RawEvent, returns a ThreatEvent:

@dataclass
class ThreatEvent:
    id: str
    timestamp: datetime
    source: str              # "auth", "honeypot", "package", ...
    severity: Severity       # INFO / LOW / MEDIUM / HIGH / CRITICAL
    category: str            # "ssh_brute_force", "honeypot_hit", ...
    ip: Optional[str]
    description: str
    confidence: int          # 0-100
    raw_line: str
    # ... (geoip, ai analysis, country, etc.)

The rule engine is deterministic and fast. It does not call an LLM. The LLM only enters the picture during AI investigations (next step).

The rule engine can also escalate based on history (e.g. an IP that was banned twice in 24h gets CRITICAL even on its third honeypot hit).

3. AI Investigation pipeline

When a ThreatEvent has severity >= HIGH, the dispatcher:

  1. Checks dedupe cache (skip if the same (category, ip) was investigated recently).
  2. Checks concurrency cap to keep LLM cost and CPU bounded.
  3. Spawns an AnalystAgent task.

Dedupe window and concurrency cap are tunable defaults — production values live in the Hardening checklist.

The agent:

  1. Loads the playbook for that event category from p3guardian/ai/playbooks/<category>.md.
  2. Builds a system prompt = base analyst persona + playbook + event details.
  3. Runs the LLM tool-loop (max 5 rounds): the LLM picks tools (shell, log search, geoip, threat intel), Fenrir executes them, results feed back into the prompt.
  4. Parses a structured final_report JSON from the LLM's last response.
  5. Persists three rows: InvestigationJob, InvestigationReport, zero or more InvestigationIOC.
  6. Optionally executes low-risk auto-actions (ban_ip via fail2ban-client, when verdict=confirmed_threat and confidence >= 80).
  7. Optionally dispatches autonomous response actions (kill_process, file_quarantine, service_stop, isolate_network, package_rollback) to the autonomous responder. Off by default (AUTO_ACTION_ENABLED=false); when enabled, defaults to AUTO_ACTION_DRY_RUN=true for at least one week. Every attempt is persisted with revert metadata; the operator reverts via Telegram inline button or web UI.
  8. Sends a Telegram alert with the verdict + summary.

If OPENROUTER_API_KEY is set, the LLM is OpenRouter (cloud). Otherwise, fallback to local Ollama.

When OpenRouter is used, the PII anonymizer wraps each prompt — replaces real values with placeholder tokens before sending, restores them in the output. See PII anonymizer.

4. Database

PostgreSQL by default, SQLite for dev. SQLAlchemy async + Base.metadata.create_all (no Alembic — schema is small).

Main tables:

  • events — every classified ThreatEvent
  • investigation_jobs / investigation_reports / investigation_iocs — the AI pipeline output
  • autonomous_actions — every kill/quarantine/stop/isolate/rollback attempt with revert metadata
  • banned_ips — track bans, support multi-server
  • breach_notifications — GDPR Art. 33 with deadline + escalation
  • compliance_reports — daily audit history
  • threat_intel — IP reputation cache
  • memories — persistent state for the conversational AI bot
  • servers — multi-server registry (federated mode)

5. Surface (dashboard + Telegram)

The web dashboard is an aiohttp service on port 8443. It serves:

  • Live attack map (Leaflet + WebSocket)
  • Stats by source/category/country/time window
  • Recent events feed
  • AI investigations table
  • Compliance reports (with PDF export)
  • Network assessment scanner (LAN discovery, vuln scoring)
  • A /api/ingest endpoint so remote agents can stream events to a central Fenrir

The Telegram bot is two-way:

  • Push: Fenrir sends alerts (HIGH/CRITICAL events, breach 72h escalation, daily digest at 07:00 UTC).
  • Pull: you talk to Fenrir. It runs commands on your behalf via tools (check disk, ban IP, restart service, run backup, etc.) with safety gates.

Service & deployment shape

Fenrir runs as a single systemd unit:

systemctl status p3guardian

Defaults to running as the unprivileged p3guardian user. Reads logs via group adm membership. Sudo is never required at runtime (sudo paths in code are gated by os.geteuid() == 0 to avoid mail-spam from mail_badpass).

Disk footprint: ~150 MB for the venv, plus database growth (~10 MB/month typical), plus ~3 GB if you install the PII anonymizer model.

Network footprint: outbound HTTPS to OpenRouter (or none if you use Ollama only), Telegram API, and the Cloudflare Tunnel control plane. No inbound ports are required — Cloudflare Tunnel terminates traffic at the edge and forwards locally.

Next: Install →