Architecture¶
Fenrir is one process. Everything else is just glue.
The 60-second mental model¶
flowchart LR
L[System logs<br/>auth.log, nginx, fail2ban,<br/>kern.log, dpkg.log, ...] -->|tail -f| M
K[Periodic snapshots<br/>baseline, CVE feed] -->|cron| M
M[9 Monitors] -->|RawEvent| Q[(Event queue<br/>asyncio)]
Q --> RE[Rule Engine<br/>classify]
RE -->|ThreatEvent| DB[(PostgreSQL<br/>or SQLite)]
RE -->|HIGH/CRITICAL| DISP[Investigation<br/>Dispatcher]
DISP -->|spawn| AA[AI Analyst<br/>+ playbook]
AA -->|read| TOOLS[Tools<br/>shell, geoip, threat intel]
AA -->|cloud LLM call| ANON[PII Anonymizer]
ANON -->|tokenized| LLM[(OpenRouter<br/>or Ollama)]
AA -->|persist| DB
AA -->|low-risk auto| RESP[Responders<br/>fail2ban ban]
AA -->|high-confidence| AUTO[Autonomous<br/>kill / quarantine /<br/>stop / isolate / rollback]
AUTO -->|persist + revert button| DB
AUTO -->|notify| TG[Telegram bot]
RE -->|alert| TG
AA -->|verdict report| TG
DB --> WEB[Web dashboard<br/>aiohttp]
The 5 components¶
1. Monitors¶
Nine small classes, each watching one thing:
| Monitor | Source | What it sees |
|---|---|---|
auth_monitor |
/var/log/auth.log |
SSH login attempts (success, failure, brute force) |
honeypot_monitor |
/var/log/nginx/honeypot.log |
Hits on fake admin pages (/wp-admin, /.env, ...) |
fail2ban_monitor |
/var/log/fail2ban.log |
Bans, unbans, repeat offenders |
nginx_monitor |
/var/log/nginx/access.log |
Suspicious patterns in real traffic |
ufw_monitor |
/var/log/ufw.log |
Firewall drops, port scans |
kernel_monitor |
/var/log/kern.log |
USB connect/disconnect, OOM, segfault, AppArmor denials |
package_monitor |
/var/log/dpkg.log |
Package install/upgrade/remove |
baseline_monitor |
periodic (10 min) | Drift in listening ports, services, users, setuid bins |
cve_monitor |
periodic (6 h) | Pending security upgrades (apt list --upgradable) |
Six are log tailers (tail-based, byte-streaming). Three are periodic snapshotters (run on a timer).
All produce RawEvent objects and push them onto a single asyncio.Queue.
2. Rule engine¶
A pure-Python classifier. Takes a RawEvent, returns a ThreatEvent:
@dataclass
class ThreatEvent:
id: str
timestamp: datetime
source: str # "auth", "honeypot", "package", ...
severity: Severity # INFO / LOW / MEDIUM / HIGH / CRITICAL
category: str # "ssh_brute_force", "honeypot_hit", ...
ip: Optional[str]
description: str
confidence: int # 0-100
raw_line: str
# ... (geoip, ai analysis, country, etc.)
The rule engine is deterministic and fast. It does not call an LLM. The LLM only enters the picture during AI investigations (next step).
The rule engine can also escalate based on history (e.g. an IP that was banned twice in 24h gets CRITICAL even on its third honeypot hit).
3. AI Investigation pipeline¶
When a ThreatEvent has severity >= HIGH, the dispatcher:
- Checks dedupe cache (skip if the same
(category, ip)was investigated recently). - Checks concurrency cap to keep LLM cost and CPU bounded.
- Spawns an
AnalystAgenttask.
Dedupe window and concurrency cap are tunable defaults — production values live in the Hardening checklist.
The agent:
- Loads the playbook for that event category from
p3guardian/ai/playbooks/<category>.md. - Builds a system prompt = base analyst persona + playbook + event details.
- Runs the LLM tool-loop (max 5 rounds): the LLM picks tools (shell, log search, geoip, threat intel), Fenrir executes them, results feed back into the prompt.
- Parses a structured
final_reportJSON from the LLM's last response. - Persists three rows:
InvestigationJob,InvestigationReport, zero or moreInvestigationIOC. - Optionally executes low-risk auto-actions (
ban_ipviafail2ban-client, whenverdict=confirmed_threatandconfidence >= 80). - Optionally dispatches autonomous response actions (
kill_process,file_quarantine,service_stop,isolate_network,package_rollback) to the autonomous responder. Off by default (AUTO_ACTION_ENABLED=false); when enabled, defaults toAUTO_ACTION_DRY_RUN=truefor at least one week. Every attempt is persisted with revert metadata; the operator reverts via Telegram inline button or web UI. - Sends a Telegram alert with the verdict + summary.
If OPENROUTER_API_KEY is set, the LLM is OpenRouter (cloud). Otherwise, fallback to local Ollama.
When OpenRouter is used, the PII anonymizer wraps each prompt — replaces real values with placeholder tokens before sending, restores them in the output. See PII anonymizer.
4. Database¶
PostgreSQL by default, SQLite for dev. SQLAlchemy async + Base.metadata.create_all (no Alembic — schema is small).
Main tables:
events— every classifiedThreatEventinvestigation_jobs/investigation_reports/investigation_iocs— the AI pipeline outputautonomous_actions— every kill/quarantine/stop/isolate/rollback attempt with revert metadatabanned_ips— track bans, support multi-serverbreach_notifications— GDPR Art. 33 with deadline + escalationcompliance_reports— daily audit historythreat_intel— IP reputation cachememories— persistent state for the conversational AI botservers— multi-server registry (federated mode)
5. Surface (dashboard + Telegram)¶
The web dashboard is an aiohttp service on port 8443. It serves:
- Live attack map (Leaflet + WebSocket)
- Stats by source/category/country/time window
- Recent events feed
- AI investigations table
- Compliance reports (with PDF export)
- Network assessment scanner (LAN discovery, vuln scoring)
- A
/api/ingestendpoint so remote agents can stream events to a central Fenrir
The Telegram bot is two-way:
- Push: Fenrir sends alerts (HIGH/CRITICAL events, breach 72h escalation, daily digest at 07:00 UTC).
- Pull: you talk to Fenrir. It runs commands on your behalf via tools (check disk, ban IP, restart service, run backup, etc.) with safety gates.
Service & deployment shape¶
Fenrir runs as a single systemd unit:
Defaults to running as the unprivileged p3guardian user. Reads logs via group adm membership. Sudo is never required at runtime (sudo paths in code are gated by os.geteuid() == 0 to avoid mail-spam from mail_badpass).
Disk footprint: ~150 MB for the venv, plus database growth (~10 MB/month typical), plus ~3 GB if you install the PII anonymizer model.
Network footprint: outbound HTTPS to OpenRouter (or none if you use Ollama only), Telegram API, and the Cloudflare Tunnel control plane. No inbound ports are required — Cloudflare Tunnel terminates traffic at the edge and forwards locally.
Next: Install →