Hardening checklist

This page is the first thing to read before going live. It consolidates the production-grade values that override the defaults shipped for development convenience. Every item here is a single environment variable or one-line change.

Defaults are for development, not internet-facing production

The values shipped in .env are picked to make a fresh install work in 15 minutes on a single host. Several of them are intentionally permissive — Fenrir trusts the operator to harden before exposing the box.

Pre-flight (before first start)

1. IP whitelist — change before the first event

The default IP_WHITELIST includes 192.168.1.0/24 (a typical home LAN). On a real production host:

  • Restrict to your operator IPs and reverse-proxy origin only.
  • Never include 0.0.0.0/0 or wide ranges.
  • If you front Fenrir with Cloudflare/nginx, the whitelist applies to the X-Forwarded-For/CF-Connecting-IP upstream value. Verify your reverse-proxy header forwarding before relying on it.

IP_WHITELIST=["10.0.7.5/32","203.0.113.42/32"]
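
A pre-flight check for the whitelist can be sketched with the standard-library ipaddress module. This is a minimal sketch, not part of Fenrir: the function name audit_whitelist and the "too wide" prefix limits are assumptions chosen for illustration, and it assumes IP_WHITELIST is a JSON list of CIDR strings as in the example above.

```python
import ipaddress
import json

# Shipped default mentioned in this checklist.
SHIPPED_DEFAULT = ipaddress.ip_network("192.168.1.0/24")
# Assumed policy for this sketch: anything broader than these counts as "wide".
MIN_PREFIX_V4 = 24
MIN_PREFIX_V6 = 64


def audit_whitelist(raw: str) -> list[str]:
    """Return human-readable problems found in a JSON list of CIDR strings."""
    problems = []
    for entry in json.loads(raw):
        try:
            net = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            problems.append(f"{entry}: not a valid CIDR")
            continue
        min_prefix = MIN_PREFIX_V4 if net.version == 4 else MIN_PREFIX_V6
        if net == SHIPPED_DEFAULT:
            problems.append(f"{entry}: still the shipped home-LAN default")
        elif net.prefixlen == 0:
            problems.append(f"{entry}: matches every address")
        elif net.prefixlen < min_prefix:
            problems.append(f"{entry}: wider than /{min_prefix}")
    return problems
```

Run it against the value in .env before the first start; an empty list means none of the checks in this sketch fired, not that the list is correct for your network.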

2. Database not on the default user/password

The bundled SQLite default is fine for single-host development. For any persistent or multi-tenant setup, configure PostgreSQL with a strong dedicated role:

DATABASE_URL=postgresql://p3g_prod:<long-random>@localhost/p3guardian_prod
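
The value can be sanity-checked before start-up. The sketch below is an illustration only: database_url_problems is a hypothetical helper, and the minimum password length is an assumed policy, not a Fenrir setting.

```python
from urllib.parse import urlsplit


def database_url_problems(url: str, min_password_len: int = 24) -> list[str]:
    """Flag obvious weaknesses in a DATABASE_URL (assumed policy, not Fenrir's)."""
    parts = urlsplit(url)
    if parts.scheme.startswith("sqlite"):
        return ["SQLite is intended for single-host development"]
    password = parts.password
    if not password:
        return ["no password in DATABASE_URL"]
    # Catches a template value such as <long-random> left in place.
    if "<" in password or ">" in password:
        return ["password placeholder was not replaced"]
    if len(password) < min_password_len:
        return [f"password shorter than {min_password_len} characters"]
    return []
```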

3. Telegram chat ID restricted

TELEGRAM_ALLOWED_USERS must contain only the operator IDs that may issue commands. Verify each ID matches a real human you trust.

4. API_SECRET set and rotated

If you run the multi-server ingestion endpoint (/api/ingest), set a long random API_SECRET and rotate it on a schedule.

API_SECRET=$(openssl rand -hex 32)
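
The line above is shell syntax. If your .env loader reads values literally rather than through a shell, generate the value first and paste the result. A minimal Python equivalent, using only the standard library (new_api_secret is a name made up for this sketch):

```python
import secrets


def new_api_secret(n_bytes: int = 32) -> str:
    """Return n_bytes of OS-level randomness as a lowercase hex string."""
    return secrets.token_hex(n_bytes)


if __name__ == "__main__":
    # Paste the printed line into .env in place of the old API_SECRET value.
    print(f"API_SECRET={new_api_secret()}")
```

With the default of 32 bytes the output is 64 hex characters, the same shape as openssl rand -hex 32.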

5. Dashboard exposure

The dashboard listens on 127.0.0.1:8443 by default. Don't expose it publicly without:

  • HTTPS via Cloudflare Tunnel or nginx/caddy with a real cert
  • HTTP basic auth on top (see deploy/fenrirsoc/ for an example) — or, better, Cloudflare Zero Trust Access with SSO/MFA

6. Filesystem permissions

chown root:p3guardian /opt/p3guardian/.env
chmod 0640 /opt/p3guardian/.env

The DB password and Telegram token live in .env. The service user should be able to read it through the p3guardian group, but should not own it or be able to modify it.
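
To verify the result after deploy, the ownership and mode can be checked with os.stat. This is a sketch under assumptions: env_file_problems is a hypothetical helper, and you pass in the numeric UID of your service user yourself.

```python
import os
import stat


def env_file_problems(path: str, service_uid: int) -> list[str]:
    """Check that .env is not owned by the service user and is not overexposed."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)
    problems = []
    if st.st_uid == service_uid:
        problems.append("service user owns the file")
    if mode & stat.S_IRWXO:
        problems.append("file is accessible to 'other'")
    if mode & stat.S_IWGRP:
        problems.append("file is group-writable")
    return problems
```

A file set up with the chown/chmod pair above should return an empty list.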

Detection tuning

The defaults are tuned for "noisy environment, low false-positive tolerance". Tighten for production:

7. Confidence threshold for auto-actions

The default in code is low enough to act on most confirmed threats. Raise it in environments where banning the wrong IP is very costly (you don't control the customer's IP space, or one false ban means a phone call from a CXO):

# p3guardian/ai/analyst.py
MIN_CONFIDENCE_FOR_AUTO_ACTION = 90  # was lower, raise it
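
How such a threshold typically gates an action can be sketched as follows. Only the constant name comes from the snippet above; decide_action, the 0 to 100 confidence scale, the inclusive comparison, and the two return labels are assumptions for illustration, not the code in analyst.py.

```python
MIN_CONFIDENCE_FOR_AUTO_ACTION = 90  # same constant name as in the snippet above


def decide_action(confidence: int,
                  threshold: int = MIN_CONFIDENCE_FOR_AUTO_ACTION) -> str:
    """Hypothetical gate: act automatically only at or above the threshold."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    # Anything below the bar goes to a human instead of triggering a ban.
    return "auto_action" if confidence >= threshold else "escalate_to_operator"
```

Raising the threshold moves borderline verdicts from the automatic path to the operator path; it does not discard them.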

8. Investigation dedupe window

Increasing the dedupe window reduces analyst-storm risk during sustained attacks. Decreasing it makes you re-investigate too often. Pick based on your traffic profile.

# p3guardian/ai/dispatcher.py — tune DEDUPE_SECONDS
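
The trade-off is easier to see in code. The sketch below shows a generic time-window dedupe; InvestigationDeduper and its methods are hypothetical and do not reproduce dispatcher.py, and the window length is deliberately left as a parameter.

```python
import time


class InvestigationDeduper:
    """Hypothetical helper: suppress repeat investigations of one key within a window."""

    def __init__(self, dedupe_seconds: float, clock=time.monotonic):
        self.dedupe_seconds = dedupe_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._last_started: dict[str, float] = {}

    def should_investigate(self, key: str) -> bool:
        now = self._clock()
        last = self._last_started.get(key)
        if last is not None and now - last < self.dedupe_seconds:
            return False  # still inside the window: skip the duplicate
        self._last_started[key] = now
        return True
```

A longer window means fewer repeat investigations per source during a sustained attack; a shorter one means the same source is re-examined sooner.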

9. Concurrency cap on investigations

Same file (dispatcher.py). The default is small, so the queue grows under a sustained spike of HIGH events. Raise the cap only if your LLM budget and host CPU can absorb the burst. Otherwise, an attacker who can produce many HIGH events from different IPs can keep your analyst pinned while a real attack passes through with deterministic-only classification.
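
A concurrency cap of this kind is commonly built on a semaphore. The following is a generic asyncio sketch, not the dispatcher's code: run_capped, investigate, and max_concurrent are names invented for the example.

```python
import asyncio


async def run_capped(events, investigate, max_concurrent: int):
    """Run investigate(event) for every event, at most max_concurrent at a time."""
    gate = asyncio.Semaphore(max_concurrent)

    async def guarded(event):
        # Events beyond the cap wait here, which is the "queue grows" effect.
        async with gate:
            return await investigate(event)

    # Results come back in the same order as the input events.
    return await asyncio.gather(*(guarded(e) for e in events))
```

Everything above the cap waits at the semaphore, so the cap bounds LLM spend and CPU at the price of latency for queued events.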

10. Brute-force window

The auth-monitor brute-force detector is sliding-window. Raise the threshold or shorten the window (p3guardian/analyzers/rule_engine.py) if you have legitimate users that occasionally fail authentication multiple times (kiosk, IoT, embedded clients). Lower the threshold for hardened environments.
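
For readers tuning these two knobs, a generic sliding-window counter looks roughly like this. It is an illustration of the technique, not the code in rule_engine.py; SlidingWindowCounter and its parameters are hypothetical, and no numeric values are suggested.

```python
import time
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Hypothetical detector: flag a source with >= threshold failures inside the window."""

    def __init__(self, threshold: int, window_seconds: float, clock=time.monotonic):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._failures = defaultdict(deque)

    def record_failure(self, source_ip: str) -> bool:
        now = self._clock()
        hits = self._failures[source_ip]
        hits.append(now)
        # Drop failures that have slid out of the window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) >= self.threshold
```

Raising threshold or shrinking window_seconds both make the detector less sensitive, which is why either one helps with clients that fail authentication a few times in a row for legitimate reasons.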

11. Periodic monitor cadence

baseline_monitor and cve_monitor run on timers. The defaults favor responsiveness over CPU. Lengthen the intervals if you run on small hardware or have tight maintenance windows.

# p3guardian/app.py
BaselineMonitor("baseline", self.event_queue, interval=...)
CVEMonitor("cve", self.event_queue, interval=...)

AI / LLM hardening

12. Don't run a tiny local model on internet-facing boxes

The bundled Ollama default model is intentionally small for dev convenience. It is not adequate for production analyst duties on a host that processes attacker-controlled input (logs, requests). A small model is more vulnerable to prompt-injection embedded in log lines.

Pick one of these for production:

  • A local model ≥ 7 B parameters (Qwen 7B+, Llama 3.1 8B+, Mistral 7B+)
  • A frontier model via OpenRouter. For an SMB, the cheapest production-grade option is anthropic/claude-haiku-4-7 or qwen/qwen-flash-2.5. For audit-grade reasoning on critical events, use anthropic/claude-opus-4-7.

OPENROUTER_API_KEY=sk-or-v1-...
OPENROUTER_MODEL=anthropic/claude-haiku-4-7

13. PII anonymizer must be installed if using cloud LLM

The anonymizer is optional (Fenrir falls back to raw prompts with a warning). For any cloud LLM provider, install it before going live:

sudo -u p3guardian /opt/p3guardian/.venv/bin/pip install opf

This is what lets you defend the GDPR Art. 32 / Art. 28 sub-processor argument with auditors.

Operational

14. Log rotation visible to Fenrir

Fenrir's LogTailer survives standard logrotate (it detects inode change and reopens). But verify after deploy that nginx-honeypot.log, auth.log, fail2ban.log are tailed correctly — ban a synthetic IP and confirm an event lands in events. The bug we fixed in 0.5.x stuck the tailer on quiet log files; a regression here is silent.
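
For reference when debugging a stuck tailer, the inode-change technique can be sketched as below. This is not Fenrir's LogTailer: RotationAwareReader is a hypothetical class, it starts reading from the beginning of the file, and it also reopens on truncation (copytruncate-style rotation), which is an addition of this sketch.

```python
import os


class RotationAwareReader:
    """Hypothetical reader that reopens a log when its inode changes or it shrinks."""

    def __init__(self, path: str):
        self.path = path
        self._open()

    def _open(self) -> None:
        self._fh = open(self.path, "rb")
        self._inode = os.fstat(self._fh.fileno()).st_ino

    def read_new_lines(self) -> list[str]:
        raw = self._fh.readlines()  # drain whatever is left in the current file
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            st = None  # rotated away and the new file does not exist yet
        if st is not None:
            rotated = st.st_ino != self._inode
            truncated = not rotated and st.st_size < self._fh.tell()
            if rotated or truncated:
                self._fh.close()
                self._open()
                raw += self._fh.readlines()
        return [b.decode("utf-8", errors="replace").rstrip("\n") for b in raw]
```

The check runs on every poll, including polls that return no lines, so a quiet file that gets rotated is still picked up on the next call.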

15. Time sync

GDPR Art. 33 sets a 72-hour breach-notification deadline. The deadline tracker uses the local clock, so clock drift shifts the deadline it reports. Keep chrony / systemd-timesyncd running and pointed at trustworthy NTP.
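
The arithmetic behind the deadline is simple, which is why the clock matters more than the code. A minimal sketch with timezone-aware timestamps follows; notification_deadline and hours_remaining are hypothetical helpers and do not describe how Fenrir's tracker is implemented.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

NOTIFICATION_WINDOW = timedelta(hours=72)  # the 72h window from GDPR Art. 33


def notification_deadline(became_aware_at: datetime) -> datetime:
    """Return the UTC deadline for a breach detected at became_aware_at."""
    if became_aware_at.tzinfo is None:
        raise ValueError("use a timezone-aware timestamp")
    return became_aware_at.astimezone(timezone.utc) + NOTIFICATION_WINDOW


def hours_remaining(became_aware_at: datetime,
                    now: Optional[datetime] = None) -> float:
    """Hours left until the deadline; negative once it has passed."""
    now = now or datetime.now(timezone.utc)
    return (notification_deadline(became_aware_at) - now).total_seconds() / 3600
```

Any error in the host clock carries straight into both values, so a host that drifts by an hour reports a deadline that is an hour off.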

16. Backups of the database

Fenrir does not back up its own DB. The events, investigation_*, and breach_notifications tables are your audit trail. Lose them and you lose the evidence.

# /etc/cron.d/p3guardian-backup
0 3 * * *  postgres  pg_dump p3guardian_prod | zstd -9 > /backup/p3g-$(date +\%Y\%m\%d).sql.zst

Encrypt the backup destination separately.

17. fail2ban action defaults

The bundled nginx-honeypot.conf for fail2ban uses action_mwl by default, which sends an e-mail with a whois report and the matching log lines on every ban. That is noisy. Switch to action_ (just ban, no mail) once you trust the detection.

# /etc/fail2ban/jail.d/nginx-honeypot.conf
action = %(action_)s

Periodic review (every quarter)

  • Re-baseline baseline_monitor after every legitimate change wave (delete data/baseline.json, restart)
  • Re-baseline AIDE the same way after planned deploys
  • Rotate OPENROUTER_API_KEY and TELEGRAM_BOT_TOKEN
  • Review IP_WHITELIST for stale entries (ex-employees, decommissioned offices)
  • Test a synthetic attack: hit a honeypot URL from a non-whitelisted external IP and confirm an investigation lands in DB within 60 s
  • Restore-test the DB backup
  • Audit the journal for any monitor that has produced zero events in 30 days — likely dead due to a parser regression, not a quiet system

What this checklist intentionally does NOT include

Specific numeric thresholds for detection (how many failed logins, how many seconds of dedupe, how many concurrent investigations). Those values are deliberately tunable in code: pick them for your environment, in private. Publishing them on a website would hand an attacker a checklist for slipping under your bar.

If you need help tuning, contact us.