Troubleshooting

Common problems and the first thing to check. If your problem isn't here, open an issue on GitHub.

Service won't start

```bash
sudo systemctl status p3guardian
sudo journalctl -u p3guardian -n 50 --no-pager
```

Look at the bottom of the journal for the actual exception. Most common causes:

- Database connection refused → DATABASE_URL is wrong, or PostgreSQL isn't running. Test with: `psql -h localhost -U p3guardian -d p3guardian`.
- Telegram bot token rejected → the token is wrong or revoked. Test with: `curl https://api.telegram.org/bot<TOKEN>/getMe`.
- Permission denied on a log file → the p3guardian user isn't in the adm group. Fix: `sudo usermod -a -G adm p3guardian && sudo systemctl restart p3guardian`.
- Missing Python dependency → refresh the install: `sudo -u p3guardian /opt/p3guardian/.venv/bin/pip install -e /opt/p3guardian`.
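To rule out the first cause, it helps to see exactly which host, port, user, and database DATABASE_URL points at, and compare them with the psql test. A minimal standard-library sketch, assuming DATABASE_URL uses the usual `postgresql://user:password@host:port/dbname` form (any other form Fenrir may accept is not covered); `describe_database_url` is a hypothetical helper, not part of Fenrir:

```python
import os
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Hypothetical helper (not part of Fenrir): split a PostgreSQL URL into
    the fields psql needs. The password is deliberately left out."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # fall back to PostgreSQL's default port
        "dbname": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    url = os.environ.get("DATABASE_URL")
    if not url:
        print("DATABASE_URL is not set in this environment")
    else:
        info = describe_database_url(url)
        print(info)
        # Print the equivalent psql command to test the same connection by hand.
        print(f"psql -h {info['host']} -p {info['port']} -U {info['user']} -d {info['dbname']}")
```

Run it with the same DATABASE_URL value the service uses (for example, exported in your shell); if the printed psql command fails, the problem is the database side, not Fenrir.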

No events appearing in the database

```sql
SELECT source, COUNT(*) FROM events
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY source;
```

If a specific source has zero rows but you expect activity:

- Check the file is growing. Run `sudo tail -f /var/log/<file>` from another terminal: does it scroll?
- Check the monitor is registered. Run `journalctl -u p3guardian | grep "Monitor started"`: is the source in the list?
- Check log file permissions. Run `ls -la /var/log/<file>`: is it readable by the adm group?
- Known issue: stale FD after logrotate. If a log file was rotated while Fenrir held it open, the tailer can get stuck reading the old, rotated-away file. Symptom: `sudo ls -l /proc/$(systemctl show -p MainPID --value p3guardian)/fd/ | grep <file>` shows the FD pointing at the rotated copy or marked `(deleted)`, i.e. a different inode than the current on-disk file. Fix: restart p3guardian.service. A long-term fix landed in 0.5.x (LogTailer does periodic inode checks).
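The inode check mentioned above works because a rename-style logrotate moves the old file aside and a new file is created at the same path, so the path's inode changes while an already-open FD keeps pointing at the old inode. A minimal sketch of the idea, not Fenrir's actual LogTailer; the class name is hypothetical, and `copytruncate` rotation (which keeps the same inode) is not handled:

```python
import os


class InodeCheckingTailer:
    """Hypothetical tailer (not Fenrir's LogTailer) that notices logrotate
    by comparing the inode of the open FD with the inode at the path."""

    def __init__(self, path: str, start_at_end: bool = True):
        self.path = path
        self._fh = None
        self._inode = None
        self._open(seek_end=start_at_end)

    def _open(self, seek_end: bool) -> None:
        if self._fh is not None:
            self._fh.close()
        self._fh = open(self.path, "rb")
        self._inode = os.fstat(self._fh.fileno()).st_ino
        if seek_end:
            self._fh.seek(0, os.SEEK_END)

    def _path_inode(self):
        try:
            return os.stat(self.path).st_ino
        except FileNotFoundError:
            return None  # moved away and not recreated yet: keep the old FD

    def read_new_lines(self) -> list:
        # Drain whatever is left in the file we currently hold open.
        data = self._fh.read()
        current = self._path_inode()
        if current is not None and current != self._inode:
            # The path now names a different file: switch to it, from the start.
            self._open(seek_end=False)
            data += self._fh.read()
        # Partial trailing lines are returned as-is, not buffered, in this sketch.
        return [line.decode("utf-8", "replace") for line in data.splitlines()]

    def close(self) -> None:
        self._fh.close()
```

Calling `read_new_lines()` on a timer drains the tail of the rotated file before switching, so lines written just before rotation are not lost.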

AI investigations not firing

Symptom: HIGH events appear in the events table, but investigation_jobs stays empty.

Check:

- Severity threshold: only HIGH and CRITICAL events trigger an investigation. MEDIUM does not; this is expected.
- Dedupe window: if the same (category, ip) pair was investigated in the last 60 minutes, later events for it are deduplicated. Confirm with `SELECT * FROM investigation_jobs WHERE category=... ORDER BY created_at DESC`.
- Concurrency cap: if 2 investigations are already running, new ones queue up. `journalctl -u p3guardian | grep ANALYST` shows starts and completions.
- AI backend reachable: `curl -s https://openrouter.ai/api/v1/models -H "Authorization: Bearer $OPENROUTER_API_KEY"` should return JSON with the model list.
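The first three checks describe a gate that every event passes through before a job starts. A sketch of that gate, useful for reasoning about why a given event did not start a job; the class, method, and constant names, and the choice to start the dedupe clock when an event is accepted, are assumptions rather than Fenrir's dispatcher code:

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Values taken from this page; the names are hypothetical.
TRIGGER_SEVERITIES = {"HIGH", "CRITICAL"}  # MEDIUM and below never start a job
DEDUPE_WINDOW_S = 60 * 60                  # same (category, ip) within 60 minutes is deduped
MAX_CONCURRENT = 2                         # dispatcher concurrency cap


@dataclass
class InvestigationGate:
    """Hypothetical model of the trigger checks; not Fenrir's dispatcher."""
    last_accepted: dict = field(default_factory=dict)  # (category, ip) -> accept time
    running: int = 0
    queued: list = field(default_factory=list)

    def submit(self, severity: str, category: str, ip: str,
               now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        if severity not in TRIGGER_SEVERITIES:
            return "ignored"
        key = (category, ip)
        last = self.last_accepted.get(key)
        if last is not None and now - last < DEDUPE_WINDOW_S:
            return "deduped"
        self.last_accepted[key] = now  # assumption: dedupe clock starts on acceptance
        if self.running >= MAX_CONCURRENT:
            self.queued.append(key)
            return "queued"
        self.running += 1
        return "started"

    def finish(self) -> Optional[Tuple[str, str]]:
        """Mark one running job done; start the oldest queued one, if any."""
        self.running -= 1
        if self.queued:
            self.running += 1
            return self.queued.pop(0)
        return None
```

Walking a suspect event through `submit()` by hand shows which of the three checks would have stopped it.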

Telegram alerts not arriving

```bash
# 1. Bot is up
curl https://api.telegram.org/bot<TOKEN>/getMe

# 2. Chat ID is right
# Send /start to the bot from the target chat, then:
curl -s https://api.telegram.org/bot<TOKEN>/getUpdates | python3 -m json.tool | grep -A 3 '"chat"'
# the "id" in the chat object is your TELEGRAM_CHAT_ID (group and channel IDs are negative)

# 3. Bot is allowed in the chat
# If it's a group, the bot must be added as a member.
# If it's a channel, the bot must be admin.

# 4. Server can reach Telegram
curl -I https://api.telegram.org/
```

Restart p3guardian after fixing.
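When `getUpdates` returns many updates, grepping gets noisy. A minimal sketch that lists every distinct chat in a saved response, assuming you first ran `curl -s https://api.telegram.org/bot<TOKEN>/getUpdates > updates.json`; it only inspects the `message`, `edited_message`, `channel_post`, and `my_chat_member` update types, which carry a `chat` object in the Bot API. The script and function names are hypothetical:

```python
import json
import sys

# Update types that carry a "chat" object (not an exhaustive list).
CHAT_BEARING_KEYS = ("message", "edited_message", "channel_post", "my_chat_member")


def chats_from_updates(payload: dict) -> dict:
    """Hypothetical helper: map chat id -> (chat type, title/username/first name)."""
    chats = {}
    for update in payload.get("result", []):
        for key in CHAT_BEARING_KEYS:
            chat = update.get(key, {}).get("chat")
            if chat:
                label = chat.get("title") or chat.get("username") or chat.get("first_name", "")
                chats[chat["id"]] = (chat.get("type", "?"), label)
    return chats


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 list_chats.py updates.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        for chat_id, (kind, label) in chats_from_updates(json.load(fh)).items():
            print(f"{chat_id}\t{kind}\t{label}")
```

The id printed next to the chat you sent /start from is the value for TELEGRAM_CHAT_ID.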

Dashboard is empty / 502

```bash
# Is the service running?
systemctl is-active p3guardian

# Is it bound to the expected port?
sudo ss -tlnp | grep 8443

# Is the nginx reverse proxy pointing at the right backend?
sudo nginx -T 2>/dev/null | grep -A 3 'proxy_pass.*8443'

# Can you reach it locally?
curl -s http://127.0.0.1:8443/api/stats | python3 -m json.tool
```
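The port and endpoint checks can also be scripted, for example from a cron job. A minimal standard-library sketch, assuming (as the curl command above implies) that the dashboard listens on 127.0.0.1:8443 over plain HTTP and that /api/stats returns a JSON body; the function names are hypothetical:

```python
import http.client
import json
import socket
import urllib.request
from urllib.error import URLError

# Bypass any http_proxy settings: this is a local health check.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if something accepts TCP connections on host:port (like the ss check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def stats_ok(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers HTTP 200 with a JSON body (like the curl check)."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            json.loads(resp.read().decode("utf-8"))
            return True
    except (URLError, OSError, ValueError, http.client.HTTPException):
        return False


if __name__ == "__main__":
    print("127.0.0.1:8443 listening:", port_open("127.0.0.1", 8443))
    print("/api/stats returns JSON:", stats_ok("http://127.0.0.1:8443/api/stats"))
```

If both checks pass locally but the public URL fails, the problem is in front of Fenrir (nginx or the tunnel), not in the service itself.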

If everything looks right but the public URL still 502s, check cloudflared:

```bash
sudo systemctl status cloudflared
sudo journalctl -u cloudflared -n 30
```

"I'm getting too many emails from root"

This is not a Fenrir alert problem but a system mail problem: fail2ban and sudo can flood root@ (which /etc/aliases forwards to you). To silence them:

```bash
# Stop fail2ban from emailing on every ban (in jail.d/*.conf):
sudo sed -i 's|action_mwl|action_|g; s|action_mw|action_|g' /etc/fail2ban/jail.d/*.conf
sudo systemctl reload fail2ban

# Stop sudo from emailing on bad password:
echo 'Defaults !mail_badpass' | sudo tee /etc/sudoers.d/99-no-mail-badpass
sudo chmod 0440 /etc/sudoers.d/99-no-mail-badpass
sudo visudo -cf /etc/sudoers.d/99-no-mail-badpass
```

Real Fenrir alerts go via Telegram, not email. The email channel only carries genuine system errors (cron failures, disk full, kernel panic).

"PII anonymizer not loading"

```text
ImportError: openai-privacy-filter not installed.
```

It's optional. Either install it:

```bash
sudo -u p3guardian /opt/p3guardian/.venv/bin/pip install opf
```

Or accept that cloud LLM calls will send raw, unredacted prompts (you'll see a one-time warning in the journal). That is acceptable for development; for production with cloud LLMs, install it.
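The fallback described above (an optional dependency, with a warning logged only once) is a common Python pattern. A sketch of it, assuming the filter is imported by module name; `opf` appears only as a placeholder because the package's real import name and how Fenrir wires it in are not documented here, and `load_optional` is hypothetical:

```python
import importlib
import logging

log = logging.getLogger("pii")  # hypothetical logger name
_warned = False


def load_optional(module_name: str):
    """Hypothetical helper: return the module if importable, else None.
    The warning is emitted only on the first failed import in this process."""
    global _warned
    try:
        return importlib.import_module(module_name)
    except ImportError:
        if not _warned:
            log.warning("%s not installed; cloud LLM prompts will be sent unredacted", module_name)
            _warned = True
        return None


# e.g. anonymizer = load_optional("opf")  # placeholder module name
```

Callers then check for `None` and skip redaction, which is why the journal shows the warning once rather than on every investigation.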

Compliance checks all "fail" or "partial"

Run the compliance audit interactively to see why:

```bash
curl -s "http://127.0.0.1:8443/api/compliance/run?framework=GDPR" \
  | python3 -c "import json,sys; [print(c['control_id'], c['status'], c.get('details','')) for c in json.load(sys.stdin)['controls']]"
```

For each control with status=fail, look at evidence — it tells you which sub-check failed.
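For longer reports, a small script that prints only the failing and partial controls is easier to read than the one-liner. A sketch that assumes the response shape the one-liner above uses (a `controls` list whose items have `control_id` and `status`); since this page mentions both `evidence` and `details`, it prints whichever field is present. The function and file names are hypothetical:

```python
import json
import sys
import urllib.request


def problem_controls(report: dict) -> list:
    """Return (control_id, status, explanation) for controls whose status is fail or partial."""
    rows = []
    for control in report.get("controls", []):
        status = control.get("status")
        if status not in ("fail", "partial"):
            continue
        # Assumption: the explanation is in "evidence" or "details"; use whichever is present.
        why = control.get("evidence") or control.get("details") or ""
        if not isinstance(why, str):
            why = json.dumps(why, sort_keys=True)
        rows.append((control.get("control_id", "?"), status, why))
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 failing_controls.py GDPR   (file name is hypothetical)
    url = f"http://127.0.0.1:8443/api/compliance/run?framework={sys.argv[1]}"
    with urllib.request.urlopen(url, timeout=120) as resp:
        report = json.load(resp)
    for control_id, status, why in problem_controls(report):
        print(f"{control_id}\t{status}\t{why}")
```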

Common false-failures:

- GDPR-5.1.f firewall_active=false when UFW is actually running → you're on a Fenrir version older than 0.5.x, which did not fall back to `systemctl is-active`. Update.
- GDPR-5.1.f ssl_configured_sites=0 → you terminate TLS upstream at Cloudflare, not in nginx. 0.5.x+ accepts this and counts cloudflared as valid. Update.
- GDPR-33 fail with 0 actual breaches → an old breach record in the DB has status=open and its deadline has passed. Either close it (`UPDATE breach_notifications SET status='closed', closed_at=NOW() WHERE id=...`) or set BREACH_NOTIFICATION_EMAIL to provide the missing evidence.

Investigation reports show `verdict=inconclusive`

The agent couldn't reach a confident decision within 5 tool rounds. Common reasons:

- Insufficient log data. The event source's logs were rotated or empty when the agent looked. Check that retention is reasonable (we don't recommend less than 7 days of auth.log).
- Cloud LLM rate-limited. Look for HTTP 429 errors in the journal during the investigation window. Solution: use a different OpenRouter model or a dedicated API key.
- Playbook is too narrow. If a category fires often but always ends inconclusive, the playbook is asking for signals the logs never contain. Edit the playbook to widen it.
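All three causes end the same way because the agent has a fixed budget of tool rounds: if the information it needs never arrives (rotated logs, rate-limited model calls, a playbook asking for unavailable signals), the budget runs out before a verdict. A hypothetical sketch of such a bounded loop; the `Step` shape and the two callables are assumptions, not Fenrir's agent API:

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_TOOL_ROUNDS = 5  # the round limit stated on this page


@dataclass
class Step:
    """One model turn (hypothetical shape): a tool request or a final verdict."""
    tool: Optional[str] = None
    verdict: Optional[str] = None


def investigate(next_step: Callable[[list], Step], run_tool: Callable[[str], str]) -> str:
    """Ask the model for a step up to MAX_TOOL_ROUNDS times; return its verdict,
    or 'inconclusive' if every round was spent on tool calls."""
    transcript = []
    for _ in range(MAX_TOOL_ROUNDS):
        step = next_step(transcript)
        if step.verdict is not None:
            return step.verdict
        # Record the tool call and its output so the next turn can use it.
        transcript.append((step.tool, run_tool(step.tool)))
    return "inconclusive"
```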

Server load is too high

At idle, Fenrir uses under 1% CPU and about 200 MB of RAM. If you're seeing more:

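```bash
# What's hot? (pidof prints all matching PIDs on one line, so use the unit's main PID)
sudo top -p "$(systemctl show -p MainPID --value p3guardian)"

# How many tool processes are running?
sudo pgrep -af 'p3guardian|grep|find' | head
```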

If a `find /usr /bin /sbin /opt -perm -4000` process runs for minutes, that's baseline_monitor's setuid scan on a slow disk. Raise its interval in app.py from 600 s to 1800 s.

If you see many Python subprocesses spawning at once, the AI tool loop may be spinning. Check that the dispatcher concurrency cap (default 2) is in effect.

Resetting the baseline

If you've intentionally added many users, services, or ports and want Fenrir to accept the new state as normal:

```bash
sudo -u p3guardian rm /opt/p3guardian/data/baseline.json
sudo systemctl restart p3guardian
```

The next baseline run (within 10 minutes) will reseed the snapshot. No baseline alerts are raised in the meantime.
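The reset works because the first run after the snapshot file disappears has nothing to compare against, so it records the current state and reports no drift. A minimal sketch of that behaviour; the JSON layout (category name mapped to a list of items) and the function name are assumptions, not the real format of baseline.json:

```python
import json
import os


def check_baseline(path: str, current: dict) -> dict:
    """Hypothetical baseline check: compare `current` with the stored snapshot.

    `current` maps a category (e.g. "users", "ports") to a list of items.
    Returns {category: {"added": [...], "removed": [...]}} for changed categories.
    """
    if not os.path.exists(path):
        # First run after a reset: the current state becomes "normal".
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(current, fh, sort_keys=True)
        return {}
    with open(path, encoding="utf-8") as fh:
        baseline = json.load(fh)
    drift = {}
    for category in set(baseline) | set(current):
        old = set(baseline.get(category, []))
        new = set(current.get(category, []))
        if old != new:
            drift[category] = {"added": sorted(new - old), "removed": sorted(old - new)}
    return drift
```

Note that in this sketch drift keeps being reported until the snapshot file is removed again, which is why deleting baseline.json is the way to accept a new normal.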

Still stuck

```bash
sudo journalctl -u p3guardian --since '1 hour ago' --no-pager > /tmp/fenrir.log
```

Open an issue with /tmp/fenrir.log attached (redact any sensitive content): https://github.com/P3consultingtech/p3guardian/issues