Deferred Items — Proposed Solutions Catalogue

Date: 2026-06-07 (after Kay's feedback that deferred items were being queued without concrete solution proposals)
Purpose: For each deferred item from the security audit + working session, capture WHAT it is, WHEN to revisit, WHO touches it, HOW to execute concretely, dependencies, and time. So when we come back to any item, we can execute without re-thinking.
Cross-references: security-audit-2026-06-07.md, homelab-tracker.md, MEMORY.md project memories.

Severity column legend

🔴 Critical (overdue) | 🟠 High (this month) | 🟡 Medium (this quarter) | 🔵 Hygiene (opportunistic)

Active deferred items

C4 — Off-site backup (R2 vs B2 decision) 🔴

Trigger to revisit: family onboarding, or LXC 258 usage > 50 GB, or 2026-07-15 (HBA reminder fires)
Proposed solution: Cloudflare R2 (same account as tunnel, free egress, slightly higher storage cost)
Concrete steps (~1 h when triggered):
Cloudflare dashboard → R2 → Create bucket oak-techx-homelab-backup (region: EU)
Create R2 API token: bucket-scoped, read+write, expires never (rotate annually)
On PBS VM 200: apt install rclone; rclone config → S3-compatible → R2 endpoint → paste API key + secret
Sync script: rclone sync /mnt/datastore/apple-tank/.chunks r2:oak-techx-homelab-backup/chunks --transfers 8 --bwlimit 5M --log-file=/var/log/rclone-sync.log
Cron weekly Sunday 04:00
Verify: random chunk fetch + integrity check after first sync
Document in security audit (close §7.1) + homelab-tracker
Dependencies: PBS healthy, Cloudflare account access (already have)
Cost: ~$1.50/mo for 100 GB, free egress
Estimated time: 1 hour
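
The sync and cron steps above could be wrapped in a small script along these lines. This is a sketch rather than a tested deployment: the source path, remote and flags are copied from the plan above, while the script path `/usr/local/bin/offsite-sync.sh` and the `DRY_RUN` switch are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch of the weekly off-site sync; paths and flags are copied from the plan above.
# DRY_RUN=1 is an illustrative switch of this script (not an rclone feature):
# it prints the command instead of running it.
set -eu

SRC="/mnt/datastore/apple-tank/.chunks"
DEST="r2:oak-techx-homelab-backup/chunks"
LOG="/var/log/rclone-sync.log"

sync_offsite() {
  # Build the command as positional parameters so quoting survives intact.
  set -- rclone sync "$SRC" "$DEST" --transfers 8 --bwlimit 5M "--log-file=$LOG"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical cron entry for Sunday 04:00, assuming the script is saved as
# /usr/local/bin/offsite-sync.sh and ends with a call to sync_offsite:
#   0 4 * * 0 /usr/local/bin/offsite-sync.sh
```

Running it once with `DRY_RUN=1` shows the exact command before the first real upload; rclone's own `--dry-run` flag can be added for a second, remote-aware pass.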

#34 pve-firewall — write rules + enable 🟠

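Status: closed 2026-06-07 (enabled with a permissive baseline; see the Completed table and #34F). Plan kept for reference.
Trigger to revisit: dedicated session when Kay has 1-2 hours and willingness to debug if rules block something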
Proposed solution: write cluster.fw + host.fw rules, test in TEST mode, then enable
Concrete steps (~1.5 h):
/etc/pve/firewall/cluster.fw:
- [OPTIONS]: enable: 0 (start disabled; flip to 1 after rules vetted)
- [RULES]: deny default IN, allow OUT (PVE outbound unrestricted)
/etc/pve/nodes/arochukwu/host.fw:
- [OPTIONS]: enable: 0 initially; nf_conntrack_max: 524288
- Allow IN from 10.0.10.0/24 (SERVERS): TCP 22, 8006 (web UI), 9100 (node_exporter)
- Allow IN from 172.16.1.0/24 (MGMT): ALL (admin can reach everything)
- Allow IN from 10.0.50.0/24 (DMZ): NOTHING (DMZ should never reach PVE direct)
- Deny IN from WAN
pve-firewall compile — syntax check before enabling
Enable in test/log-only mode for 24 h: [OPTIONS] log_level_in: info to capture what would be blocked
Review logs, fix mis-blocked stuff, then flip enable: 1
Risk: bad rule locks Kay out of PVE web UI + SSH. Mitigation: VLAN 40 direct L2 to 172.16.1.5 is hardwired allow + chassis console as ultimate fallback
Dependencies: none
Estimated time: 1.5 hours hands-on, plus the 24-hour observation window
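
The host.fw plan above might translate into a file like the one staged below. Treat it as a sketch: the rule syntax follows the Proxmox firewall file format as I understand it, the staging directory `FW_STAGE` is an assumption for illustration, and nothing here touches /etc/pve until the draft is reviewed and copied by hand.

```shell
#!/bin/sh
# Stage a draft host.fw outside /etc/pve so it can be reviewed before it goes live.
# FW_STAGE is a hypothetical staging directory, not a Proxmox convention.
set -eu

FW_STAGE="${FW_STAGE:-$(mktemp -d)}"

cat > "$FW_STAGE/host.fw" <<'EOF'
[OPTIONS]
# Start disabled; flip to 1 only after the rules are vetted.
enable: 0
log_level_in: info
nf_conntrack_max: 524288

[RULES]
# MGMT VLAN: the admin subnet can reach everything.
IN ACCEPT -source 172.16.1.0/24
# SERVERS VLAN: SSH, web UI and node_exporter only.
IN ACCEPT -source 10.0.10.0/24 -p tcp -dport 22,8006,9100
# DMZ must never reach PVE directly.
IN DROP -source 10.0.50.0/24
EOF

echo "Draft written to $FW_STAGE/host.fw"
```

The "Deny IN from WAN" line has no explicit rule here; it is left to the default inbound policy in cluster.fw, which is an assumption about how the deny-by-default step gets implemented.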

#35 WebAuthn / Passkeys for Authelia 🟡

Trigger to revisit: Kay wants Face ID login on iPhone instead of typing TOTP code
Proposed solution: enable WebAuthn provider in Authelia config, enroll iPhone first
Concrete steps (~20 min):
SSH to LXC 254: pct exec 254 -- vi /opt/authelia/config/configuration.yml

Add under top level (peer of totp:):

webauthn:
  disable: false
  display_name: 'Oak Techx Homelab'
  attestation_conveyance_preference: indirect
  user_verification: preferred
  timeout: 60s

Restart container: pct exec 254 -- docker restart authelia
From iPhone Safari, go to https://auth.hm.iamkay.eu/ → log in with password + TOTP → Settings → "Two-Factor Methods" → "Register Security Key" → iOS Face ID prompt → confirms → saved to iCloud Keychain
Repeat enrollment on each device: Windows laptop (Windows Hello), Mac (Touch ID), iPad
Test: open Authelia portal in a different browser, password + WebAuthn flow should work
Keep TOTP enrolled as fallback in case Apple keychain ever has issues
Risk: low — additive 2FA method, doesn't replace TOTP
Dependencies: Authelia ≥ 4.36 (have 4.39.20 ✓)
Estimated time: 20 min

#36 M9 — SMTP relay for system mail 🟠

Trigger to revisit: anytime in next 2 weeks (PBS backup failure mail doesn't currently deliver — silent failure risk)
Proposed solution: Resend.com (3000 emails/month free, simple API key, no credit card)
Concrete steps (~30 min):
Kay: create Resend.com account (sign up free), verify ownership of iamkay.eu domain (add the DNS records they show — same Cloudflare account)
Resend dashboard → API Keys → Create → restricted to "Send emails" only
On PVE host: apt install libsasl2-modules
Edit /etc/postfix/main.cf:
- relayhost = [smtp.resend.com]:587
- smtp_use_tls = yes
- smtp_sasl_auth_enable = yes
- smtp_sasl_security_options = noanonymous
- smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
- smtp_tls_security_level = encrypt
- smtp_tls_CAfile = /etc/ssl/certs/ca-certificates.crt
Create /etc/postfix/sasl_passwd: [smtp.resend.com]:587 resend:re_xxxxxxxxxxxxx
postmap /etc/postfix/sasl_passwd && chmod 600 /etc/postfix/sasl_passwd*
systemctl restart postfix
Test: echo "PBS test" | mail -s "PBS test" <Kay's Hotmail address> → should arrive in Hotmail within 30 s
Trigger an actual PBS backup failure (e.g. temporarily wrong storage name) → verify failure mail arrives
Repeat config on PBS VM 200 separately (its own postfix)
Risk: low — only affects outbound mail
Dependencies: Kay creates Resend account + domain DNS verify (~10 min on his side)
Estimated time: 30 min me + 10 min Kay
Side benefit: enables M5 (Vaultwarden require_device_email), Authelia password-reset flow, Nextcloud share-by-email when family lands
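
The main.cf edits above can also be applied non-interactively with `postconf -e`, which avoids hand-editing mistakes and makes the same change repeatable on PBS VM 200. A sketch, assuming the settings listed in this section; the `POSTCONF` override is an illustrative hook for dry runs, and `smtp_use_tls` is left out because `smtp_tls_security_level = encrypt` already enforces TLS.

```shell
#!/bin/sh
# Apply the relay settings from this section via postconf instead of editing main.cf by hand.
# POSTCONF can be overridden (hypothetical hook) to preview what would be written.
set -eu

POSTCONF="${POSTCONF:-postconf}"

# One main.cf setting per line, exactly as postconf -e expects them.
relay_settings() {
  cat <<'EOF'
relayhost = [smtp.resend.com]:587
smtp_sasl_auth_enable = yes
smtp_sasl_security_options = noanonymous
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_tls_security_level = encrypt
smtp_tls_CAfile = /etc/ssl/certs/ca-certificates.crt
EOF
}

apply_relay_settings() {
  relay_settings | while IFS= read -r setting; do
    "$POSTCONF" -e "$setting"
  done
}

# On the real host: apply_relay_settings && systemctl restart postfix
```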

Medium-severity items from audit (not yet formally tasked)

M2 — Replace RSA SSH key on PVE root with ed25519 🟡

Trigger to revisit: opportunistic — quick win
Proposed solution: identify which RSA key is in /etc/pve/priv/authorized_keys, regenerate as ed25519, swap
Concrete steps (~10 min):
awk '$1=="ssh-rsa"' /etc/pve/priv/authorized_keys to find it
Identify whose laptop has the matching private key (check comment field)
On that machine: ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_pve -C "kay-pve-$(date +%Y)"
ssh-copy-id -i ~/.ssh/id_ed25519_pve.pub root@<pve-host>
Remove RSA line from /etc/pve/priv/authorized_keys
Update ~/.ssh/config on laptop to point at new key for pve host alias
Test SSH from a fresh terminal
Risk: very low
Estimated time: 10 min
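
The find-and-remove steps above can be done with two small helpers so the live file is never edited blind. A minimal sketch: the function names are made up for illustration, and like the awk one-liner above it only matches entries whose first field is `ssh-rsa` (lines that start with key options would be missed).

```shell
#!/bin/sh
set -eu

# Print the line number and last field (normally the key comment) of every RSA entry.
list_rsa_keys() {
  awk '$1 == "ssh-rsa" { print NR ": " $NF }' "$1"
}

# Print the file with RSA entries removed; the caller decides where the output goes.
strip_rsa_keys() {
  awk '$1 != "ssh-rsa"' "$1"
}

# Example (review the .new file before replacing anything; the output path is hypothetical):
#   list_rsa_keys /etc/pve/priv/authorized_keys
#   strip_rsa_keys /etc/pve/priv/authorized_keys > /root/authorized_keys.new
```

Writing to a separate file first keeps the old key usable until the ed25519 login has been tested from a fresh terminal.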

M3 — Internal hostname leak via public DNS 🔵

Status: ACCEPTED — cosmetic only; hm.iamkay.eu apex DNS resolves to Cloudflare proxy IPs but no service responds (1014 error). Attacker learns we use hm.iamkay.eu naming pattern; no actual exposure.
Future fix (if ever bothered): migrate the internal domain from hm.iamkay.eu to .home.arpa (reserved for home networks) or another non-public suffix such as .lan. High effort, low value. Defer indefinitely.

M5 — Vaultwarden `require_device_email` 🟡

Trigger to revisit: AFTER M9 (SMTP relay) lands
Proposed solution: edit Vaultwarden config.json, set require_device_email: true, restart container
Dependencies: SMTP working (M9)
Estimated time: 5 min (gated on M9)

M6 — cloudflared TUNNEL_TOKEN as Docker secret 🟡

Trigger to revisit: opportunistic security hygiene; not load-bearing
Proposed solution: refactor docker-compose to mount token as file vs env
Concrete steps (~15 min):
Write the token to /opt/cloudflared/cf-tunnel.token (follows the existing token-file pattern); chmod 600
Edit /opt/cloudflared/docker-compose.yml:
- Add top-level secrets: block with file path
- Container env: TUNNEL_TOKEN_FILE: /run/secrets/cf-tunnel-token
- Container secrets: - cf-tunnel-token
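Requires a cloudflared release that supports TUNNEL_TOKEN_FILE; confirm against the release notes for the deployed image tag before switching (older images only read TUNNEL_TOKEN)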
Restart container
Verify token isn't in docker inspect env anymore
Risk: low, but restart drops tunnel briefly (~5 s)
Estimated time: 15 min
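
The final verification step above can be scripted so it gives a clear yes/no answer. A sketch, assuming the container is named `cloudflared` (the name is not stated in this doc); the helper name is hypothetical.

```shell
#!/bin/sh
set -eu

# Reads KEY=VALUE lines on stdin; succeeds only if the token is still passed inline.
# TUNNEL_TOKEN_FILE=... does not match, because the pattern requires "=" right after TOKEN.
env_has_inline_token() {
  grep -q '^TUNNEL_TOKEN='
}

# Example, assuming the container is named "cloudflared":
#   docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' cloudflared \
#     | env_has_inline_token && echo "token still in env" || echo "clean"
```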

M7 — Cloudflare API token scope review 🔵

Trigger to revisit: 6-month rotation cadence
Proposed solution: in Cloudflare dashboard → My Profile → API Tokens → review the DNS token `cfut_…` (value redacted; keep it in the vault only and identify it in the dashboard by name) → confirm scope is exactly: Zone:DNS:Edit + Zone:Zone:Read on iamkay.eu only
Concrete steps (~5 min): Kay checks dashboard; rotate if scope creep found
Estimated time: 5 min

M8 — PBS chunk-handler tuning 🟡

Trigger to revisit: next time a restore drill hangs the UI (Session 8 finding)
Proposed solution: bump PBS VM 200 memory 4 → 6 GB; investigate --chunk-cache-size tuning
Concrete steps (~15 min):
qm set 200 -memory 6144
Restart VM
Test by mounting a backup snapshot via PBS UI while running a fresh backup — should not freeze
Estimated time: 15 min + restore drill validation

M10 — BIOS P67 update (G7 microcode stale) 🟡

Trigger to revisit: next chassis-open (HBA install ~July)
Proposed solution: bundle with SPP 2017.10.1 flash (already staged on PVE ISO storage)
Concrete steps: BIOS update is part of the SPP run; no separate work needed
Dependencies: chassis-open, iLO Virtual Media
Estimated time: bundled with HBA install (~2 hours total for the chassis session)

M11 — GitLab `root` user deprecation 🟡

Trigger to revisit: after H10 (you enrolling 2FA on root and promoting yourself to admin)
Proposed solution: either disable root or keep with strong password + locked
Concrete steps (~10 min after H10):
Option A: create a real kay GitLab user with admin role, disable root (locked, password unknown, kept for "what if I lose access" emergency only)
Option B: keep using root, rename to kay, enroll 2FA
Estimated time: 10 min

M12 — Empty stub dirs cleanup on tanks 🔵

Trigger to revisit: opportunistic
Proposed solution: rmdir (not rm -r, so a directory that turns out to be non-empty fails safely) the 5 known-empty dirs after per-target Kay confirmation per CLAUDE.md §0 rule #2
Targets: r-tank/mac-store/, r-tank/proxmox/vms/, s-tank/proxmox/ct-k8s-datastore/, /windows-store/, /R-tank/ (capitalized)
Estimated time: 5 min
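
A guarded helper keeps the cleanup safe even if one of the "known-empty" directories has gained content since the audit. A minimal sketch; the function name is made up, and it deliberately handles one target per call so the per-target confirmation rule still applies.

```shell
#!/bin/sh
set -eu

# Remove a directory only if it exists and is empty; report what happened either way.
remove_if_empty() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "skip (missing): $dir"
    return 0
  fi
  if [ -n "$(ls -A "$dir")" ]; then
    echo "skip (not empty): $dir"
    return 0
  fi
  rmdir "$dir"
  echo "removed: $dir"
}

# Example, one target at a time after confirmation (full path to be filled in):
#   remove_if_empty /path/to/r-tank/mac-store
```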

M13 — SIEM / Grafana auth-event dashboard 🟡

Trigger to revisit: when Path B (learning) curriculum hits observability deep-dive, OR when you want to spend an evening on it
Proposed solution: write a Grafana dashboard with Loki queries
Concrete steps (~60 min):
Identify auth log paths: Authelia container /config/notification.txt, Nextcloud nextcloud.log in container, GitLab production.log, sshd via journald
Promtail config: scrape those paths with labels {service, level, host}
Grafana panels:
- Login attempts per service per hour (graph)
- Failed logins per source IP per hour (table)
- Blocked / banned IPs per hour (gauge)
- Alert: > 10 failed in 5 min → Uptime Kuma webhook → email/iMessage
Dependencies: Loki + Promtail healthy (they are)
Estimated time: 60 min
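
Before building the "failed logins per source IP" panel, it can help to sanity-check the raw numbers straight from the sshd log. The sketch below is only a stand-in for what the Loki query would eventually compute: it assumes the standard OpenSSH "Failed password for ... from <ip> port ..." message format, and the function name is hypothetical.

```shell
#!/bin/sh
set -eu

# Read sshd log lines on stdin and print "<count> <source-ip>", busiest source first.
failed_by_ip() {
  awk '/Failed password/ {
         # The source address is the field right after the word "from".
         for (i = 1; i < NF; i++) if ($i == "from") { count[$(i + 1)]++; break }
       }
       END { for (ip in count) print count[ip], ip }' | sort -rn
}

# Example (the unit may be called ssh or sshd depending on the distribution):
#   journalctl -u ssh --since "1 hour ago" | failed_by_ip
```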

M14 — Cloudflare WAF + geo policy on cloud.iamkay.eu 🟡

Trigger to revisit: when you want to harden the family-cloud public surface
Proposed solution: enable the Cloudflare WAF managed rulesets available on the current plan + soft geo policy
Concrete steps (~20 min):
Cloudflare dashboard → cloud.iamkay.eu → Security → WAF → enable the managed rulesets the plan includes. On the Free plan that is the "Cloudflare Free Managed Ruleset"; the full "Cloudflare Managed Ruleset" and "OWASP Core Ruleset" need a paid plan, so check availability first (if OWASP is available, start at "low" sensitivity)
Bot Fight Mode → On
Optional: WAF custom rule → if request country ∉ NL/DE/US/GB → Managed Challenge (not block, so family travelers aren't locked out)
Test from incognito + VPN: the site should still load, after a short challenge interstitial
Risk: false positives on legitimate users — start lenient
Estimated time: 20 min initial + tuning over 2 weeks

Finding-4 — pfSense admin user 2FA via FreeRADIUS+TOTP 🟠

Status: Deferred 2026-06-07. FreeRADIUS pfSense package DOWNLOADED but not configured.
Trigger to revisit: any of (a) VLAN 40 MGMT grows beyond Kay's trusted laptop (e.g., G5 cluster nodes land there), (b) Path B curriculum hits "authentication", (c) any pfSense access-log anomaly
Why deferred: current mitigations bound the risk well — pfSense web UI is only reachable from VLAN 40 (3 trusted hosts: laptop, PVE, iLO) and password is strong bcrypt. ~1 h FreeRADIUS setup with real lockout risk doesn't justify the marginal gain tonight.
Proposed solution (when triggered):
1. Pre-flight: open a SECOND pfSense admin session in a different browser, keep alive throughout — if RADIUS breaks login, the live local-DB session can revert.
2. System → Package Manager → verify FreeRADIUS is installed (already done).
3. Services → FreeRADIUS → Settings tab:
   - Enable on localhost:1812 (UDP) for auth
   - Shared Secret: generate strong random (vault it)
   - EAP types: TLS-only or PEAP+MSCHAPv2 (depends on chosen TOTP module)
4. Services → FreeRADIUS → Users tab → Add user kay:
   - Username: kay
   - Password: same as pfSense local password (RADIUS layered ON TOP of password)
   - "Motp Initial Secret" field (mod_otp): generate a 16-char hex secret → vault it
   - Save
5. System → User Manager → Authentication Servers → +Add:
   - Type: RADIUS
   - Hostname: 127.0.0.1
   - Shared Secret: from step 3
   - Auth port: 1812
   - Save → Test (should succeed for kay + correct password + correct TOTP code)
6. System → User Manager → Settings → change "Authentication server" from "Local Database" to the new RADIUS server. SAVE — at this point, your live login is still valid (existing session), but next login requires RADIUS.
7. Enroll TOTP: install authenticator app for the mod_otp secret. Same authenticator that has Authelia + Nextcloud + GitLab + PVE root.
8. Test in a third browser (not the kept-alive sessions): login with kay + password — should now prompt for TOTP code on a second screen.
9. If broken: revert via the kept-alive session in step 1 → System → User Manager → Settings → switch back to Local Database.
10. Document final config in homelab-tracker + vault the FreeRADIUS shared secret + mod_otp seed.
Dependencies: FreeRADIUS package (✓ installed), authenticator app (have it), strong shared secret
Risk: medium-high — pfSense lockout if RADIUS breaks. Mitigation: kept-alive session in step 1 + chassis console fallback as last resort
Estimated time: 1 hour (45 min setup + 15 min testing buffer)
Cross-references: [[security-audit-2026-06-07.md]] Finding 4; [[learning-tracker.md]] Path B authentication module
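
Both secrets in the steps above (the RADIUS shared secret and the 16-char hex OTP seed) can be generated from the command line instead of by hand. A sketch using only `od` and `tr` so it should run on the laptop or on pfSense itself; the function name is made up, and the 32-character length for the shared secret is my assumption, not something the plan specifies.

```shell
#!/bin/sh
set -eu

# Print N random lowercase hex characters (N must be even), read from /dev/urandom.
gen_hex() {
  od -An -tx1 -N "$(( $1 / 2 ))" /dev/urandom | tr -d ' \t\n'
  echo
}

# 16-char hex seed for the "Motp Initial Secret" field:
#   gen_hex 16
# Longer value for the RADIUS shared secret (length is an assumption):
#   gen_hex 32
```

Paste the output straight into the vault first, then into the pfSense form, so the value never lives only in a browser field.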

SEC1 — Cybersecurity tooling stack (Suricata + IDS/IPS + scanning + SIEM) 🟠

Status: NEW deferred entry born from Kay's reminder 2026-06-07 — "remember the suricata and all other cybersecurity stuffs we have to do later"
Trigger to revisit: any of (a) non-Kay users start using homelab (family, clients), (b) Path B curriculum hits "intrusion detection" / "blue team", (c) any incident surfaces a detection gap
Proposed solution: layered cybersecurity stack across 4 tiers. NOT all at once — one tier per session.

Tier 1 — Network (perimeter):
- Suricata IDS/IPS on pfSense — pfSense package. Signature-based intrusion detection on WAN + inter-VLAN. Free ET Open ruleset + Talos. Start in alerts-only mode on WAN, then SERVERS interface, then DMZ. Log shipping to Loki for retention beyond pfSense's small SG-1100 disk.
- pfBlockerNG on pfSense — block bad IP ranges (DROP/EDROP/Spamhaus), GeoIP block (known C2 countries), DNSBL (lower priority since Pi-hole already does this). Start with IPv4 lists + GeoIP.
- WAN-side rate limiting + connection tracking — anti-SYN-flood / slow-loris fingerprinting.

Tier 2 — Host (per-LXC/VM):
- Wazuh agent everywhere — open-source HIDS. FIM, config drift, log anomalies, kernel events, rootkit detection, ATT&CK mapping. Wazuh Manager in a new LXC (~2 GB RAM). Agents on PVE host, PBS, VM 189 homenas, all LXCs.
- OSQuery (alternative/complement) — lighter, SQL-based endpoint queries.
- Lynis — quarterly Linux hardening audit on each host. Score remediation backlog.

Tier 3 — Application (containers + web):
- Trivy container image scanner — integrate with GitLab Runner CI: scan-on-pull, block CVSS≥9. Updates daily.
- Clair alternative (slower, more mature).
- OWASP ZAP / Nikto — quarterly web vuln scans against Nextcloud, GitLab, Vaultwarden, Pi-hole. Don't auto-fire (false positive risk).

Tier 4 — SOC / SIEM:
- Wazuh dashboard doubles as SOC view + Loki integration for log correlation.
- TheHive + Cortex — incident response platform; overkill for solo but excellent Path B learning material.
- MISP — threat intel aggregation; integrates with Suricata + Wazuh.

Implementation order (one tier per session):
1. Tier 1 Suricata on pfSense — biggest visibility / least effort, ~1 hr
2. Tier 1 pfBlockerNG GeoIP + DROP, ~30 min
3. Tier 2 Wazuh manager LXC + first agent (PVE host), ~2 hrs
4. Tier 2 Wazuh agents rolled out to all LXCs/VMs, ~1 hr
5. Tier 3 Trivy in GitLab CI, ~30 min
6. Lynis quarterly schedule, ~15 min one-time
7. Tier 4 Wazuh dashboard tuning + alert routing via M9 SMTP, ~1 hr

Dependencies:
- M9 SMTP (#36, still listed as an active deferred item above; needed before alert email works) — for Wazuh alert email routing
- Storage headroom (Wazuh manager ~50-100 GB for indexer)
- Loki for Suricata log shipping (✓ already deployed in monitoring LXC)

Cross-references:
- [[security-audit-2026-06-07.md]] — current posture baseline
- [[learning-tracker.md]] Path B — blue team curriculum overlap
- [[homelab-tracker.md]] Phase 6 — G5 cluster could host Wazuh manager when migrated
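
For the Tier 3 Trivy gate, the CI job could call a wrapper like the one below. A sketch under assumptions: Trivy filters by severity label rather than by a raw CVSS number, so `CRITICAL` is used here as an approximation of "block CVSS≥9"; the `TRIVY` override and the image name in the example are hypothetical.

```shell
#!/bin/sh
set -eu

# TRIVY can be overridden (illustrative hook) to preview the command in a dry run.
TRIVY="${TRIVY:-trivy}"

# Scan one image; a non-zero exit status means at least one CRITICAL finding,
# which fails the CI job that calls this function.
scan_image() {
  "$TRIVY" image --severity CRITICAL --exit-code 1 "$1"
}

# Example inside a GitLab CI script step (image name is a placeholder):
#   scan_image registry.example/app:latest
```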

#34F — pve-firewall rule tightening (after broader deployment) 🔵

Status: NEW deferred item born from pve-firewall enable 2026-06-07. Current rules are permissive baseline; this captures further tightening.
Trigger to revisit: any of (a) 2nd admin user added to MGMT VLAN 40, (b) a Path B learning session on Linux host firewalls, (c) a security incident or scan result that highlights internal lateral movement
Proposed solution: tighten the two permissive groups in /etc/pve/firewall/cluster.fw IPSETs:
Tighten trusted_mgmt: from 172.16.1.0/24 → just 172.16.1.100/32 (your laptop only). Reasoning: today VLAN 40 has only your laptop + PVE + iLO, but a future device on MGMT shouldn't get auto-trust. The pattern of "list specific admin IPs" beats "trust the whole subnet" as soon as the subnet grows.
Tighten trusted_servers: split into role-based rules instead of one /24. E.g., only 10.0.10.14/32 (monitoring) can hit port 9100; only 10.0.10.10/32 (Traefik) can hit port 8006 if it actually does; SSH from VLAN 10 stays at /24 (too many service containers might legitimately ssh in admin scenarios). Reasoning: any compromised LXC in VLAN 10 currently can probe PVE on SSH/web — narrowing to specific service IPs reduces the lateral surface.
Concrete steps (~20 min):
pct exec 256 -- ip addr to confirm monitoring IP is 10.0.10.14

Edit /etc/pve/firewall/cluster.fw:

[IPSET trusted_mgmt]
172.16.1.100/32

[IPSET prometheus_scrapers]
10.0.10.14/32

Edit /etc/pve/nodes/arochukwu/host.fw:
- Change +dc/trusted_servers -p tcp -dport 9100 → +dc/prometheus_scrapers -p tcp -dport 9100
- Optionally restrict 8006 too if no other LXC needs PVE API access
pve-firewall compile to syntax check
pve-firewall restart to apply the new rules
Verify Prometheus still scrapes (HTTP 200 on /metrics from inside LXC 256)
Verify your laptop still has full PVE access from 172.16.1.100
Risk: medium — if I scope trusted_mgmt to only laptop's static IP and your laptop's IP ever changes (DHCP renewal that shifts you off 172.16.1.100, or you replace the laptop), you're locked out of PVE web UI via MGMT path. Fallback: VLAN 10 SSH (still works since LXCs are in trusted_servers).
Mitigation: only tighten AFTER setting laptop's MGMT IP to static reservation in pfSense DHCP
Estimated time: 20 min for the rule edit + 10 min for DHCP static reservation + verification window
Dependencies: static IP reservation for laptop on MGMT
Cross-reference: extends task #34 (now closed)
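
Before relying on the tightened rules, the key paths can be checked with `pve-firewall simulate`. This is a sketch: the simulation is only an approximation of the live ruleset, the `PVE_FW` override and function name are illustrative, and `10.0.10.99` is a made-up address standing in for "some other SERVERS host".

```shell
#!/bin/sh
set -eu

# PVE_FW can be overridden (hypothetical hook) to preview the commands off-host.
PVE_FW="${PVE_FW:-pve-firewall}"

# Ask pve-firewall to simulate an inbound TCP connection from an external source to the host.
simulate_in() {
  src="$1"
  port="$2"
  "$PVE_FW" simulate --from outside --to host --source "$src" --dport "$port" --protocol tcp
}

# Paths that matter after the tightening:
#   simulate_in 172.16.1.100 8006   # laptop -> web UI, should be accepted
#   simulate_in 10.0.10.14 9100     # monitoring -> node_exporter, should be accepted
#   simulate_in 10.0.10.99 9100     # other SERVERS host (made-up IP), should be dropped
```

The real proof is still the two verification steps above (Prometheus scrape and laptop login); the simulation only catches obvious rule mistakes before they go live.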

Completed audit follow-ups (kept here as audit trail)

| Item | What | Closed |
|------|------|--------|
| M1 | Stale `pve-enterprise.list.dpkg-old` apt sources file removed | 2026-06-07 |
| #33 | GRUB cmdline `nomodeset vga=794` staged for kernel 6.8.12 G7 boot | 2026-06-07 |
| M4 | HSTS + security-headers on every Traefik router (closed in H7 sweep) | 2026-06-07 |
| #34 | pve-firewall enabled with permissive baseline (cluster.fw + host.fw, fail2ban-compatible) | 2026-06-07 |
| C4 | Off-site backup APPROVED deferred to July (single-user scope, R2 plan ready) | 2026-06-07 |

How to keep this doc current

Every audit item triggers either (a) an immediate close or (b) a deferred entry with a concrete plan
Every "let me defer that" remark in conversation triggers a section here
When an item gets revisited, move to "Completed" section with date
Quarterly review: drop items that are still "Future fix if ever bothered" status

Cross-references:
- security-audit-2026-06-07.md §12 priority list (original)
- homelab-tracker.md Phase 3+ status
- MEMORY.md project memories for VPS research, off-site backup, Grafana dashboards