Deferred Items: Proposed Solutions Catalogue
Date: 2026-06-07 (after Kay's feedback that deferred items were being queued without concrete solution proposals)
Purpose: For each deferred item from the security audit + working session, capture WHAT it is, WHEN to revisit, WHO touches it, HOW to execute concretely, dependencies, and time, so that when we come back to any item we can execute without re-thinking.
Cross-references: security-audit-2026-06-07.md, homelab-tracker.md, MEMORY.md project memories.
Severity column legend
🔴 Critical (overdue) | 🟠 High (this month) | 🟡 Medium (this quarter) | 🔵 Hygiene (opportunistic)
Active deferred items
C4: Off-site backup (R2 vs B2 decision) 🔴
- Trigger to revisit: family onboarding, or LXC 258 usage > 50 GB, or 2026-07-15 (HBA reminder fires)
- Proposed solution: Cloudflare R2 (same account as tunnel, free egress, slightly higher storage cost)
- Concrete steps (~1 h when triggered):
  - Cloudflare dashboard → R2 → Create bucket `oak-techx-homelab-backup` (region: EU)
  - Create R2 API token: bucket-scoped, read+write, no expiry (rotate annually)
  - On PBS VM 200: `apt install rclone`; `rclone config` → S3-compatible → R2 endpoint → paste API key + secret
  - Sync script: `rclone sync /mnt/datastore/apple-tank/.chunks r2:oak-techx-homelab-backup/chunks --transfers 8 --bwlimit 5M --log-file=/var/log/rclone-sync.log`
  - Cron weekly Sunday 04:00
  - Verify: random chunk fetch + integrity check after first sync
  - Document in security audit (close §7.1) + homelab-tracker
- Dependencies: PBS healthy, Cloudflare account access (already have)
- Cost: ~$1.50/mo for 100 GB, free egress
- Estimated time: 1 hour
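
The sync and cron steps above could be wrapped in a small script along these lines. This is a minimal sketch, not a tested deployment: the `PREVIEW` switch, the `--run` argument, and the script path in the cron comment are hypothetical conveniences, while the rclone flags are the ones listed in the steps.

```shell
#!/bin/sh
# Sketch of the weekly off-site sync. PREVIEW and --run are hypothetical
# conveniences; paths and flags come from the steps above.
set -eu

SRC="${SRC:-/mnt/datastore/apple-tank/.chunks}"
DEST="${DEST:-r2:oak-techx-homelab-backup/chunks}"
LOG="${LOG:-/var/log/rclone-sync.log}"
PREVIEW="${PREVIEW:-1}"   # 1 = only print the command, 0 = actually run it

run_sync() {
  set -- sync "$SRC" "$DEST" --transfers 8 --bwlimit 5M "--log-file=$LOG"
  if [ "$PREVIEW" = "1" ]; then
    echo "rclone $*"      # show what would run, touch nothing
  else
    rclone "$@"
  fi
}

# Cron entry for Sunday 04:00 (script path is an assumption):
#   0 4 * * 0  PREVIEW=0 /usr/local/sbin/r2-sync.sh --run
if [ "${1:-}" = "--run" ]; then
  run_sync
fi
```

Calling `run_sync` with the default `PREVIEW=1` only prints the command, which makes it easy to review before the first real transfer. Note that a PBS datastore also keeps snapshot index files outside `.chunks`, so it is worth checking whether the chunk directory alone is enough for a restore.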
#34 pve-firewall: write rules + enable 🟠
- Trigger to revisit: dedicated session when Kay has 1-2 hours and is willing to debug if rules block something
- Proposed solution: write cluster.fw + host.fw rules, test in TEST mode, then enable
- Concrete steps (~1.5 h):
  - `/etc/pve/firewall/cluster.fw`:
    - `[OPTIONS]`: `enable: 0` (start disabled; flip to 1 after rules vetted)
    - `[RULES]`: deny default IN, allow OUT (PVE outbound unrestricted)
  - `/etc/pve/nodes/arochukwu/host.fw`:
    - `[OPTIONS]`: `enable: 0` initially; `nf_conntrack_max: 524288`
    - Allow IN from 10.0.10.0/24 (SERVERS): TCP 22, 8006 (web UI), 9100 (node_exporter)
    - Allow IN from 172.16.1.0/24 (MGMT): ALL (admin can reach everything)
    - Allow IN from 10.0.50.0/24 (DMZ): NOTHING (DMZ should never reach PVE direct)
    - Deny IN from WAN
  - `pve-firewall compile` → syntax check before enabling
  - Enable in test/log-only mode for 24 h: `[OPTIONS] log_level_in: info` to capture what would be blocked
  - Review logs, fix mis-blocked stuff, then flip `enable: 1`
- Risk: bad rule locks Kay out of PVE web UI + SSH. Mitigation: VLAN 40 direct L2 to 172.16.1.5 is a hardwired allow, with chassis console as ultimate fallback
- Dependencies: none
- Estimated time: 1.5 hours hands-on, plus the 24-hour observation window
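
As a starting point, the two files described above might be drafted locally before anything is copied into `/etc/pve`. The sketch below assumes the "deny IN, allow OUT" intent maps onto the `policy_in` / `policy_out` options, and the `fw-draft` directory is a hypothetical scratch location.

```shell
#!/bin/sh
# Draft cluster.fw and host.fw into a scratch directory for review.
# DRAFT_DIR is a hypothetical location; nothing under /etc/pve is touched.
set -eu

DRAFT_DIR="${DRAFT_DIR:-./fw-draft}"
mkdir -p "$DRAFT_DIR"

cat > "$DRAFT_DIR/cluster.fw" <<'EOF'
[OPTIONS]
# start disabled; flip to 1 only after the rules are vetted
enable: 0
# assumption: default deny inbound / allow outbound expressed as policies
policy_in: DROP
policy_out: ACCEPT
EOF

cat > "$DRAFT_DIR/host.fw" <<'EOF'
[OPTIONS]
enable: 0
nf_conntrack_max: 524288
log_level_in: info

[RULES]
# SERVERS VLAN: SSH, web UI, node_exporter
IN ACCEPT -source 10.0.10.0/24 -p tcp -dport 22,8006,9100
# MGMT VLAN: admin can reach everything
IN ACCEPT -source 172.16.1.0/24
# DMZ: never reaches PVE directly (kept explicit on top of policy_in)
IN DROP -source 10.0.50.0/24
EOF

echo "drafts written to $DRAFT_DIR"
```

After review, the drafts would be copied to `/etc/pve/firewall/cluster.fw` and `/etc/pve/nodes/arochukwu/host.fw`, followed by `pve-firewall compile` as in the steps.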
#35 WebAuthn / Passkeys for Authelia 🟡
- Trigger to revisit: Kay wants Face ID login on iPhone instead of typing TOTP code
- Proposed solution: enable WebAuthn provider in Authelia config, enroll iPhone first
- Concrete steps (~20 min):
  - From the PVE host, edit the config in LXC 254: `pct exec 254 -- vi /opt/authelia/config/configuration.yml`
  - Add the `webauthn:` section at top level (peer of `totp:`)
  - Restart container: `pct exec 254 -- docker restart authelia`
  - From iPhone Safari, go to https://auth.hm.iamkay.eu/ → log in with password + TOTP → Settings → "Two-Factor Methods" → "Register Security Key" → iOS Face ID prompt → confirm → saved to iCloud Keychain
  - Repeat enrollment on each device: Windows laptop (Windows Hello), Mac (Touch ID), iPad
  - Test: open Authelia portal in a different browser, password + WebAuthn flow should work
  - Keep TOTP enrolled as fallback in case Apple keychain ever has issues
- Risk: low; additive 2FA method, doesn't replace TOTP
- Dependencies: Authelia ≥ 4.36 (have 4.39.20 ✓)
- Estimated time: 20 min
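
The exact block to add to `configuration.yml` is not recorded in this catalogue. A minimal sketch, assuming only the basic `webauthn` keys are needed and writing to a hypothetical scratch file rather than the live config, could look like this:

```shell
#!/bin/sh
# Write a candidate webauthn section to a scratch file for review.
# OUT is a hypothetical path; the values are illustrative defaults.
set -eu

OUT="${OUT:-./webauthn-snippet.yml}"

cat > "$OUT" <<'EOF'
webauthn:
  disable: false
  display_name: 'Authelia'
  timeout: '60s'
EOF

echo "snippet written to $OUT"
```

The snippet would then be pasted at top level next to `totp:`. Option names have shifted between Authelia releases, so the keys should be checked against the documentation for 4.39.x before restarting the container.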
#36 M9: SMTP relay for system mail 🟠
- Trigger to revisit: anytime in next 2 weeks (PBS backup failure mail doesn't currently deliver, a silent failure risk)
- Proposed solution: Resend.com (3000 emails/month free, simple API key, no credit card)
- Concrete steps (~30 min):
  - Kay: create Resend.com account (sign up free), verify ownership of `iamkay.eu` domain (add the DNS records they show; same Cloudflare account)
  - Resend dashboard → API Keys → Create → restricted to "Send emails" only
  - On PVE host: `apt install libsasl2-modules`
  - Edit `/etc/postfix/main.cf`:
    - `relayhost = [smtp.resend.com]:587`
    - `smtp_use_tls = yes`
    - `smtp_sasl_auth_enable = yes`
    - `smtp_sasl_security_options = noanonymous`
    - `smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd`
    - `smtp_tls_security_level = encrypt`
    - `smtp_tls_CAfile = /etc/ssl/certs/ca-certificates.crt`
  - Create `/etc/postfix/sasl_passwd`: `[smtp.resend.com]:587 resend:re_xxxxxxxxxxxxx`
  - `postmap /etc/postfix/sasl_passwd && chmod 600 /etc/postfix/sasl_passwd*`
  - `systemctl restart postfix`
  - Test: `echo "PBS test" | mail -s "PBS test" [email protected]` → should arrive in Hotmail within 30 s
  - Trigger an actual PBS backup failure (e.g. temporarily wrong storage name) → verify failure mail arrives
  - Repeat config on PBS VM 200 separately (its own postfix)
- Risk: low; only affects outbound mail
- Dependencies: Kay creates Resend account + domain DNS verify (~10 min on his side)
- Estimated time: 30 min me + 10 min Kay
- Side benefit: enables M5 (Vaultwarden require_device_email), Authelia password-reset flow, Nextcloud share-by-email when family lands
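
Because the same relay settings have to be applied on both the PVE host and PBS VM 200, one option is to script them with `postconf -e` instead of hand-editing `main.cf` twice. The sketch below is an assumption about how that could be done; the `PREVIEW` switch is hypothetical, and `smtp_use_tls` is left out on the assumption that `smtp_tls_security_level = encrypt` supersedes it.

```shell
#!/bin/sh
# Apply the relay settings via postconf -e. With PREVIEW=1 (default) the
# commands are only printed; PREVIEW is a hypothetical safety switch.
set -eu

PREVIEW="${PREVIEW:-1}"

apply() {
  if [ "$PREVIEW" = "1" ]; then
    echo "postconf -e '$1'"
  else
    postconf -e "$1"
  fi
}

relay_settings() {
  apply 'relayhost = [smtp.resend.com]:587'
  apply 'smtp_sasl_auth_enable = yes'
  apply 'smtp_sasl_security_options = noanonymous'
  apply 'smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd'
  apply 'smtp_tls_security_level = encrypt'
  apply 'smtp_tls_CAfile = /etc/ssl/certs/ca-certificates.crt'
}

relay_settings
```

Running it once in preview mode on each host shows exactly what would change; `PREVIEW=0` applies the settings, after which the `sasl_passwd`, `postmap`, and restart steps from the list still apply.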
Medium-severity items from audit (not yet formally tasked)
M2: Replace RSA SSH key on PVE root with ed25519 🟡
- Trigger to revisit: opportunistic; quick win
- Proposed solution: identify which RSA key is in `/etc/pve/priv/authorized_keys`, regenerate as ed25519, swap
- Concrete steps (~10 min):
  - `awk '$1=="ssh-rsa"' /etc/pve/priv/authorized_keys` to find it
  - Identify whose laptop has the matching private key (check comment field)
  - On that machine: `ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_pve -C "kay-pve-$(date +%Y)"`
  - `ssh-copy-id -i ~/.ssh/id_ed25519_pve.pub [email protected]`
  - Remove RSA line from `/etc/pve/priv/authorized_keys` (only once the new key has logged in successfully)
  - Update `~/.ssh/config` on laptop to point at new key for `pve` host alias
  - Test SSH from a fresh terminal
- Risk: very low
- Estimated time: 10 min
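
To confirm the swap, a small check can report which key types remain in the file. This sketch makes the same assumption as the `awk` one-liner in the steps: each line begins with the key type, with no leading options field. The function names are hypothetical.

```shell
#!/bin/sh
# Helpers to inspect key types in an authorized_keys file.
# Assumes each entry starts with the key type (no options prefix).

key_types() {
  # Count entries per key type, skipping blank lines and comments.
  awk 'NF && $1 !~ /^#/ { print $1 }' "$1" | sort | uniq -c
}

has_rsa() {
  # Exit status 0 if at least one ssh-rsa entry is present, 1 otherwise.
  awk '$1 == "ssh-rsa" { found = 1 } END { exit !found }' "$1"
}
```

For example, `has_rsa /etc/pve/priv/authorized_keys && echo "RSA key still present"` gives a quick yes/no after the RSA line has been removed.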
M3: Internal hostname leak via public DNS 🔵
- Status: ACCEPTED; cosmetic only. `hm.iamkay.eu` apex DNS resolves to Cloudflare proxy IPs but no service responds (1014 error). Attacker learns we use the `hm.iamkay.eu` naming pattern; no actual exposure.
- Future fix (if ever bothered): migrate internal TLD from `hm.iamkay.eu` to `.lan` or `.home.arpa` or another unregistered name. High effort, low value. Defer indefinitely.
M5: Vaultwarden require_device_email 🟡
- Trigger to revisit: AFTER M9 (SMTP relay) lands
- Proposed solution: edit Vaultwarden config.json, set `require_device_email: true`, restart container
- Dependencies: SMTP working (M9)
- Estimated time: 5 min (gated on M9)
M6: cloudflared TUNNEL_TOKEN as Docker secret 🟡
- Trigger to revisit: opportunistic security hygiene; not load-bearing
- Proposed solution: refactor docker-compose to mount token as file vs env
- Concrete steps (~15 min):
  - Write `/opt/cloudflared/cf-tunnel.token` (file pattern already exists); chmod 600
  - Edit `/opt/cloudflared/docker-compose.yml`:
    - Add top-level `secrets:` block with file path
    - Container env: `TUNNEL_TOKEN_FILE: /run/secrets/cf-tunnel-token`
    - Container `secrets:` entry: `- cf-tunnel-token`
  - cloudflared image supports `TUNNEL_TOKEN_FILE` as of v2024+ (confirm against the release notes for the deployed image tag before relying on it)
  - Restart container
  - Verify token isn't in `docker inspect` env anymore
- Risk: low, but restart drops tunnel briefly (~5 s)
- Estimated time: 15 min
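
The compose changes described above might look roughly like the draft below. It is a sketch under assumptions: the service name, image tag, and `command` line are illustrative, and the draft is written to a hypothetical scratch file so the live `docker-compose.yml` is untouched.

```shell
#!/bin/sh
# Write a draft compose file showing the file-based secret wiring.
# OUT is a hypothetical path; service name, image tag and command are
# illustrative and should be aligned with the existing compose file.
set -eu

OUT="${OUT:-./docker-compose.secrets-draft.yml}"

cat > "$OUT" <<'EOF'
services:
  cloudflared:
    image: cloudflare/cloudflared:latest
    command: tunnel --no-autoupdate run
    environment:
      TUNNEL_TOKEN_FILE: /run/secrets/cf-tunnel-token
    secrets:
      - cf-tunnel-token

secrets:
  cf-tunnel-token:
    file: /opt/cloudflared/cf-tunnel.token
EOF

echo "draft written to $OUT"
```

One thing to check when testing: file-based compose secrets are bind-mounted with their host ownership and mode, so a root-owned `chmod 600` token may not be readable if the container runs as a non-root user.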
M7: Cloudflare API token scope review 🔵
- Trigger to revisit: 6-month rotation cadence
- Proposed solution: in Cloudflare dashboard → My Profile → API Tokens → review token `cfut_RWYgPHFSqh3Zq8b29DXXJBAk4NKYCIqYB9vs4tFr6034e472` → confirm scope is exactly: Zone:DNS:Edit + Zone:Zone:Read on `iamkay.eu` only
- Concrete steps (~5 min): Kay checks dashboard; rotate if scope creep found
- Estimated time: 5 min
M8: PBS chunk-handler tuning 🟡
- Trigger to revisit: next time a restore drill hangs the UI (Session 8 finding)
- Proposed solution: bump PBS VM 200 memory 4 → 6 GB; investigate `--chunk-cache-size` tuning
- Concrete steps (~15 min):
  - `qm set 200 -memory 6144`
  - Restart VM
  - Test by mounting a backup snapshot via PBS UI while running a fresh backup → should not freeze
- Estimated time: 15 min + restore drill validation
M10: BIOS P67 update (G7 microcode stale) 🟡
- Trigger to revisit: next chassis-open (HBA install ~July)
- Proposed solution: bundle with SPP 2017.10.1 flash (already staged on PVE ISO storage)
- Concrete steps: BIOS update is part of the SPP run; no separate work needed
- Dependencies: chassis-open, iLO Virtual Media
- Estimated time: bundled with HBA install (~2 hours total for the chassis session)
M11: GitLab root user deprecation 🟡
- Trigger to revisit: after H10 (you enrolling 2FA on root and promoting yourself to admin)
- Proposed solution: either disable root or keep with strong password + locked
- Concrete steps (~10 min after H10):
  - Option A: create a real `kay` GitLab user with admin role, disable `root` (locked, password unknown, kept for "what if I lose access" emergency only)
  - Option B: keep using root, rename to `kay`, enroll 2FA
- Estimated time: 10 min
M12: Empty stub dirs cleanup on tanks 🔵
- Trigger to revisit: opportunistic
- Proposed solution: rm 5 known-empty dirs after per-target Kay confirmation per CLAUDE.md §0 rule #2
- Targets: `r-tank/mac-store/`, `r-tank/proxmox/vms/`, `s-tank/proxmox/ct-k8s-datastore/`, `/windows-store/`, `/R-tank/` (capitalized)
- Estimated time: 5 min
M13: SIEM / Grafana auth-event dashboard 🟡
- Trigger to revisit: when Path B (learning) curriculum hits observability deep-dive, OR when you want to spend an evening on it
- Proposed solution: write a Grafana dashboard with Loki queries
- Concrete steps (~60 min):
  - Identify auth log paths: Authelia container `/config/notification.txt`, Nextcloud `nextcloud.log` in container, GitLab production.log, sshd via journald
  - Promtail config: scrape those paths with labels {service, level, host}
  - Grafana panels:
    - Login attempts per service per hour (graph)
    - Failed logins per source IP per hour (table)
    - Blocked / banned IPs per hour (gauge)
  - Alert: > 10 failed in 5 min → Uptime Kuma webhook → email/iMessage
- Dependencies: Loki + Promtail healthy (they are)
- Estimated time: 60 min
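
A possible starting point for the Promtail scrape config and the alert query is sketched below. The host-side log path, the `host` label values, and the `"(?i)failed"` match pattern are all assumptions to be replaced with what the services actually log; the `level` label is omitted because it would normally come from a pipeline stage.

```shell
#!/bin/sh
# Draft a Promtail scrape config and a LogQL alert expression.
# OUT, the log path, and the label values are hypothetical placeholders.
set -eu

OUT="${OUT:-./promtail-auth-draft.yml}"

cat > "$OUT" <<'EOF'
scrape_configs:
  - job_name: auth-files
    static_configs:
      - targets: [localhost]
        labels:
          service: nextcloud
          host: nextcloud                               # placeholder
          __path__: /var/log/nextcloud/nextcloud.log    # placeholder path
  - job_name: sshd-journal
    journal:
      max_age: 12h
      labels:
        service: sshd
        host: pve                                       # placeholder
EOF

# LogQL for the "> 10 failed in 5 min" alert; the match pattern is a guess.
FAILED_LOGINS_QUERY='sum by (service) (count_over_time({service=~".+"} |~ "(?i)failed" [5m])) > 10'
echo "$FAILED_LOGINS_QUERY"
```

The same query with a `[1h]` range and without the `> 10` threshold would feed the "failed logins per hour" panel.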
M14: Cloudflare WAF + geo policy on cloud.iamkay.eu 🟡
- Trigger to revisit: when you want to harden the family-cloud public surface
- Proposed solution: enable Cloudflare WAF managed rulesets (free) + soft geo policy
- Concrete steps (~20 min):
  - Cloudflare dashboard → cloud.iamkay.eu → Security → WAF → enable "Cloudflare Managed Ruleset" (free) + "OWASP Core Ruleset" (free) at "low" sensitivity
  - Bot Fight Mode → On
  - Optional: Page Rules → if request country ∉ NL/DE/US/GB → challenge (Turnstile JS challenge, not a block, so family travelers aren't locked out)
  - Test from incognito + VPN: the site should still load, with a short delay while the challenge runs
- Risk: false positives on legitimate users; start lenient
- Estimated time: 20 min initial + tuning over 2 weeks
Finding-4: pfSense admin user 2FA via FreeRADIUS+TOTP 🟠
- Status: Deferred 2026-06-07. FreeRADIUS pfSense package DOWNLOADED but not configured.
- Trigger to revisit: any of (a) VLAN 40 MGMT grows beyond Kay's trusted laptop (e.g., G5 cluster nodes land there), (b) Path B curriculum hits "authentication", (c) any pfSense access-log anomaly
- Why deferred: current mitigations bound the risk well: the pfSense web UI is only reachable from VLAN 40 (3 trusted hosts: laptop, PVE, iLO) and the password is strong and bcrypt-hashed. ~1 h of FreeRADIUS setup with real lockout risk doesn't justify the marginal gain tonight.
- Proposed solution (when triggered):
  1. Pre-flight: open a SECOND pfSense admin session in a different browser and keep it alive throughout; if RADIUS breaks login, the live local-DB session can revert.
  2. System → Package Manager → verify FreeRADIUS is installed (already done).
  3. Services → FreeRADIUS → Settings tab:
     - Enable on `localhost:1812` (UDP) for auth
     - Shared Secret: generate strong random (vault it)
     - EAP types: TLS-only or PEAP+MSCHAPv2 (depends on chosen TOTP module)
  4. Services → FreeRADIUS → Users tab → Add user `kay`:
     - Username: kay
     - Password: same as pfSense local password (RADIUS layered ON TOP of password)
     - "Motp Initial Secret" field (mod_otp): generate a 16-char hex secret → vault it
     - Save
  5. System → User Manager → Authentication Servers → +Add:
     - Type: RADIUS
     - Hostname: 127.0.0.1
     - Shared Secret: from step 3
     - Auth port: 1812
     - Save → Test (should succeed for `kay` + correct password + correct TOTP code)
  6. System → User Manager → Settings → change "Authentication server" from "Local Database" to the new RADIUS server. SAVE. At this point your live login is still valid (existing session), but the next login requires RADIUS.
  7. Enroll TOTP: add the mod_otp secret to the authenticator app, the same one that holds Authelia + Nextcloud + GitLab + PVE root.
  8. Test in a third browser (not the kept-alive sessions): login with `kay` + password → should now prompt for TOTP code on a second screen.
  9. If broken: revert via the kept-alive session in step 1 → System → User Manager → Settings → switch back to Local Database.
  10. Document final config in homelab-tracker + vault the FreeRADIUS shared secret + mod_otp seed.
- Dependencies: FreeRADIUS package (✓ installed), authenticator app (have it), strong shared secret
- Risk: medium-high; pfSense lockout if RADIUS breaks. Mitigation: kept-alive session in step 1 + chassis console fallback as last resort
- Estimated time: 1 hour (45 min setup + 15 min testing buffer)
- Cross-references: [[security-audit-2026-06-07.md]] Finding 4; [[learning-tracker.md]] Path B authentication module
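
The two secrets the plan calls for (a strong RADIUS shared secret and a 16-char hex mod_otp seed) can be generated locally before the session. This is a minimal sketch; the `rand_hex` helper is hypothetical, and whether the pfSense fields accept exactly this format should be checked during setup.

```shell
#!/bin/sh
# Generate random hex secrets from /dev/urandom.
# rand_hex is a hypothetical helper: $1 random bytes -> 2*$1 hex characters.
set -eu

rand_hex() {
  od -An -N"$1" -tx1 /dev/urandom | tr -d ' \t\n'
  echo
}

OTP_SEED="$(rand_hex 8)"         # 16 hex characters for the mod_otp field
SHARED_SECRET="$(rand_hex 24)"   # 48 hex characters for the RADIUS secret

# Values are deliberately not printed; store them straight into the vault.
```

Both values go into the vault first and are pasted into pfSense from there, so they never need to appear in shell history or notes.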
SEC1: Cybersecurity tooling stack (Suricata + IDS/IPS + scanning + SIEM) 🟠
- Status: NEW deferred entry born from Kay's reminder 2026-06-07: "remember the suricata and all other cybersecurity stuffs we have to do later"
- Trigger to revisit: any of (a) non-Kay users start using homelab (family, clients), (b) Path B curriculum hits "intrusion detection" / "blue team", (c) any incident surfaces a detection gap
- Proposed solution: layered cybersecurity stack across 4 tiers. NOT all at once; one tier per session.

Tier 1: Network (perimeter):
- Suricata IDS/IPS on pfSense: pfSense package. Signature-based intrusion detection on WAN + inter-VLAN. Free ET Open ruleset + Talos. Start in alerts-only mode on WAN, then SERVERS interface, then DMZ. Log shipping to Loki for retention beyond pfSense's small SG-1100 disk.
- pfBlockerNG on pfSense: block bad IP ranges (DROP/EDROP/Spamhaus), GeoIP block (known C2 countries), DNSBL (lower priority since Pi-hole already does this). Start with IPv4 lists + GeoIP.
- WAN-side rate limiting + connection tracking: anti-SYN-flood / slow-loris fingerprinting.

Tier 2: Host (per-LXC/VM):
- Wazuh agent everywhere: open-source HIDS. FIM, config drift, log anomalies, kernel events, rootkit detection, ATT&CK mapping. Wazuh Manager in a new LXC (~2 GB RAM). Agents on PVE host, PBS, VM 189 homenas, all LXCs.
- OSQuery (alternative/complement): lighter, SQL-based endpoint queries.
- Lynis: quarterly Linux hardening audit on each host. Score remediation backlog.

Tier 3: Application (containers + web):
- Trivy container image scanner: integrate with GitLab Runner CI: scan-on-pull, block CVSS≥9. Updates daily.
- Clair: alternative (slower, more mature).
- OWASP ZAP / Nikto: quarterly web vuln scans against Nextcloud, GitLab, Vaultwarden, Pi-hole. Don't auto-fire (false positive risk).

Tier 4: SOC / SIEM:
- Wazuh dashboard doubles as SOC view + Loki integration for log correlation.
- TheHive + Cortex: incident response platform; overkill for solo but excellent Path B learning material.
- MISP: threat intel aggregation; integrates with Suricata + Wazuh.

Implementation order (one tier per session):
1. Tier 1 Suricata on pfSense: biggest visibility / least effort, ~1 hr
2. Tier 1 pfBlockerNG GeoIP + DROP, ~30 min
3. Tier 2 Wazuh manager LXC + first agent (PVE host), ~2 hrs
4. Tier 2 Wazuh agents rolled out to all LXCs/VMs, ~1 hr
5. Tier 3 Trivy in GitLab CI, ~30 min
6. Lynis quarterly schedule, ~15 min one-time
7. Tier 4 Wazuh dashboard tuning + alert routing via M9 SMTP, ~1 hr

Dependencies:
- M9 SMTP (✓ done): for Wazuh alert email routing
- Storage headroom (Wazuh manager ~50-100 GB for indexer)
- Loki for Suricata log shipping (✓ already deployed in monitoring LXC)

Cross-references:
- [[security-audit-2026-06-07.md]]: current posture baseline
- [[learning-tracker.md]] Path B: blue team curriculum overlap
- [[homelab-tracker.md]] Phase 6: G5 cluster could host Wazuh manager when migrated
#34F: pve-firewall rule tightening (after broader deployment) 🔵
- Status: NEW deferred item born from pve-firewall enable 2026-06-07. Current rules are permissive baseline; this captures further tightening.
- Trigger to revisit: any of (a) 2nd admin user added to MGMT VLAN 40, (b) a Path B learning session on Linux host firewalls, (c) a security incident or scan result that highlights internal lateral movement
- Proposed solution: tighten the two permissive groups in `/etc/pve/firewall/cluster.fw` IPSETs:
  - Tighten `trusted_mgmt`: from `172.16.1.0/24` → just `172.16.1.100/32` (your laptop only). Reasoning: today VLAN 40 has only your laptop + PVE + iLO, but a future device on MGMT shouldn't get auto-trust. The pattern of "list specific admin IPs" beats "trust the whole subnet" as soon as the subnet grows.
  - Tighten `trusted_servers`: split into role-based rules instead of one /24. E.g., only `10.0.10.14/32` (monitoring) can hit port 9100; only `10.0.10.10/32` (Traefik) can hit port 8006 if it actually does; SSH from VLAN 10 stays at /24 (too many service containers might legitimately ssh in admin scenarios). Reasoning: any compromised LXC in VLAN 10 currently can probe PVE on SSH/web; narrowing to specific service IPs reduces the lateral surface.
- Concrete steps (~20 min):
  - `pct exec 256 -- ip addr` to confirm monitoring IP is 10.0.10.14
  - Edit `/etc/pve/firewall/cluster.fw`: narrow `trusted_mgmt` to `172.16.1.100/32` and add a `prometheus_scrapers` IPSET containing `10.0.10.14/32`
  - Edit `/etc/pve/nodes/arochukwu/host.fw`:
    - Change `+dc/trusted_servers -p tcp -dport 9100` → `+dc/prometheus_scrapers -p tcp -dport 9100`
    - Optionally restrict 8006 too if no other LXC needs PVE API access
  - `pve-firewall compile` to syntax check
  - `pve-firewall reload`
  - Verify Prometheus still scrapes (HTTP 200 on /metrics from inside LXC 256)
  - Verify your laptop still has full PVE access from 172.16.1.100
- Risk: medium. If `trusted_mgmt` is scoped to only the laptop's static IP and the laptop's IP ever changes (DHCP renewal that shifts you off 172.16.1.100, or you replace the laptop), you're locked out of the PVE web UI via the MGMT path. Fallback: VLAN 10 SSH (still works since LXCs are in trusted_servers).
- Mitigation: only tighten AFTER setting laptop's MGMT IP to static reservation in pfSense DHCP
- Estimated time: 20 min for the rule edit + 10 min for DHCP static reservation + verification window
- Dependencies: static IP reservation for laptop on MGMT
- Cross-reference: extends task #34 (now closed)
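
The tightened IPSETs and the matching host rule could be drafted as below before editing the live files. This is a sketch under assumptions: the `prometheus_scrapers` set name comes from the rule change in the steps, the IPs are the ones named above, and the scratch directory is hypothetical.

```shell
#!/bin/sh
# Draft the tightened IPSETs and the node_exporter rule for review.
# DRAFT_DIR is a hypothetical scratch location; /etc/pve is not touched.
set -eu

DRAFT_DIR="${DRAFT_DIR:-./fw-tighten-draft}"
mkdir -p "$DRAFT_DIR"

cat > "$DRAFT_DIR/cluster.fw.ipsets" <<'EOF'
[IPSET trusted_mgmt]
172.16.1.100/32 # laptop only (needs a static DHCP reservation first)

[IPSET prometheus_scrapers]
10.0.10.14/32 # monitoring LXC 256
EOF

cat > "$DRAFT_DIR/host.fw.rules" <<'EOF'
# node_exporter reachable only from the Prometheus scraper set
IN ACCEPT -source +dc/prometheus_scrapers -p tcp -dport 9100
EOF

echo "drafts written to $DRAFT_DIR"
```

Merging these fragments into the real `cluster.fw` and `host.fw` is still a manual step, followed by the compile, reload, and verification steps listed above.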
Completed audit follow-ups (kept here as audit trail)
| Item | What | Closed |
|---|---|---|
| M1 | Stale `pve-enterprise.list.dpkg-old` apt sources file removed | 2026-06-07 |
| #33 | GRUB cmdline `nomodeset vga=794` staged for kernel 6.8.12 G7 boot | 2026-06-07 |
| M4 | HSTS + security-headers on every Traefik router (closed in H7 sweep) | 2026-06-07 |
| #34 | pve-firewall enabled with permissive baseline (cluster.fw + host.fw, fail2ban-compatible) | 2026-06-07 |
| C4 | Off-site backup APPROVED deferred to July (single-user scope, R2 plan ready) | 2026-06-07 |
How to keep this doc current
- Every audit item triggers either: (a) immediate close, (b) deferred entry with concrete plan
- Every "let me defer that" remark in conversation triggers a section here
- When an item gets revisited, move to "Completed" section with date
- Quarterly review: drop items that are still "Future fix if ever bothered" status
Cross-references:
- security-audit-2026-06-07.md §12 priority list (original)
- homelab-tracker.md Phase 3+ status
- MEMORY.md project memories for VPS research, off-site backup, Grafana dashboards