Refactor OpenLDAP tenant config to fix recurring {10} index bug #4
Reference
Cloonar/nixos#4
Problem Statement

Adding the 10th LDAP Tenant to the mailhost's OpenLDAP configuration is impossible without breakage. Every time the fleet needs another `olcDatabase`, the index value crosses the single-digit threshold and slapd refuses to load the generated `cn=config` tree.

The cause: the NixOS `services.openldap` module emits `cn=config` LDIF entries by walking `settings.children` with `lib.mapAttrsToList`, and Nix attribute sets are sorted lexicographically. The keys are strings like `"olcDatabase={1}mdb"` … `"olcDatabase={9}mdb"`. Adding `"olcDatabase={10}mdb"` produces this evaluation order: `{10}` sorts before `{2}` as a string. `slapadd` then sees database index 10 arrive before indices 2–9 exist, and either errors out, silently renumbers, or scrambles which suffix maps to which configured database. The recurring symptom is "I cannot add another `olcDatabase` if it exceeds 9, when I want to add 10 I get all sorts of problems."

Independently, the existing per-Tenant blocks have already silently drifted: `{4}superbros` and `{6}szaku` lack the `cn=authelia,...` write ACL that every other Tenant has, and the `cn={3}cloonar,cn=schema` entry has a DN/cn-attribute disagreement (`{3}` vs `{1}`). Both are symptoms of hand-maintained per-Tenant boilerplate.

Solution
Refactor the per-Tenant `olcDatabase` blocks in `hosts/mail/modules/openldap.nix` into a single list of `{ suffix, authelia ? true }` records consumed by an inline helper that assigns sequential 2-digit zero-padded indices (`{01}` … `{99}`). Padded indices make lexicographic sort match numerical sort for every index up to `{99}`, so the next Tenant slots in at `{10}` and the one after at `{11}` without recurrence of the bug.

The primary cloonar database (`{01}mdb` after padding) and the `{02}monitor` database stay hand-written: their shape (rootDN, overlays, loginShell ACL, monitor objectClass) differs structurally from regular Tenants, and forcing them through the helper would introduce a dozen opt-in flags with one caller each.

While in the file, fix the `cn={3}cloonar,cn=schema` DN/attribute inconsistency by setting the inner `cn` attribute to `"{3}cloonar"`. Do not pad schema entries: empirically slapd tolerates duplicate `{N}` for schema siblings, and the current shape works fine.

This PRD covers the structural fix only. Migrating each Tenant to its own LMDB environment (so `olcDbDirectory` is per-Tenant instead of shared) is the long-term target captured in ADR-0001 and tracked in a separate PRD.

User Stories
- … `{4}superbros`, `{6}szaku`) to keep that opt-out after the refactor, so that I do not grant new bind access to a DN that was deliberately excluded.
- … `cn={3}cloonar` DN/attribute inconsistency fixed, so that the next person to touch the schema block does not assume the divergence is intentional.
- … `config.ldif` before deploy, so that I have empirical confidence the change is purely the intended renames.
- … `mutableConfig = false` activation model, so that no on-host data migration is needed for this PR.

Implementation Decisions
- … `hosts/mail/modules/openldap.nix`. No cross-file refs to `olcDatabase={N}` indices exist; the rename is local.
- … `olcDatabase` entries from a list of `{ suffix, authelia ? true }` records, assigning sequential 2-digit-padded indices via `lib.imap1` + `lib.fixedWidthNumber 2`.
- … (`{03}macher`, `{04}superbros`, `{05}docfast`, `{06}szaku`, `{07}myhidden`, `{08}korean-skin`, `{09}scana11y`). Adding a new Tenant is a single append; it becomes `{10}` automatically.
- `{1}cloonar` and `{2}monitor` are renamed to `{01}` and `{02}` and stay hand-written. Their structural difference (rootDN, sops-backed rootpw, `memberof` + `ppolicy` overlays, `loginShell` ACL, `olcMonitorConfig` objectClass, netdata-specific monitor ACL) does not fit the Tenant shape.
- … `standardTenantAccess { authelia }`; the `authelia` flag controls whether the `cn=authelia,...` write clause appears in the userPassword rule.
- `{04}superbros` and `{06}szaku` are wired with `authelia = false` to preserve current behavior bit-for-bit. Decision to re-enable authelia on them is explicitly out of scope.
- The `cn={3}cloonar,cn=schema` inner `cn` attribute is changed from `"{1}cloonar"` to `"{3}cloonar"`. No other schema changes.
- … `olcDatabase` values).
- The `memberof` and `ppolicy` overlay DNs (`olcOverlay=...,olcDatabase={1}mdb`) are updated to reference `{01}mdb`. The helper does not generate overlays.
- `olcDbDirectory` stays at `/var/lib/openldap/data` for every entry (today's shared LMDB env). Migrating to per-Tenant directories is the deferred target tracked in ADR-0001 and the per-Tenant-dirs PRD.
- `services.openldap.mutableConfig = false` (NixOS default) wipes `/etc/openldap/slapd.d/` on every activation and reruns `slapadd` against the generated `config.ldif`. The LMDB data at `/var/lib/openldap/data/` is not touched: entries are keyed by `olcSuffix`, not by `{N}`. Renaming indices is therefore safe.
- … `docs/adr/0001-shared-lmdb-env-for-tenants.md`) is created in the same PR. It documents the current shared-LMDB shape, the target shape Y (per-Tenant LMDB envs), and the explicit deferral. It links to the per-Tenant-dirs PRD.

Testing Decisions
- … `slapadd` consumes, rather than the Nix expression itself. The Nix evaluation succeeding is necessary but not sufficient; the LDIF must be the right LDIF.
- `./scripts/test-configuration mail`: confirms the host evaluates and the system derivation builds cleanly.
- … `config.ldif`, diff it against the LDIF produced by the pre-refactor config. Expected diff: padding renames (`{1}` → `{01}`, etc.) across keys and inner `olcDatabase` attribute values, the `cn={3}cloonar` inner `cn` attribute change, and nothing else. Any other diff line is either ACL drift introduced by the helper or a helper logic bug.
- … `test-configuration` + manual review; this PRD strengthens the manual-review step to "review the LDIF diff" rather than "review the Nix diff".

Out of Scope
- … `olcDbDirectory` migration (target shape Y): separate PRD, captured in ADR-0001.
- … `dc=optiprot,dc=eu`, `dc=ghetto,dc=at`) and the `dc=cloonar,dc=co` typo entry that exist in today's shared LMDB but are unreachable via LDAP: folded into the per-Tenant-dirs PRD or handled as a follow-up.
- … `cn={N}foo,cn=schema` entries: empirically unnecessary since slapd tolerates duplicate `{N}` for schema siblings. Only the `{3}cloonar` cn/RDN disagreement is fixed.
- … `{04}superbros` and `{06}szaku`. Today's behavior is preserved verbatim. If those omissions turn out to be copy-paste regressions rather than intentional opt-outs, a separate single-line change flips them.
- … `cn=config`. Clients connect by `olcSuffix`, not by `{N}`.

Further Notes
- … `mkTenant`-style helper introduced here.
- … ("I cannot add another `olcDatabase` if it exceeds 9, when I want to add 10 I get all sorts of problems") is fully resolved by this PRD at the structural level: post-merge, appending a single entry to the Tenant list adds `{10}`, `{11}`, etc., without slapd protest.

Long-term target is per-tenant LMDB envs, tracked in #5. ADR-0001 (to be added in this PR's implementation) commits to that target and links both issues.
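
For reference, a minimal sketch of how the padded-index helper described under Implementation Decisions might look, together with a reproduction of the sort-order problem. Everything here is an assumption for illustration: the `tenants` list shape follows the `{ suffix, authelia ? true }` records named above, but `mkTenant`, `autheliaDn`, the `dc=...,dc=example` suffixes, the objectClass list, and the ACL strings are placeholders, not the contents of `hosts/mail/modules/openldap.nix`. The `+ 2` offset assumes `{01}` and `{02}` stay hand-written.

```nix
# Sketch only: names, suffixes, and ACL strings are illustrative placeholders.
let
  lib = import <nixpkgs/lib>;

  # Attribute names come back sorted as strings, so with unpadded indices
  # the "{10}" key sorts ahead of "{2}" and "{9}".
  unpaddedOrder = builtins.attrNames {
    "olcDatabase={2}mdb" = { };
    "olcDatabase={9}mdb" = { };
    "olcDatabase={10}mdb" = { };
  };

  # Placeholder DN; the real authelia bind DN is not shown in this issue.
  autheliaDn = "cn=authelia,dc=example,dc=org";

  # Placeholder ACL builder: the `authelia` flag toggles the extra write clause.
  standardTenantAccess = { authelia }: [
    ("{0}to attrs=userPassword by self write "
      + lib.optionalString authelia "by dn.exact=\"${autheliaDn}\" write "
      + "by anonymous auth by * none")
    "{1}to * by * read"
  ];

  # One record per Tenant; appending a record is the only step needed
  # to add a database.
  tenants = [
    { suffix = "dc=macher,dc=example"; }
    { suffix = "dc=superbros,dc=example"; authelia = false; }
    { suffix = "dc=docfast,dc=example"; }
    { suffix = "dc=szaku,dc=example"; authelia = false; }
    { suffix = "dc=myhidden,dc=example"; }
    { suffix = "dc=korean-skin,dc=example"; }
    { suffix = "dc=scana11y,dc=example"; }
    { suffix = "dc=new-tenant,dc=example"; } # eighth Tenant, lands at {10}
  ];

  # imap1 passes a 1-based position; offset by 2 because {01} and {02}
  # are assumed to be taken by the hand-written databases.
  mkTenant = i: { suffix, authelia ? true }:
    let
      index = lib.fixedWidthNumber 2 (i + 2);
    in
    lib.nameValuePair "olcDatabase={${index}}mdb" {
      attrs = {
        objectClass = [ "olcDatabaseConfig" "olcMdbConfig" ];
        olcDatabase = "{${index}}mdb";
        olcDbDirectory = "/var/lib/openldap/data";
        olcSuffix = suffix;
        olcAccess = standardTenantAccess { inherit authelia; };
      };
    };

  tenantDatabases = lib.listToAttrs (lib.imap1 mkTenant tenants);
in
{
  inherit unpaddedOrder;
  # With padding, the string order of the keys matches the numeric order.
  paddedOrder = builtins.attrNames tenantDatabases;
}
```

Saving this to a scratch file and running `nix-instantiate --eval --strict` on it (assuming `<nixpkgs>` resolves on the machine) lists both key orders side by side. In the real module, the generated attribute set would presumably be merged into `services.openldap.settings.children` next to the hand-written `{01}` and `{02}` entries.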