Refactor OpenLDAP tenant config to fix recurring {10} index bug #4

Closed
opened 2026-05-20 16:01:18 +02:00 by dominik.polakovics · 1 comment

Problem Statement

Adding the 10th LDAP Tenant to the mail host's OpenLDAP configuration is impossible without breakage. Every time the fleet needs another olcDatabase, the index value crosses the single-digit threshold and slapd refuses to load the generated cn=config tree.

The cause: the NixOS services.openldap module emits cn=config LDIF entries by walking settings.children with lib.mapAttrsToList, and Nix attribute sets are sorted lexicographically. The keys are strings like "olcDatabase={1}mdb" … "olcDatabase={9}mdb". Adding "olcDatabase={10}mdb" produces this evaluation order:

olcDatabase={10}mdb
olcDatabase={1}mdb
olcDatabase={2}monitor
olcDatabase={3}mdb
…

{10} sorts before {2} as a string. slapadd then sees database index 10 arrive before indices 2–9 exist, and either errors out, silently renumbers, or scrambles which suffix maps to which configured database. The recurring symptom is "I cannot add another olcDatabase if it exceeds 9 — when I want to add 10 I get all sorts of problems."
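The ordering can be reproduced in isolation. The following is a minimal sketch, assuming only that builtins.attrNames returns names in the same sorted order that lib.mapAttrsToList walks; the empty attribute sets are placeholders, not the real database entries, and the file name is hypothetical.

```nix
# Evaluate with: nix-instantiate --eval --strict sort-demo.nix
# (sort-demo.nix is a hypothetical file name.)
builtins.attrNames {
  "olcDatabase={1}mdb" = { };
  "olcDatabase={2}monitor" = { };
  "olcDatabase={3}mdb" = { };
  "olcDatabase={10}mdb" = { };
}
# => [ "olcDatabase={10}mdb" "olcDatabase={1}mdb" "olcDatabase={2}monitor" "olcDatabase={3}mdb" ]
```

The character following "{1" decides the order: "0" compares below "}", so the {10} key lands first.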

Independently, the existing per-Tenant blocks have already silently drifted: {4}superbros and {6}szaku lack the cn=authelia,... write ACL that every other Tenant has, and the cn={3}cloonar,cn=schema entry has a DN/cn-attribute disagreement ({3} vs {1}). Both are symptoms of hand-maintained per-Tenant boilerplate.

Solution

Refactor the per-Tenant olcDatabase blocks in hosts/mail/modules/openldap.nix into a single list of { suffix, authelia ? true } records consumed by an inline helper that assigns sequential 2-digit zero-padded indices ({01} … {99}). Padded indices make lexicographic sort match numerical sort permanently, so the next Tenant slots in at {10} and the one after at {11} without recurrence of the bug.
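As a quick check of the padding claim, here is a small sketch that builds padded keys and lists them in attribute-set order. It assumes <nixpkgs/lib> resolves on NIX_PATH; the key function is a hypothetical helper for illustration, and every key uses the mdb backend for brevity even though {02} is the monitor database in the real config.

```nix
let
  lib = import <nixpkgs/lib>;
  # Hypothetical helper: pad the index to two digits inside the braces.
  key = n: "olcDatabase={${lib.fixedWidthNumber 2 n}}mdb";
in
builtins.attrNames (lib.genAttrs (map key [ 1 2 9 10 11 ]) (_: { }))
# => [ "olcDatabase={01}mdb" "olcDatabase={02}mdb" "olcDatabase={09}mdb" "olcDatabase={10}mdb" "olcDatabase={11}mdb" ]
```

With equal-width digit strings, string comparison and numeric comparison agree, which is why {10} now follows {09}.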

The primary cloonar database ({01}mdb after padding) and the {02}monitor database stay hand-written — their shape (rootDN, overlays, loginShell ACL, monitor objectClass) differs structurally from regular Tenants and forcing them through the helper would introduce a dozen opt-in flags with one caller each.

While in the file, fix the cn={3}cloonar,cn=schema DN/attribute inconsistency by setting the inner cn attribute to "{3}cloonar". Do not pad schema entries — empirically slapd tolerates duplicate {N} for schema siblings, and the current shape works fine.
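The schema fix amounts to a one-attribute change. The fragment below is a sketch of the corrected shape only: where the entry sits inside services.openldap.settings is not shown in this PRD, and the entry's remaining attributes are deliberately left out rather than guessed.

```nix
{
  "cn={3}cloonar,cn=schema".attrs = {
    # Inner cn now agrees with the index in the DN (was "{1}cloonar").
    cn = "{3}cloonar";
    objectClass = "olcSchemaConfig";
    # Remaining schema attributes stay exactly as they are today.
  };
}
```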

This PRD covers the structural fix only. Migrating each Tenant to its own LMDB environment (so olcDbDirectory is per-Tenant instead of shared) is the long-term target captured in ADR-0001 and tracked in a separate PRD.

User Stories

  1. As a fleet operator, I want to add the 10th LDAP Tenant by appending one line to a list, so that the configuration doesn't fight me when the directory grows past 9 entries.
  2. As a fleet operator, I want to add the 50th or 99th Tenant with the same one-line operation, so that I don't trip on this lexicographic-sort issue ever again within the foreseeable scale of the deployment.
  3. As a fleet operator, I want every Tenant's ACL set generated from one source of truth, so that ACL drift doesn't silently accumulate between similar Tenants.
  4. As a fleet operator, I want the existing Tenants that opt out of Authelia write access ({4}superbros, {6}szaku) to keep that opt-out after the refactor, so that I do not grant new bind access to a DN that was deliberately excluded.
  5. As a future maintainer, I want a typo-resistant API for declaring a Tenant so that adding a database is harder to get wrong than copy-pasting fifty lines.
  6. As a future maintainer, I want the cn={3}cloonar DN/attribute inconsistency fixed, so that the next person to touch the schema block does not assume the divergence is intentional.
  7. As a release operator, I want to verify the generated config.ldif before deploy, so that I have empirical confidence the change is purely the intended renames.
  8. As a release operator, I want the change to be safe under the existing mutableConfig = false activation model, so that no on-host data migration is needed for this PR.

Implementation Decisions

  • Change is scoped to hosts/mail/modules/openldap.nix. No cross-file refs to olcDatabase={N} indices exist; the rename is local.
  • An inline helper in that file builds Tenant olcDatabase entries from a list of { suffix, authelia ? true } records, assigning sequential 2-digit-padded indices via lib.imap1 + lib.fixedWidthNumber 2.
  • The list of Tenants is ordered to reproduce today's index assignment ({03}macher, {04}superbros, {05}docfast, {06}szaku, {07}myhidden, {08}korean-skin, {09}scana11y). Adding a new Tenant is a single append; it becomes {10} automatically.
  • Existing {1}cloonar and {2}monitor are renamed to {01} and {02} and stay hand-written. Their structural difference (rootDN, sops-backed rootpw, memberof + ppolicy overlays, loginShell ACL, olcMonitorConfig objectClass, netdata-specific monitor ACL) does not fit the Tenant shape.
  • The standard three-rule Tenant ACL pattern (userPassword / pgpPublicKey / catch-all) is factored into a function standardTenantAccess { authelia }; the authelia flag controls whether the cn=authelia,... write clause appears in the userPassword rule.
  • {04}superbros and {06}szaku are wired with authelia = false to preserve current behavior bit-for-bit. Whether to re-enable Authelia on them is explicitly out of scope.
  • The cn={3}cloonar,cn=schema inner cn attribute is changed from "{1}cloonar" to "{3}cloonar". No other schema changes.
  • Padding width is 2 digits, covering database indices up to {99}; helper-generated Tenants start at {03}. A future widening to 3 digits is mechanically the same operation (rename keys + inner olcDatabase values).
  • The memberof and ppolicy overlay DNs (olcOverlay=...,olcDatabase={1}mdb) are updated to reference {01}mdb. The helper does not generate overlays.
  • olcDbDirectory stays at /var/lib/openldap/data for every entry (today's shared LMDB env). Migrating to per-Tenant directories is the deferred target tracked in ADR-0001 and the per-Tenant-dirs PRD.
  • Migration safety: services.openldap.mutableConfig = false (NixOS default) wipes /etc/openldap/slapd.d/ on every activation and reruns slapadd against the generated config.ldif. The LMDB data at /var/lib/openldap/data/ is not touched — entries are keyed by olcSuffix, not by {N}. Renaming indices is therefore safe.
  • A new ADR (docs/adr/0001-shared-lmdb-env-for-tenants.md) is created in the same PR. It documents the current shared-LMDB shape, the target shape Y (per-Tenant LMDB envs), and the explicit deferral. It links to the per-Tenant-dirs PRD.
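
The decisions above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in hosts/mail/modules/openldap.nix: the names mkTenants and firstTenantIndex are hypothetical, the suffixes are invented placeholders, and the ACL strings are stand-ins for the real three-rule pattern (the cn=authelia,... DN is left elided as in the text). It also assumes <nixpkgs/lib> resolves on NIX_PATH.

```nix
# Sketch only. ACL strings and suffixes are placeholders,
# not the real values from hosts/mail/modules/openldap.nix.
let
  lib = import <nixpkgs/lib>;

  # Three-rule Tenant ACL pattern; `authelia` toggles the write clause
  # in the userPassword rule. Rule bodies here are illustrative.
  standardTenantAccess = { authelia }: [
    ("{0}to attrs=userPassword"
      + lib.optionalString authelia " by dn=\"cn=authelia,...\" write"
      + " by self write by anonymous auth by * none")
    "{1}to attrs=pgpPublicKey by self write by * read"
    "{2}to * by * read"
  ];

  # The hand-written {01}mdb and {02}monitor occupy the first two slots,
  # so generated Tenants start at {03}.
  firstTenantIndex = 3;

  # Turn an ordered list of Tenant records into settings.children entries.
  mkTenants = tenants:
    lib.listToAttrs (lib.imap1
      (i: { suffix, authelia ? true }:
        let
          idx = lib.fixedWidthNumber 2 (i + firstTenantIndex - 1);
        in
        lib.nameValuePair "olcDatabase={${idx}}mdb" {
          attrs = {
            objectClass = [ "olcDatabaseConfig" "olcMdbConfig" ];
            olcDatabase = "{${idx}}mdb";
            olcDbDirectory = "/var/lib/openldap/data"; # shared LMDB env
            olcSuffix = suffix;
            olcAccess = standardTenantAccess { inherit authelia; };
          };
        })
      tenants);
in
mkTenants [
  { suffix = "dc=tenant-a,dc=example"; }
  { suffix = "dc=tenant-b,dc=example"; authelia = false; }
  { suffix = "dc=tenant-c,dc=example"; }
]
```

Evaluating this yields the keys olcDatabase={03}mdb through olcDatabase={05}mdb. Because the index is derived from list position, the order of the list is the only thing that pins a suffix to an index, so new Tenants should only ever be appended. With the real seven-entry list, the eighth record becomes {10}.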

Testing Decisions

  • A good test for a Nix module refactor verifies the generated artifact — the LDIF that slapadd consumes — rather than the Nix expression itself. The Nix evaluation succeeding is necessary but not sufficient; the LDIF must be the right LDIF.
  • Verification gate before deploy:
    1. ./scripts/test-configuration mail — confirms the host evaluates and the system derivation builds cleanly.
    2. Build the system derivation, locate the generated config.ldif, diff it against the LDIF produced by the pre-refactor config. Expected diff: padding renames ({1} → {01}, etc.) across keys and inner olcDatabase attribute values, the cn={3}cloonar inner cn attribute change, and nothing else. Any other diff line is either ACL drift introduced by the helper or a helper logic bug.
  • The diff is pasted into the PR description as evidence.
  • No automated test suite for this file exists in the repo and none is added — it is a leaf host module. The repo's convention for one-shot configuration changes is test-configuration + manual review; this PRD strengthens the manual-review step to "review the LDIF diff" rather than "review the Nix diff".
  • Prior art: none in the repo for OpenLDAP specifically.

Out of Scope

  • Per-Tenant olcDbDirectory migration (target shape Y) — separate PRD, captured in ADR-0001.
  • Cleanup of orphaned suffixes (dc=optiprot,dc=eu, dc=ghetto,dc=at) and the dc=cloonar,dc=co typo entry that exist in today's shared LMDB but are unreachable via LDAP — folded into the per-Tenant-dirs PRD or handled as a follow-up.
  • Schema padding for all cn={N}foo,cn=schema entries — empirically unnecessary since slapd tolerates duplicate {N} for schema siblings. Only the {3}cloonar cn/RDN disagreement is fixed.
  • Adding a new Tenant — none queued at the time of this PRD; the structural fix makes the next addition a one-line commit.
  • Re-enabling Authelia on {04}superbros and {06}szaku. Today's behavior is preserved verbatim. If those omissions turn out to be copy-paste regressions rather than intentional opt-outs, a separate single-line change flips them.
  • Touching consumers (postfix, dovecot, authelia, owncloud) — the rename is internal to slapd's cn=config. Clients connect by olcSuffix, not by {N}.

Further Notes

  • ADR-0001 is created in the same PR. CONTEXT.md was already updated with the Tenant term and the known anomalies (shared LMDB env, orphan suffixes, typo entry) during planning.
  • This PRD is the prerequisite for the per-Tenant-dirs migration PRD. That PRD's helper extension assumes the mkTenant-style helper introduced here.
  • The recurring frustration the operator described ("I cannot add another olcDatabase if it exceeds 9 — when I want to add 10 I get all sorts of problems") is fully resolved by this PRD at the structural level: post-merge, appending a single entry to the Tenant list adds {10}, {11}, etc., without slapd protest.
Author
Owner

Long-term target is per-tenant LMDB envs, tracked in #5. ADR-0001 (to be added in this PR's implementation) commits to that target and links both issues.

Reference
Cloonar/nixos#4