Migrate OpenLDAP tenants to per-tenant LMDB environments #5
Reference
Cloonar/nixos#5
Problem Statement
All nine olcDatabase entries on mail.cloonar.com currently set olcDbDirectory = "/var/lib/openldap/data". There is a single LMDB environment (one data.mdb + one lock.mdb) on disk, shared across the primary cloonar database and every per-Tenant database. The setup works only because slapd-mdb uses fixed sub-database names internally (id2entry, dn2id, etc.): every Tenant's entries coexist in the same store, and slapd's per-request suffix routing is what keeps a query against dc=superbros,dc=tv from returning entries belonging to dc=cloonar,dc=com.
Concrete consequences of the shared-env shape:
- borgbackup of the LMDB env is all-or-nothing.
- slapcat -b <suffix> does not return per-suffix data; it dumps the entire shared env regardless of which -b is passed. Tooling assuming standard slapd-mdb semantics misbehaves.
- [...] dc=foo,dc=bar contain on disk" cannot be answered without parsing the full env.
- [...] olcDatabase: dc=optiprot,dc=eu (8 entries) and dc=ghetto,dc=at (1 entry). Plus a typo entry dc=cloonar,dc=co (1 entry, .co not .com). These ride forward in every backup as cruft.
Solution
Migrate each Tenant onto its own LMDB environment by setting olcDbDirectory = "/var/lib/openldap/<tenant-slug>/", where <tenant-slug> is derived deterministically from the olcSuffix. This is a one-time offline migration on mail.cloonar.com: snapshot the existing env, slapcat it to a single LDIF, split the LDIF into per-Tenant LDIFs by DN suffix, switch the Nix config, then slapadd each filtered LDIF into its Tenant's new directory.
This is the target shape committed to in ADR-0001.
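The split step can be sketched in Python as follows. This is a minimal illustration, not the script kept in the repo: the function names, the ORPHAN bucket key, and the naive DN normalisation (lowercasing and trimming spaces around commas, with no handling of escaped commas) are all assumptions. It unfolds LDIF continuation lines first, then assigns each entry to the longest configured suffix its DN falls under. Entries that match no suffix land in a separate bucket so they can be inspected or dropped.

```python
import base64
from collections import defaultdict

ORPHAN = "__orphans__"  # hypothetical bucket key for entries under no configured suffix


def unfold(ldif_text):
    """Join LDIF continuation lines: a line starting with one space continues the previous line."""
    lines = []
    for raw in ldif_text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def entry_dn(record):
    """Return the DN of a record, decoding the base64 form ("dn::") if present."""
    for line in record:
        if line.startswith("dn::"):
            return base64.b64decode(line[4:].strip()).decode("utf-8")
        if line.startswith("dn:"):
            return line[3:].strip()
    return None


def entries(ldif_text):
    """Yield (dn, record_lines) for each blank-line-separated record, skipping comments."""
    record = []
    for line in unfold(ldif_text) + [""]:
        if line == "":
            if record:
                yield entry_dn(record), record
                record = []
        elif not line.startswith("#"):
            record.append(line)


def normalize(dn):
    """Naive DN normalisation (assumption): lowercase, no spaces around commas."""
    return ",".join(part.strip() for part in dn.split(",")).lower()


def split_by_suffix(ldif_text, suffixes):
    """Return {normalized suffix or ORPHAN: LDIF text} for the given dump."""
    wanted = sorted((normalize(s) for s in suffixes), key=len, reverse=True)
    buckets = defaultdict(list)
    for dn, record in entries(ldif_text):
        if dn is None:  # e.g. a leading "version: 1" record
            continue
        ndn = normalize(dn)
        owner = next(
            (s for s in wanted if ndn == s or ndn.endswith("," + s)), ORPHAN
        )
        buckets[owner].append("\n".join(record))
    return {key: "\n\n".join(recs) + "\n" for key, recs in buckets.items()}
```

Matching on "equal to the suffix, or ends with a comma plus the suffix" is what keeps dc=cloonar,dc=co out of the dc=cloonar,dc=com bucket. A plain string-suffix test would not make that distinction reliably.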
The "copy data.mdb to every per-Tenant directory then clean up later" alternative was considered and rejected. Reasoning: cleanup via the LDAP API is impossible, because slapd's suffix routing sends a delete request for cn=foo,dc=cloonar,dc=com to whichever database claims dc=cloonar,dc=com, never to a Tenant env where that DN exists only as a ghost. Offline cleanup requires the same slapcat / filter / slapadd cycle as the upfront-filter approach, and in the meantime every one of the nine envs carries a copy of the full dataset. Filter-then-slapadd takes the same elapsed effort and leaves a cleaner steady state.
User Stories
- [...] namingContext layout (one env per suffix), so that the system is debuggable using stock OpenLDAP knowledge instead of an undocumented quirk.
- [...] slapcat -b <suffix> to actually return only that suffix's entries, so that per-Tenant dumps for audit, debugging, or selective restore behave as documented.
- [...] /var/lib/openldap/data/ recorded before migration, so that I have a documented rollback target.
- [...] dc=optiprot,dc=eu, dc=ghetto,dc=at) and the dc=cloonar,dc=co typo entry, so that the post-migration state is also a cleanup.
- [...] mkTenant helper (already introduced by the index-padding PRD) so that newly added Tenants acquire their own env automatically.
Implementation Decisions
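The slug rule decided in this section (the slug is the first DC component of the suffix, and two Tenants sharing a first DC component are rejected) can be illustrated with a short sketch. The real rule lives in the Nix mkTenant helper. The Python below only mirrors the stated behaviour, and tenant_slug, db_directories, and the root parameter are hypothetical names chosen for the example.

```python
def tenant_slug(suffix):
    """Return the value of the first RDN of the suffix, which must be a dc= component."""
    first_rdn = suffix.split(",")[0].strip()
    attr, _, value = first_rdn.partition("=")
    if attr.strip().lower() != "dc" or not value.strip():
        raise ValueError(f"suffix does not start with a dc= component: {suffix!r}")
    return value.strip()


def db_directories(suffixes, root="/var/lib/openldap"):
    """Map each suffix to its own directory, rejecting slug collisions up front."""
    owners = {}  # slug -> suffix that claimed it first
    dirs = {}
    for suffix in suffixes:
        slug = tenant_slug(suffix)
        if slug in owners:
            raise ValueError(
                f"slug collision: {owners[slug]!r} and {suffix!r} both map to {slug!r}"
            )
        owners[slug] = suffix
        dirs[suffix] = f"{root}/{slug}/"
    return dirs
```

Under this rule dc=superbros,dc=tv maps to /var/lib/openldap/superbros/. A suffix such as dc=cloonar,dc=co could not be configured alongside dc=cloonar,dc=com, since both yield the slug cloonar.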
- [...] mail.cloonar.com inside a maintenance window. Online migration without downtime is explicitly not pursued.
- [...] openldap.service.
- [...] /var/lib/openldap/data/ to a timestamped tarball, both locally and pushed to the off-host borgbackup repo. The tarball is the documented rollback target.
- slapcat -F /etc/openldap/slapd.d -b dc=cloonar,dc=com -l /tmp/openldap-all.ldif. The -b argument is incidental: every configured database in the current setup shares the env, so any -b dumps the same thing.
- [...] /tmp/openldap-all.ldif into one file per Tenant by DN suffix using a small script kept in the repo (likely Python; it needs to handle LDIF line continuations correctly).
- [...] dc=optiprot,dc=eu, dc=ghetto,dc=at, dc=cloonar,dc=co). Decision on whether to sweep them is per-execution.
- [...] hosts/mail/modules/openldap.nix. The helper now emits per-Tenant olcDbDirectory. The Nix activation creates /etc/openldap/slapd.d/ afresh but does not populate the data directories.
- [...] /var/lib/openldap/<slug>/, set ownership to openldap:openldap, and run slapadd -F /etc/openldap/slapd.d -b <suffix> -l <slug>.ldif.
- [...] openldap.service.
- ldapsearch -Y EXTERNAL -H ldapi:/// -b <suffix> -s sub "(objectClass=*)" 1.1 | grep -c "^dn:". Counts must match the pre-migration awk-derived counts captured in the runbook.
- ldapsearch -Y EXTERNAL -H ldapi:/// -b <other-suffix> against a Tenant still routes correctly (returns entries under that suffix only). This already worked pre-migration; the post-check confirms parity.
- [...] /var/lib/openldap/ (now containing multiple subdirectories) is fully included.
- [...] mkTenant helper from the index-padding PRD gains a derived olcDbDirectory field. The slug is the first DC component of the suffix (e.g., dc=superbros,dc=tv → superbros). Collisions between Tenants sharing a first DC component are statically rejected by the helper (assertion at module level). If a real collision arises later, the slug rule is reconsidered.
- [...] /var/lib/openldap/cloonar/), not the legacy data/ path. The legacy data/ is removed entirely after migration.
- [...] scripts/ so it can be reused if a similar migration is ever needed again (or in a staging dry-run).
- [...] data.mdb is taken to a non-production environment (local VM or staging), the script is executed end-to-end there, and counts are verified before the production maintenance window opens.
- [...] /var/lib/openldap/, git revert the Nix change, redeploy. Time-bounded: if the post-migration verification at step 10 finds count mismatches, immediate rollback.
Testing Decisions
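The count comparison that gates the migration can be sketched as below. It is an illustration only: count_dns mirrors the grep -c "^dn:" step, post_migration_count wraps the ldapsearch command quoted in this PRD, and compare_counts reports every suffix whose pre- and post-migration counts differ. The function names and the shape of the expected-counts mapping are assumptions, not part of the runbook.

```python
import subprocess


def count_dns(ldapsearch_output):
    """Count lines starting with "dn:", the same thing grep -c "^dn:" reports."""
    return sum(1 for line in ldapsearch_output.splitlines() if line.startswith("dn:"))


def post_migration_count(suffix):
    """Run the ldapsearch command from the PRD against one suffix and count its entries."""
    result = subprocess.run(
        ["ldapsearch", "-Y", "EXTERNAL", "-H", "ldapi:///",
         "-b", suffix, "-s", "sub", "(objectClass=*)", "1.1"],
        capture_output=True, text=True, check=True,
    )
    return count_dns(result.stdout)


def compare_counts(expected, actual):
    """Return {suffix: (expected, actual)} for every suffix whose counts differ."""
    suffixes = expected.keys() | actual.keys()
    return {
        s: (expected.get(s), actual.get(s))
        for s in sorted(suffixes)
        if expected.get(s) != actual.get(s)
    }
```

An empty result from compare_counts means parity. Any non-empty result is the condition the rollback decision refers to.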
- ldapsearch count under -b <suffix> -s sub must equal the pre-migration count for that suffix.
- [...] ldapsearch -Y EXTERNAL -H ldapi:/// commands used during the planning session, which produced the per-suffix entry distribution that informed this PRD.
Out of Scope
- {N} index padding fix: separate, prerequisite PRD.
- [...] back-mdb. No change planned.
- [...] /var/lib/openldap/cloonar/.
Further Notes
- [...] olcDbDirectory change.
Depends on #4 landing first: the mkTenant helper introduced there is the integration point for the per-tenant olcDbDirectory change here.