The releaseAllSessionWriteLocks() call during shutdown is best-effort.
If it fails (crash, timeout, etc.), stale lock files survive with the
same PID. Since isAlive(pid) returns true for the still-running process,
the lock appears valid for up to staleMs (30 min).

Add a per-instance nonce to lock file payloads. The nonce is rotated
via resetInstanceNonce() when releaseAllSessionWriteLocks() is called
during shutdown, so the next server iteration within the same process
gets a fresh nonce. When acquiring a lock, if the on-disk nonce differs
from the current instance nonce and the PID matches, the lock is treated
as stale and immediately reclaimed.

The nonce is mutable (not const) because ESM modules are cached for the
process lifetime and are NOT re-evaluated on in-process SIGUSR1 restart.
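
A minimal sketch of the rotation described above, assuming the nonce is a random UUID held in module-level state. `currentInstanceNonce()` and `shutdownSketch()` are hypothetical names used only for illustration; the real lock-release logic is not reproduced here.

```typescript
import { randomUUID } from "node:crypto";

// Module-level state: evaluated once, when the module is first imported.
// An in-process restart does not re-run this line, hence `let` and an
// explicit reset function rather than a `const`.
let instanceNonce: string = randomUUID();

// Hypothetical accessor, for illustration only.
function currentInstanceNonce(): string {
  return instanceNonce;
}

function resetInstanceNonce(): void {
  instanceNonce = randomUUID();
}

// Stand-in for the shutdown path: the actual release work is omitted,
// only the nonce rotation at the end is shown.
function shutdownSketch(): void {
  // ... best-effort removal of held lock files would happen here ...
  resetInstanceNonce();
}
```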

Backward compatible: lock files without a nonce (from older versions)
fall through to the existing pid + staleMs checks unchanged.
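
The acquisition-time decision can be sketched roughly as follows. The payload shape (`createdAt` in particular), the `StaleCheckContext` type, and the exact form of the fallback checks are assumptions made for illustration; only the pid, nonce, isAlive, and staleMs concepts come from the description above.

```typescript
// Assumed payload shape; field names other than pid and nonce are illustrative.
interface LockPayload {
  pid: number;
  createdAt: number; // epoch milliseconds (assumed)
  nonce?: string; // absent in lock files written by older versions
}

// Hypothetical bundle of everything the check needs, passed in explicitly
// so the function stays pure and easy to test.
interface StaleCheckContext {
  selfPid: number;
  instanceNonce: string;
  now: number;
  staleMs: number;
  isAlive: (pid: number) => boolean;
}

function isLockStale(lock: LockPayload, ctx: StaleCheckContext): boolean {
  // Same PID but a different nonce: the lock was left behind by an earlier
  // server iteration inside this process, so reclaim it immediately.
  if (
    lock.nonce !== undefined &&
    lock.pid === ctx.selfPid &&
    lock.nonce !== ctx.instanceNonce
  ) {
    return true;
  }
  // Everything else, including nonce-less locks, falls through to the
  // pid + staleMs checks.
  if (!ctx.isAlive(lock.pid)) {
    return true;
  }
  return ctx.now - lock.createdAt > ctx.staleMs;
}
```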

Tests:
- Reclaims lock with same PID but different nonce
- Preserves lock with matching nonce (current instance)
- Falls back to pid+staleMs for nonce-less locks (backward compat)
- Verifies nonce rotation on releaseAllSessionWriteLocks

When the gateway restarts via SIGUSR1, server.close() kills running
agent turns (chatRunState.clear()) but never releases their session
write locks. The on-disk .lock files persist with PID 1, and
isAlive(1) returns true because it is the same process. This blocks
session access for up to staleMs (30 min) until the lock expires.

- Export releaseAllSessionWriteLocks() from session-write-lock.ts
- Call it in server-close.ts after chatRunState.clear()
- Add test for releaseAllSessionWriteLocks()
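
For illustration, a best-effort release of every held lock might look like the sketch below. It assumes the process tracks its lock file paths in an in-memory set; `heldLockPaths` and `releaseAllHeldLocks()` are hypothetical names, and the actual releaseAllSessionWriteLocks() in session-write-lock.ts may be structured differently.

```typescript
import { rm } from "node:fs/promises";

// Hypothetical registry of lock file paths currently held by this process.
const heldLockPaths = new Set<string>();

// Removes every tracked lock file and returns how many removals failed.
// The registry is cleared up front so a partial failure cannot leave the
// in-memory state claiming locks that the shutdown has already given up.
async function releaseAllHeldLocks(): Promise<number> {
  const paths = Array.from(heldLockPaths);
  heldLockPaths.clear();
  // allSettled: one failed removal must not prevent the others.
  // force: true ignores files that are already gone.
  const results = await Promise.allSettled(
    paths.map((path) => rm(path, { force: true })),
  );
  return results.filter((result) => result.status === "rejected").length;
}
```

Under these assumptions, the close path would await releaseAllHeldLocks() right after chatRunState.clear(). Any lock file that still survives a failed removal is then left to the stale-lock checks on the next acquire.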