Wrap waitForCompactionRetry() in try/catch to capture AbortError
that was escaping and bypassing the model fallback mechanism.
When a timeout fires, session.abort() causes both session.prompt()
and waitForCompactionRetry() to throw AbortError. Previously only
the prompt error was captured, allowing the compaction error to
escape to model-fallback.ts where it was immediately re-thrown
(line 199: isAbortError check), bypassing fallback model attempts.
Fixes#313
This commit fixes several issues with multi-account OAuth rotation that
were causing slow responses and inefficient account cycling.
## Changes
### 1. Fix usageStats race condition (auth-profiles.ts)
The `markAuthProfileUsed`, `markAuthProfileCooldown`, `markAuthProfileGood`,
and `clearAuthProfileCooldown` functions were using a stale in-memory store
passed as a parameter. Long-running sessions would overwrite usageStats
updates from concurrent sessions when saving.
**Fix:** Re-read the store from disk before each update to get fresh
usageStats from other sessions, then merge the update.
### 2. Capture AbortError from waitForCompactionRetry (pi-embedded-runner.ts)
When a request timed out, `session.abort()` was called which throws an
`AbortError`. The code structure was:
```javascript
try {
await session.prompt(params.prompt);
} catch (err) {
promptError = err; // Catches AbortError here
}
await waitForCompactionRetry(); // But THIS also throws AbortError!
```
The second `AbortError` from `waitForCompactionRetry()` escaped and
bypassed the rotation/fallback logic entirely.
**Fix:** Wrap `waitForCompactionRetry()` in its own try/catch to capture
the error as `promptError`, enabling proper timeout handling.
Root cause analysis and fix proposed by @erikpr1994 in #313.
Fixes#313
### 3. Fail fast on 429 rate limits (pi-ai patch)
The pi-ai library was retrying 429 errors up to 3 times with exponential
backoff before throwing. This meant a rate-limited account would waste
30+ seconds retrying before our rotation code could try the next account.
**Fix:** Patch google-gemini-cli.js to:
- Throw immediately on first 429 (no retries)
- Not catch and retry 429 errors in the network error handler
This allows the caller to rotate to the next account instantly on rate limit.
Note: We submitted this fix upstream (https://github.com/badlogic/pi-mono/pull/504)
but it was closed without merging. Keeping as a local patch for now.
## Testing
With 6 Antigravity accounts configured:
- Accounts rotate properly based on lastUsed (round-robin)
- 429s trigger immediate rotation to next account
- usageStats persist correctly across concurrent sessions
- Cooldown tracking works as expected
## Before/After
**Before:** Multiple 429 retries on same account, 30-90s delays
**After:** Instant rotation on 429, responses in seconds
Discord voice messages have empty `content` with the audio in attachments.
The nullish coalescing operator (`??`) doesn't fall through on empty strings,
so voice messages were being dropped as 'empty content'.
Changed to logical OR (`||`) so empty string falls through to media placeholder.
Adds a /stop command that:
- Can interrupt a running agent session mid-execution
- Works in both DMs and group chats (including forum topics)
- Uses grammy's bot.command() to run before the main message handler
- Returns status: stopped, stop requested, or nothing running
Also fixes session key lookup in pi-embedded-runner to use sessionKey
instead of sessionId, ensuring /stop finds the correct active run.
- Added `isError` property to `EmbeddedPiRunResult` and reply items to indicate error states.
- Updated error handling in `runReplyAgent` to provide more informative messages for specific socket connection errors.
- tabs.ts now uses getProfileContext like other routes
- browser-tool threads profile param through all actions
- add tests for profile query param on /tabs endpoints
- update docs with browser tool profile parameter
- Onboarding now writes auth profiles under ~/.clawdbot/agents/main/agent so the gateway sees credentials on first start.
- Hardened onboarding test to ignore legacy env vars.
Thanks @minghinmatthewlam!
* fix(slack): use named import for @slack/bolt App class
The default import `import bolt from '@slack/bolt'` followed by
`const { App } = bolt` doesn't work correctly in Bun due to ESM/CJS
interop issues. The default export comes through as a function rather
than the module object.
Switching to a named import `import { App } from '@slack/bolt'`
resolves the issue and allows the Slack provider to start successfully.
* fix(slack): align Bolt mock with named App export
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
Detect the Gemini API error 'function call turn comes immediately after
a user turn or after a function response turn' which indicates corrupted
session history.
When detected:
- Delete the corrupted transcript file
- Remove the session entry from the store
- Return a user-friendly message asking them to retry
This prevents the error loop where every subsequent message fails with
the same error until manual intervention.
Fixes#296
* fix(auth): prioritize round-robin over lastGood for multi-account rotation
When multiple OAuth accounts are configured, the round-robin rotation was
not working because lastGood was always prioritized, defeating the sort by
lastUsed.
Changes:
- Remove lastGood prioritization in resolveAuthProfileOrder
- Always apply orderProfilesByMode (sorts by lastUsed, oldest first)
- Only respect configuredOrder when explicitly set in config
- preferredProfile still takes priority for explicit user choice
Tested with 2 Google Antigravity accounts - verified alternating usage.
Follow-up to PR #269.
* style: fix formatting
- Add TelegramLocation, TelegramVenue, and TelegramMessageWithLocation types
- Add formatLocationMessage() to convert location/venue shares to text
- Add extractLocationData() for structured location access in ctxPayload
- Handle both raw location pins and venue shares (places with names)
- Include location in reply-to context for quoted messages
Location messages now appear as:
- [Location: lat, lon ±accuracy] for raw pins
- [Venue: Name - Address (lat, lon)] for places
ctxPayload includes LocationLat, LocationLon, LocationAccuracy,
VenueName, and VenueAddress fields for programmatic access.
Antigravity rate limits cause requests to hang indefinitely rather than
returning 429 errors. This change detects timeouts and treats them as
potential rate limits:
- Added timedOut flag to track timeout-triggered aborts
- Timeout now triggers profile cooldown + rotation
- Logs: "Profile X timed out (possible rate limit). Trying next account..."
This ensures automatic failover when Antigravity hangs due to rate limiting.
Adds usage tracking to auth profiles for automatic rotation:
- ProfileUsageStats type with lastUsed, cooldownUntil, errorCount
- markAuthProfileUsed(): tracks successful usage, resets errors
- markAuthProfileCooldown(): applies exponential backoff (1/5/25/60min)
- isProfileInCooldown(): checks if profile should be skipped
- orderProfilesByMode(): now sorts by lastUsed (oldest first)
On auth/rate-limit failures, profiles are marked for cooldown before
rotation. On success, usage is recorded for round-robin ordering.
This enables automatic load distribution across multiple accounts
(e.g., Antigravity 5-hour rate limit windows).
Changes writeOAuthCredentials and applyAuthProfileConfig calls to use
the email from OAuth response as part of the profile ID instead of
hardcoded ":default".
This enables multiple accounts per provider - each login creates a
separate profile (e.g., google-antigravity:user@gmail.com) instead
of overwriting the same :default profile.
Affected files:
- src/commands/onboard-auth.ts (generic writeOAuthCredentials)
- src/commands/configure.ts (Antigravity flow)
- src/wizard/onboarding.ts (Antigravity flow)
* fix: Gemini stops working after one message in a session
* fix: small issue in test file
* test: cover google role-merge behavior
---------
Co-authored-by: Peter Steinberger <steipete@gmail.com>
When replying to a message in a Slack thread, the response now stays
in the thread instead of going to the channel root.
Only threads replies when the incoming message was already in a thread
(has thread_ts). Top-level messages get top-level replies.
Fixes#250
what: default bash PATH to process.env.PATH
why: ensure Nix-provided tools on PATH inside sessions
tests: not run
Co-authored-by: Peter Steinberger <steipete@gmail.com>
* cron: skip delivery for HEARTBEAT_OK responses
When an isolated cron job has deliver:true, skip message delivery if the
response is just HEARTBEAT_OK (or contains HEARTBEAT_OK at edges with
short remaining content <= 30 chars). This allows cron jobs to silently
ack when nothing to report but still deliver actual content when there
is something meaningful to say.
Media is still delivered even if text is HEARTBEAT_OK, since the
presence of media indicates there's something to share.
* fix(heartbeat): make ack padding configurable
* chore(deps): update to latest
---------
Co-authored-by: Josh Lehman <josh@martian.engineering>
When an isolated cron job has deliver:true, skip message delivery if the
response is just HEARTBEAT_OK (or contains HEARTBEAT_OK at edges with
short remaining content <= 30 chars). This allows cron jobs to silently
ack when nothing to report but still deliver actual content when there
is something meaningful to say.
Media is still delivered even if text is HEARTBEAT_OK, since the
presence of media indicates there's something to share.
The SDK's tools parameter only accepts built-in tools (read, bash, edit, write).
Custom clawdbot tools (browser, canvas, nodes, cron, etc.) were being filtered
out, causing 'Tool not found' errors at runtime.
Split tools into built-in and custom, passing them via the correct parameters.