- Add active agent run tracking functions (getActiveAgentRunCount, getActiveAgentRunIds)
- Add onAgentRunComplete callback to notify when runs complete
- Modify config-reload to defer restarts when active runs exist
- Queue restart and apply when all runs complete
- Add guard in AppState.swift to prevent config writes in remote mode
- Set gateway.reload.mode to "off" as immediate fix
This prevents the bot from interrupting in-flight messages during
config changes, fixing the "bot typing but never responds" issue.
Root cause: the macOS app was writing to the gateway config even in remote mode,
triggering file watcher → reload handler → SIGUSR1 → shutdown while messages were being processed.
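A minimal TypeScript sketch of the run tracking and deferred restart listed above. It assumes an in-process registry. Only `getActiveAgentRunCount`, `getActiveAgentRunIds` and `onAgentRunComplete` come from this commit. `markAgentRunStarted`, `markAgentRunFinished` and `requestRestart` are hypothetical names used for illustration.

```typescript
type RunCompleteListener = (runId: string) => void;

const activeRuns = new Set<string>();
const completeListeners = new Set<RunCompleteListener>();
let pendingRestart: (() => void) | null = null;

// Hypothetical hook: called when an agent run begins.
function markAgentRunStarted(runId: string): void {
  activeRuns.add(runId);
}

// Hypothetical hook: called when a run ends; notifies completion listeners.
function markAgentRunFinished(runId: string): void {
  if (!activeRuns.delete(runId)) return; // ignore unknown or duplicate ids
  for (const listener of completeListeners) listener(runId);
}

function getActiveAgentRunCount(): number {
  return activeRuns.size;
}

function getActiveAgentRunIds(): string[] {
  return [...activeRuns];
}

/** Subscribe to run completion; returns an unsubscribe function. */
function onAgentRunComplete(listener: RunCompleteListener): () => void {
  completeListeners.add(listener);
  return () => completeListeners.delete(listener);
}

/**
 * Hypothetical entry point for the config-reload handler: restart now when idle,
 * otherwise queue one restart (a newer request replaces an older queued one).
 * Returns true if the restart ran immediately.
 */
function requestRestart(restart: () => void): boolean {
  if (activeRuns.size === 0) {
    restart();
    return true;
  }
  pendingRestart = restart;
  return false;
}

// Apply the queued restart once the last in-flight run completes.
onAgentRunComplete(() => {
  if (activeRuns.size === 0 && pendingRestart) {
    const restart = pendingRestart;
    pendingRestart = null;
    restart();
  }
});
```

Queuing a single restart means that repeated config writes during a long run collapse into one restart instead of many.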
**Changes:**
- Added `.claude/` directory to .gitignore (Claude Code local settings)
- Removed `.claude/settings.json` from git tracking (local/environment-specific)
**Rationale:**
- `.claude/settings.json` contains user-specific Claude Code permissions and settings
- Should never be committed to git (similar to .vscode/, .idea/)
- Each developer should manage their own local settings
- Prevents merge conflicts from local configuration differences
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
**Added comprehensive section showing:**
- User perspective: what you see when the bot doesn't respond (typing indicator appears, then nothing)
- Technical reality: Exact timeline from logs showing message processing interrupted mid-tool-execution
- Visual timeline with precise timestamps (21:20:35 → 21:21:02)
- Actual log evidence from moltbot and PM2 logs
- Root problem chain: config rewrite → file watcher → reload handler → SIGUSR1 → shutdown
- The blocker: the config file keeps being restored automatically by an unknown mechanism, creating a reload cycle
- Verification table ruling out other causes
**Timeline captured (offsets in seconds):**
- T+0: Message received (21:20:35)
- T+1: Agent processing starts (21:20:36)
- T+14: First tool completes (21:20:49)
- T+27: Second tool starts (21:21:02)
- T+27.007: SIGUSR1 signal received - gateway self-terminates
- T+31: Gateway restarts with new PID (21:21:06)
**Key finding:** This is NOT a crash; it is a controlled, graceful shutdown triggered by SIGUSR1 during message processing, caused by config file rewrites.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Documented the plugin command registration overflow issue that caused
the Telegram bot to crash at startup. The fix disables plugin.entries.telegram
in the moltbot.json config to prevent extension commands from being
registered on Telegram (which has a 100-command API limit).
Issue occurred when too many installed extensions (Discord, Matrix, Mattermost, etc.)
tried to register their commands for Telegram, exceeding the limit.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
The bot was crashing during Telegram initialization because installed
extensions (Discord, Matrix, Mattermost, etc.) were registering their
plugin commands for Telegram, exceeding the 100-command API limit.
Disabled plugin.entries.telegram.enabled to prevent non-Telegram
extensions from registering commands on Telegram.
This fix allows the Telegram bot to initialize cleanly without
crashing when messages are received.
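The fix above is a config change. As a code-level complement, a registration step could filter commands by source and cap them before calling Telegram's `setMyCommands`, which accepts at most 100 commands. The sketch below assumes a hypothetical `BotCommandSpec` shape with a `source` label. It is not moltbot's actual registration code.

```typescript
// Hypothetical shape of a command contributed by core or a plugin.
interface BotCommandSpec {
  command: string;
  description: string;
  source: string; // e.g. "core", "telegram", "discord" (assumed labels)
}

// Telegram's setMyCommands accepts at most 100 commands.
const TELEGRAM_MAX_COMMANDS = 100;
// Bot API rule for command names: 1-32 chars, lowercase letters, digits, underscores.
const COMMAND_NAME = /^[a-z0-9_]{1,32}$/;

function selectTelegramCommands(
  all: BotCommandSpec[],
  allowedSources: ReadonlySet<string>,
): { commands: BotCommandSpec[]; dropped: number } {
  const seen = new Set<string>();
  const eligible = all.filter((c) => {
    if (!allowedSources.has(c.source) || !COMMAND_NAME.test(c.command)) return false;
    if (seen.has(c.command)) return false; // first registration of a name wins
    seen.add(c.command);
    return true;
  });
  const commands = eligible.slice(0, TELEGRAM_MAX_COMMANDS);
  // dropped counts everything not sent: wrong source, invalid name, duplicate, or over the cap.
  return { commands, dropped: all.length - commands.length };
}
```

Logging `dropped` would make an overflow visible at startup instead of surfacing as a Telegram API error.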
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Added new section "Gateway Stability Infrastructure" covering:
- Multi-layer stability design (system, PM2, startup hooks, health monitoring)
- All monitoring commands with examples
- Recovery scenarios and automated responses
- What problems this prevents
This comprehensive infrastructure ensures:
- No more crashes from Telegram message processing
- Automatic detection and recovery from hangs
- Prevention of inotify exhaustion hangs
- Memory limit protection
- Clean lock file management
- Full visibility into gateway health
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Created scripts/gateway-start.sh: Startup wrapper that cleans stale lock files
before starting the gateway (prevents "already running" errors)
- Created scripts/pm2-health-monitor.js: Standalone health check process managed by PM2
* Monitors port 18789 connectivity every 5 minutes
* Detects unresponsive gateway (process running but port hung)
* Force-restarts via killall + PM2 auto-recovery
* Monitors inotify watcher usage (warns at 80% of limit)
* Logs to /tmp/moltbot/pm2-health-monitor.log
- Updated ecosystem.config.cjs to:
* Use gateway-start.sh wrapper for lock cleanup
* Add moltbot-health-monitor as separate PM2 app
* Health monitor runs alongside gateway (same PM2 daemon, isolated from other daemons)
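The inotify check in the health monitor could look roughly like the sketch below. It is written in TypeScript rather than the monitor's JavaScript, and it is Linux-only. It reads the limit from `/proc/sys/fs/inotify/max_user_watches` and counts `inotify wd:` lines under `/proc/<pid>/fdinfo`. That count sums over every readable process, so it only approximates the per-user figure. All function names here are hypothetical.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Warn threshold from the commit message: 80% of the limit.
const INOTIFY_WARN_RATIO = 0.8;

interface InotifyStatus {
  used: number;
  limit: number;
  ratio: number;
  warn: boolean;
}

function classifyInotifyUsage(used: number, limit: number): InotifyStatus {
  if (!Number.isFinite(limit) || limit <= 0) throw new Error(`invalid limit: ${limit}`);
  const ratio = used / limit;
  return { used, limit, ratio, warn: ratio >= INOTIFY_WARN_RATIO };
}

// Per-user watch limit (e.g. 524288 after the sysctl change).
function readInotifyLimit(path = "/proc/sys/fs/inotify/max_user_watches"): number {
  return Number.parseInt(readFileSync(path, "utf8").trim(), 10);
}

// Each active inotify watch appears as an "inotify wd:..." line in the owning fd's fdinfo.
function countInotifyWatches(procRoot = "/proc"): number {
  let total = 0;
  for (const pid of readdirSync(procRoot)) {
    if (!/^\d+$/.test(pid)) continue; // only numeric entries are processes
    const fdinfoDir = join(procRoot, pid, "fdinfo");
    try {
      for (const fd of readdirSync(fdinfoDir)) {
        const info = readFileSync(join(fdinfoDir, fd), "utf8");
        for (const line of info.split("\n")) {
          if (line.startsWith("inotify wd:")) total += 1;
        }
      }
    } catch {
      // process exited or its fdinfo is not readable: skip the rest of this process
    }
  }
  return total;
}
```

On a Linux host, `classifyInotifyUsage(countInotifyWatches(), readInotifyLimit())` yields the warn flag.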
Key Design Principles:
- PM2 handles process lifecycle (restart, memory limits, crash recovery)
- Health monitor adds responsiveness detection (what PM2 can't do alone)
- No systemd involvement (prevents port conflicts with other PM2 instances)
- Each PM2 daemon isolated: moltbot-gateway, si_project/dashboard, ai_product_visualizer
This keeps the gateway recoverable: if it stops responding to Telegram messages while its process is still running, the health monitor restarts it.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Removed conflicting systemd service files (moltbot-gateway.service, moltbot-health-check.*)
- Removed redundant health-check script (PM2 handles restarts natively)
- Updated README_Tech.md to document PM2 as actual process manager
- Clarified that inotify fix (524288 limit) is permanent solution
- Documented PM2 commands for troubleshooting and monitoring
- Added safety note: Never use systemd for moltbot-gateway (causes port conflicts)
- Fixed architecture documentation to reflect PM2 daemon isolation model
Gateway now running cleanly via PM2 (PID 661291) without systemd interference.
Inotify limit verified at 524288 (prevents file watcher exhaustion).
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Changed the .gitignore entries skills/global-shared/ and skills/global-skills/ to drop
their trailing slashes so git also ignores them when they are symlinks.
A trailing slash only matches directories, and git treats a symlink as a file, not a directory.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Major Changes:
- Implement task-type router (src/agents/task-type-router.ts) for intelligent model routing
* Detects task type from user message (file-analysis, creative, debugging, cli, general)
* Routes to optimal models: Gemini Flash (file analysis), Llama 3.3 70B (creative),
Claude Sonnet 4.5 (debugging), Mistral Devstral 2 (CLI/general)
  * Integrated into the model selection pipeline
- Integrate task-type routing into model resolution (src/agents/model-selection.ts)
* Pass userMessage to resolveDefaultModelForAgent for context-aware routing
* Maintain fallback chain for model availability
- Update attempt runner (src/agents/pi-embedded-runner/run/attempt.ts)
* Pass prompt context to enable task-type based model selection
- Harden .gitignore against committing secrets and local artifacts
* Added comprehensive rules for sensitive files (.env variants, credentials)
* Excluded API keys, runtime logs, test files, auto-generated skills directories
* Properly ignored ecosystem.config, build artifacts, package manager locks
- Add technical documentation (README_Tech.md)
* Process architecture (systemd Gateway, PM2 Dashboard, PM2 AI Product Visualizer)
* Management commands and troubleshooting guide
* Configuration summary and deployment checklist
* Problem log with 6 documented issues and solutions
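A rough sketch of the task-type routing described above. The task types and the model families come from the commit. The keyword rules, the model id strings and `resolveModelForMessage` are placeholders, not the real router or `resolveDefaultModelForAgent` signature.

```typescript
// Task types named in the commit message.
type TaskType = "file-analysis" | "creative" | "debugging" | "cli" | "general";

// Hypothetical keyword heuristics, checked in order; the real detector's rules may differ.
const RULES: Array<[TaskType, RegExp]> = [
  ["debugging", /\b(error|exception|stack ?trace|bug|crash|debug)\b/i],
  ["file-analysis", /\b(analy[sz]e|summari[sz]e)\b.*\b(file|pdf|csv|document|attachment)\b/i],
  ["cli", /\b(bash|shell|terminal|command line|cli)\b/i],
  ["creative", /\b(poem|story|lyrics|slogan|brainstorm)\b/i],
];

// Placeholder ids standing in for the models listed in the commit.
const MODEL_FOR_TASK: Record<TaskType, string> = {
  "file-analysis": "gemini-flash",
  creative: "llama-3.3-70b",
  debugging: "claude-sonnet-4.5",
  cli: "devstral-2",
  general: "devstral-2",
};

function detectTaskType(userMessage: string): TaskType {
  for (const [type, pattern] of RULES) {
    if (pattern.test(userMessage)) return type;
  }
  return "general";
}

// Prefer the task-specific model, then walk the fallback chain until one is available.
function resolveModelForMessage(
  userMessage: string | undefined,
  isAvailable: (model: string) => boolean,
  fallbackChain: string[],
): string | undefined {
  const candidates = userMessage
    ? [MODEL_FOR_TASK[detectTaskType(userMessage)], ...fallbackChain]
    : fallbackChain;
  return candidates.find(isAvailable);
}
```

Keeping the fallback chain after the routed model means that routing only changes the first preference. It never removes the existing availability fallbacks.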
Result:
- Bot now intelligently routes user requests to optimal models based on message type
- Production-ready with systemd isolation, preventing PM2 conflicts
- Comprehensive documentation for future maintenance and troubleshooting
- Safer version control through a stricter .gitignore
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
What:
- resolve shell from PATH in bash-tools tests (avoid /bin/bash dependency)
- mock DNS for web-fetch SSRF tests (no real network)
- stub a2ui bundle in canvas-host server test when missing
Why:
- keep gateway test suite deterministic on Nix/Garnix Linux
Tests:
- not run locally (known missing deps in unit test run)
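A sketch of resolving the shell from PATH instead of hard-coding `/bin/bash`, which does not exist on NixOS. `resolveFromPath` is a hypothetical helper, not the bash-tools test code itself.

```typescript
import { accessSync, constants, statSync } from "node:fs";
import { delimiter, join } from "node:path";

// Return the first executable regular file named `name` found on a PATH-style string.
function resolveFromPath(name: string, pathEnv: string = process.env.PATH ?? ""): string | undefined {
  for (const dir of pathEnv.split(delimiter)) {
    if (!dir) continue; // skip empty PATH segments
    const candidate = join(dir, name);
    try {
      if (statSync(candidate).isFile()) {
        accessSync(candidate, constants.X_OK); // throws if not executable
        return candidate;
      }
    } catch {
      // missing or not executable: keep searching
    }
  }
  return undefined;
}
```

A test could then use `resolveFromPath("bash") ?? resolveFromPath("sh")` and skip itself when both are undefined.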
What:
- stub resolvePinnedHostname in web-fetch tests to avoid DNS flake
- close lock file handles via FileHandle.close during cleanup to avoid EBADF
Why:
- make CI deterministic without network/DNS dependence
- prevent double-close errors from GC
Tests:
- pnpm vitest run --config vitest.unit.config.ts src/agents/tools/web-tools.fetch.test.ts src/agents/session-write-lock.test.ts (failed: missing @aws-sdk/client-bedrock)
* refactor(ui): enhance loadSessions function to accept overrides for session loading parameters
- Updated loadSessions to include optional parameters for activeMinutes, limit, includeGlobal, and includeUnknown.
- Modified refreshChat to use the new activeMinutes parameter when loading sessions.
- Removed duplicate applySettingsFromUrl call in handleConnected function.
* feat(ui): implement session refresh functionality after chat
- Added `refreshSessionsAfterChat` property to `ChatHost` and `GatewayHost` types.
- Introduced `isChatResetCommand` function to identify chat reset commands.
- Updated `handleSendChat` to set `refreshSessions` based on chat reset commands.
- Modified `handleGatewayEventUnsafe` to load sessions when chat is finalized and `refreshSessionsAfterChat` is true.
- Enhanced `refreshChat` to load sessions with `activeMinutes` set to 0 for immediate refresh.
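A sketch of the override merge and reset detection described above. The parameter names come from the commit. The default values and the reset-command list are assumptions: `/new` appears elsewhere in this log, while `/reset` is a guess.

```typescript
// Parameter names from the commit; default values are placeholders.
interface SessionsListParams {
  activeMinutes: number;
  limit: number;
  includeGlobal: boolean;
  includeUnknown: boolean;
}

const DEFAULT_SESSIONS_PARAMS: SessionsListParams = {
  activeMinutes: 120,
  limit: 50,
  includeGlobal: true,
  includeUnknown: false,
};

// Explicit overrides win; fields passed as undefined keep their defaults.
function buildSessionsParams(overrides: Partial<SessionsListParams> = {}): SessionsListParams {
  const defined = Object.fromEntries(
    Object.entries(overrides).filter(([, v]) => v !== undefined),
  ) as Partial<SessionsListParams>;
  return { ...DEFAULT_SESSIONS_PARAMS, ...defined };
}

// Hypothetical reset-command set used to decide whether to refresh sessions after a chat.
const RESET_COMMANDS = new Set(["/new", "/reset"]);

function isChatResetCommand(text: string): boolean {
  const first = text.trim().split(/\s+/, 1)[0]?.toLowerCase() ?? "";
  return RESET_COMMANDS.has(first);
}
```

Filtering out `undefined` matters because `activeMinutes: 0` must survive the merge, while an omitted field should not clobber a default.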
NTFS does not allow < or > in filenames, causing the XML filename
escaping test to fail on Windows CI with ENOENT.
Replace file<test>.txt with file&test.txt: & is valid on all platforms
and still requires XML escaping (&amp;), preserving the test's intent.
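A minimal XML escaper illustrating why `file&test.txt` still exercises escaping. `escapeXml` is a hypothetical helper, not necessarily the function under test.

```typescript
// Escape the five XML special characters. & must be replaced first so the
// entities produced for the other characters are not escaped a second time.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// A filename legal on Windows and POSIX that still needs escaping in an attribute.
const fileName = "file&test.txt";
const attr = `<file name="${escapeXml(fileName)}">`;
```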
Fixes #3748
Previous fix only checked skippedEmpty > 0, but when the model returns
content: [] no payloads are created at all. Now also checks
replies.length === 0 to catch this case.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When running multiple Telegram bot accounts bound to different agents,
the /new command (and other slash commands) would send confirmation
messages via the wrong bot because the context was missing AccountId.
The fix adds AccountId: route.accountId to the context payload in
registerTelegramNativeCommands, matching how bot-message-context.ts
handles regular messages.
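A simplified sketch of why the missing field mattered. Once the native-command context carries `AccountId`, the reply path can pick the bot the command arrived on instead of a default account. The types, `buildNativeCommandContext` and `pickReplyAccount` are hypothetical. Only `AccountId` and `route.accountId` come from the commit.

```typescript
// Hypothetical, trimmed-down shapes; the real route and context types carry more fields.
interface ResolvedRoute {
  agentId: string;
  accountId: string;
}

interface CommandContext {
  Body: string;
  ChatId: number;
  AccountId?: string;
}

// Include the route's account, as regular message contexts already do.
function buildNativeCommandContext(route: ResolvedRoute, chatId: number, body: string): CommandContext {
  return { Body: body, ChatId: chatId, AccountId: route.accountId };
}

// Hypothetical reply-side lookup: fall back to the default account only when AccountId is absent.
function pickReplyAccount(ctx: CommandContext, defaultAccountId: string): string {
  return ctx.AccountId ?? defaultAccountId;
}
```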
Fixes #2537
- Add msg.video_note to media extraction chain in bot/delivery.ts
- Add placeholder detection for video notes in bot-message-context.ts
- Video notes (rounded square video messages) are now processed and downloaded like regular videos
Fixes issue where video note messages were silently dropped because they weren't in the media handling logic.
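A sketch of a media extraction chain that includes `video_note`. It uses a minimal subset of Telegram Bot API `Message` fields, each of which carries a `file_id`. The priority order and the names `extractMedia` and `MediaKind` are assumptions, not the code in `bot/delivery.ts`.

```typescript
// Minimal subset of Telegram Bot API Message fields; every media object has a file_id.
interface FileRef {
  file_id: string;
}

interface TelegramMessage {
  photo?: FileRef[]; // available sizes; the last entry is typically the largest
  video?: FileRef;
  video_note?: FileRef;
  document?: FileRef;
  audio?: FileRef;
  voice?: FileRef;
}

type MediaKind = "photo" | "video" | "video_note" | "document" | "audio" | "voice";

// Return the first media attachment found, checking video_note right after video.
function extractMedia(msg: TelegramMessage): { kind: MediaKind; fileId: string } | undefined {
  const photos = msg.photo ?? [];
  const largestPhoto = photos[photos.length - 1];
  if (largestPhoto) return { kind: "photo", fileId: largestPhoto.file_id };
  const ordered: Array<[MediaKind, FileRef | undefined]> = [
    ["video", msg.video],
    ["video_note", msg.video_note],
    ["document", msg.document],
    ["audio", msg.audio],
    ["voice", msg.voice],
  ];
  for (const [kind, ref] of ordered) {
    if (ref) return { kind, fileId: ref.file_id };
  }
  return undefined; // no media: text-only message
}
```

Without the `video_note` entry, a message containing only a video note falls through to `undefined` and is silently dropped, which is the bug described above.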
Native slash commands (e.g. /verbose, /status) should not emit tool
summaries. Gate onToolResult behind CommandSource !== 'native' in
addition to the existing ChatType !== 'group' check.
Add test for native command exclusion.
Tests:
- provides onToolResult in DM sessions (ChatType=direct)
- does not provide onToolResult in group sessions (ChatType=group)
- sends tool results via dispatcher in DM sessions
Replaces the old cross-provider test that expected onToolResult to
always be undefined.
875b018ea removed onToolResult from dispatch-from-config.ts to prevent
tool summaries leaking into group channels. However, this also broke
verbose tool summaries in DM/private sessions where they are expected.
This restores onToolResult but gates it behind ChatType !== 'group',
so group channels remain unaffected while DM verbose works again.
mirror=false is passed to sendPayloadAsync to avoid duplicating tool
summaries in the session transcript (matching the block reply behavior).
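A sketch of the gating described in these commits. `ChatType`, `CommandSource` and `mirror=false` come from the text. The `InboundContext` shape, the `"text"` source value and `toolResultHandlerFor` are assumptions.

```typescript
// Field names mirror the commit text; the surrounding shape is assumed.
interface InboundContext {
  ChatType: "direct" | "group";
  CommandSource?: "native" | "text";
}

type ToolResultHandler = (summary: string) => void;

// Emit tool summaries only in DM sessions and never for native slash commands.
function toolResultHandlerFor(
  ctx: InboundContext,
  send: (text: string, opts: { mirror: boolean }) => void,
): ToolResultHandler | undefined {
  if (ctx.ChatType === "group" || ctx.CommandSource === "native") return undefined;
  // mirror=false keeps summaries out of the session transcript, like block replies.
  return (summary) => send(summary, { mirror: false });
}
```

Returning `undefined` instead of a no-op handler preserves the "onToolResult is not provided" semantics that the tests check.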
Fixes #2665
Add a `paths` option to `memorySearch` config, allowing users to
explicitly specify additional directories or files to include in
memory search.
Follow-up to #2961 as suggested by @gumadeiras — instead of auto-following
symlinks (which has security implications), users can now explicitly
declare additional search paths.
- Add `memorySearch.paths` config option (array of strings)
- Paths can be absolute or relative (resolved from workspace)
- Directories are recursively scanned for `.md` files
- Single `.md` files can also be specified
- Paths from defaults and agent overrides are merged
- Added 4 test cases for listMemoryFiles
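A sketch of the path handling listed above: relative paths are resolved from the workspace, defaults and agent overrides are merged, directories are scanned recursively for `.md` files, and single `.md` files are accepted. `resolveMemoryPaths` and `listExtraMemoryFiles` are hypothetical names, not the real `listMemoryFiles`. One detail is a design assumption: explicitly listed paths are followed even if they are symlinks, but symlinks found while walking a directory are skipped, in line with the concern about auto-following symlinks.

```typescript
import { lstatSync, readdirSync, statSync, type Stats } from "node:fs";
import { isAbsolute, join, resolve } from "node:path";

// Merge default and agent-override paths; relative entries resolve against the workspace.
function resolveMemoryPaths(workspaceDir: string, defaults: string[] = [], overrides: string[] = []): string[] {
  const resolved = [...defaults, ...overrides].map((p) => (isAbsolute(p) ? p : resolve(workspaceDir, p)));
  return [...new Set(resolved)];
}

// Collect .md files from explicit files and recursively scanned directories.
function listExtraMemoryFiles(paths: string[]): string[] {
  const out: string[] = [];
  const visit = (p: string, explicit: boolean): void => {
    let st: Stats;
    try {
      // Explicit paths follow symlinks; entries found during the walk do not.
      st = explicit ? statSync(p) : lstatSync(p);
    } catch {
      return; // missing path: skip rather than fail the whole search
    }
    if (st.isDirectory()) {
      for (const entry of readdirSync(p)) visit(join(p, entry), false);
    } else if (st.isFile() && p.endsWith(".md")) {
      out.push(p);
    }
  };
  for (const p of paths) visit(p, true);
  return [...new Set(out)].sort();
}
```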
* fix: Prevent XML attribute injection by escaping special characters in file name and MIME type attributes.
* fix: text attachment MIME misclassification with security hardening (#3628)
- Fix CSV/TSV inference from content heuristics
- Add UTF-16 detection and BOM handling
- Add XML attribute escaping for file output (security)
- Add MIME override logging for auditability
- Add comprehensive test coverage for edge cases
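Two of the heuristics above can be sketched as follows: BOM-based detection of UTF-8 and UTF-16, and a rough CSV/TSV sniff based on consistent delimiter counts. This is an illustration only. The function names are hypothetical, the sniff ignores quoted fields, and the real implementation may use different rules.

```typescript
type TextEncodingGuess = "utf-8" | "utf-16le" | "utf-16be";

// Detect a byte-order mark; returns the encoding and how many bytes to skip.
function detectBom(bytes: Uint8Array): { encoding: TextEncodingGuess; bomLength: number } | undefined {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return { encoding: "utf-8", bomLength: 3 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) return { encoding: "utf-16le", bomLength: 2 };
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) return { encoding: "utf-16be", bomLength: 2 };
  return undefined;
}

// Decode with the BOM-indicated encoding (UTF-8 otherwise) and drop the BOM bytes.
function decodeText(bytes: Uint8Array): string {
  const bom = detectBom(bytes);
  return new TextDecoder(bom?.encoding ?? "utf-8").decode(bytes.subarray(bom?.bomLength ?? 0));
}

// Rough sniff: the same non-zero delimiter count on each of the first few non-empty lines.
function sniffDelimited(text: string): "text/csv" | "text/tab-separated-values" | undefined {
  const lines = text.split(/\r?\n/).filter((l) => l.length > 0).slice(0, 5);
  if (lines.length < 2) return undefined;
  for (const [delim, mime] of [["\t", "text/tab-separated-values"], [",", "text/csv"]] as const) {
    const counts = lines.map((l) => l.split(delim).length - 1);
    if (counts[0] > 0 && counts.every((c) => c === counts[0])) return mime;
  }
  return undefined;
}
```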
Thanks @frankekn
Self messages from the linked WhatsApp number bypass dmPolicy and allowFrom
checks automatically. Clarified that users don't need to add their own
number to the allowlist.
Self messages from the linked WhatsApp number bypass dmPolicy checks
entirely (via isSamePhone check in access-control.ts)...
The pairing CLI calls listPairingChannels() at registration time,
which requires the plugin registry to be populated. Without this,
plugin-provided channels like Matrix fail with "does not support
pairing" even though they have pairing adapters defined.
This mirrors the existing pattern used by the plugins CLI entry.
Co-authored-by: Shakker <165377636+shakkernerd@users.noreply.github.com>
Add mappings for audio/x-m4a, audio/mp4, and video/quicktime to ensure
media files sent as documents are saved with proper extensions, enabling
automatic transcription/analysis tools to work correctly.
- audio/x-m4a → .m4a
- audio/mp4 → .m4a
- video/quicktime → .mov
Also adds comprehensive test coverage for extensionForMime().
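A lookup in the spirit of `extensionForMime()`. Only the first three mappings come from this commit, and the rest are common defaults added for context. `lookupExtension` is a hypothetical stand-in. It strips MIME parameters and lowercases the type before the lookup.

```typescript
// The first three entries are from the commit; the others are common defaults for context.
const MIME_EXTENSIONS: Record<string, string> = {
  "audio/x-m4a": ".m4a",
  "audio/mp4": ".m4a",
  "video/quicktime": ".mov",
  "audio/mpeg": ".mp3",
  "video/mp4": ".mp4",
  "application/pdf": ".pdf",
};

// Normalize e.g. "Audio/MP4; codecs=mp4a.40.2" to "audio/mp4" before the lookup.
function lookupExtension(mime: string | undefined): string | undefined {
  if (!mime) return undefined;
  const base = mime.split(";", 1)[0].trim().toLowerCase();
  return MIME_EXTENSIONS[base];
}
```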
Adds `messages` config option to session-memory hook (default: 15).
Fixes a filter-order bug: it now filters to user/assistant messages first,
then keeps the last N. Previously it sliced first, which could return
fewer than N messages when non-message entries were present.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>