Problem: Gateway was hung in 1200+ restart loop, causing Telegram bot to stop responding. Root cause: system inotify file descriptor limit exhausted when monitoring config/skill files. Solutions implemented: 1. **Inotify limit increase** (/etc/sysctl.d/99-moltbot-inotify.conf) - Increased fs.inotify.max_user_watches from 65536 to 524288 - Prevents "ENOSPC: System limit for number of file watchers reached" - Persistent across reboots 2. **Improved systemd service** (/etc/systemd/system/moltbot-gateway.service) - Changed Restart=always → Restart=on-failure - Increased RestartSec=5 → RestartSec=10 (reduce CPU churn) - Reduced StartLimitBurst=10 → StartLimitBurst=5 - Added ExecStartPre to auto-clean stale locks on startup - Service remains isolated from other services (code-server, ssh, etc) 3. **Health check automation** (new files) - scripts/health-check-gateway.sh: detects hang/lock issues, auto-recovers - /etc/systemd/system/moltbot-health-check.service: runs health checks - /etc/systemd/system/moltbot-health-check.timer: runs every 5 minutes - Logs to /tmp/moltbot-health-check.log 4. **Documentation** (README_Tech.md) - Added section on crash loop root cause and preventative measures - Added Architecture section documenting service isolation - Updated troubleshooting with health check steps - Updated file locations with new monitoring files Testing: Gateway now starts cleanly, health checks pass, other services (code-server, ssh) remain unaffected. Timer runs every 5 minutes to prevent future hangs. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
13 KiB
Moltbot Technical Documentation
Format Guidelines for Contributors
Style: Concise, technical, action-oriented.
Brevity: One sentence per command/concept. Use bullet points, not paragraphs.
Problem Log: Keep entries short—problem → symptom → solution. Add date and who fixed it if known.
Commands: Always include the command first, explanation after (e.g., systemctl restart moltbot-gateway # Restarts the gateway service).
Sections: Group by topic. Use ## for major sections, ### for subsections.
Updates: When adding new problems/solutions, add to the end of the Problem Log section with date.
Process Architecture
Core Components
-
Moltbot Gateway (
moltbot-gateway)- Service:
/etc/systemd/system/moltbot-gateway.service - Runs:
/usr/bin/node dist/entry.js gateway --port 18789 - Manager:
systemd(isolated from PM2) - Handles: Telegram integration, message routing, model selection
- Service:
-
Supporting Processes
- Dashboard (si_project/dashboard) - PM2 managed, separate from bot
- AI Product Visualizer (ai_product_visualizer) - PM2 managed, separate from bot
- Telegram Relay - Embedded in gateway (grammY framework)
- Task-Type Router - Compiled TypeScript module in gateway
-
Configuration Files
- Global:
/root/.clawdbot/moltbot.json - Agent-specific:
/root/.clawdbot/agents/main/config.json - Environment:
/root/.clawdbot/.env
- Global:
Process Management
Moltbot Gateway (Systemd)
# Check status
systemctl status moltbot-gateway
# Restart (reloads config + code)
systemctl restart moltbot-gateway
# Stop gracefully
systemctl stop moltbot-gateway
# Start if stopped
systemctl start moltbot-gateway
# View live logs
journalctl -u moltbot-gateway -f
# View last 100 lines
journalctl -u moltbot-gateway -n 100
Auto-restart: Enabled. If process crashes, systemd restarts it within 5 seconds. Boot persistence: Enabled. Starts automatically on system reboot.
From Telegram Chat
Send /restart command in Telegram to restart the bot gracefully without terminal access.
Dashboard (PM2)
# Check status
pm2 list
# Restart
pm2 restart dashboard
# Logs
pm2 logs dashboard
# Stop
pm2 stop dashboard
Isolation: Runs in separate PM2 daemon. Does not interfere with Moltbot.
Logs Location
# Moltbot systemd logs
journalctl -u moltbot-gateway -n 200
# Moltbot app logs (most detailed)
tail -f /var/log/moltbot-gateway.log
# Application debug logs
tail -f /tmp/moltbot/moltbot-*.log
Problem Log & Solutions
1. Duplicate Telegram Responses (Jan 28, 2026)
Problem: Bot sending same message 2-3 times.
Root Cause: streamMode: "partial" in Telegram config caused responses to stream as chunks, each sent separately.
Solution: Changed streamMode from "partial" to "block" in /root/.clawdbot/moltbot.json.
"telegram": {
"streamMode": "block" // Single unified message
}
Status: ✅ Fixed. Single responses now.
2. Unknown Model Error (Jan 28, 2026)
Problem: Error: Unknown model: openrouter/mistralai/mistral-devstral-2
Root Cause: Incorrect OpenRouter model ID format. Used old naming convention.
Solution: Updated model IDs to correct OpenRouter format:
mistralai/devstral-2512(Mistral Devstral 2)google/gemini-2.0-flash-001(Gemini 2.0 Flash)meta-llama/llama-3.3-70b-instruct:free(Llama 3.3 70B)
Status: ✅ Fixed. Models now load correctly.
3. PM2 Process Isolation Conflict (Jan 28, 2026)
Problem: Dashboard PM2 restarting 140+ times. Gateway conflicting with dashboard in same PM2 daemon.
Root Cause: Moltbot gateway was added to default PM2 instance, sharing resources with dashboard.
Solution: Moved Moltbot from PM2 to systemd service (isolated).
- Moltbot:
systemdonly - Dashboard:
PM2only - No shared daemon = no conflicts
Status: ✅ Fixed. Processes now isolated.
Files changed:
- Created:
/etc/systemd/system/moltbot-gateway.service - Removed: Moltbot from PM2 list
4. Missing Task-Type Router Compilation (Jan 28, 2026)
Problem: Bot said it implemented task-type routing but nothing changed.
Root Cause: TypeScript source files modified but not compiled to dist/.
Solution:
- Fixed import error in
src/agents/task-type-router.ts(DEFAULT_PROVIDER location) - Compiled:
npm run build - Restarted gateway to load new
dist/code
Status: ✅ Fixed. Task-type router now active.
5. Telegram Command Limit Exceeded (Jan 29, 2026)
Problem: Error: setMyCommands failed: BOT_COMMANDS_TOO_MUCH (Telegram API limit = 100 commands).
Root Cause: Both config files had "native": "auto" trying to register all skills + commands with Telegram.
Solution: Disabled native command auto-registration:
// /root/.clawdbot/moltbot.json
"commands": {
"native": false,
"nativeSkills": false
}
// /root/.clawdbot/agents/main/config.json
"commands": {
"native": false,
"text": true,
"restart": true
}
Status: ✅ Fixed. Telegram now connects without errors.
6. Node.js Version Too Old (Jan 28, 2026)
Problem: Moltbot requires Node.js 24+ but only v20 was installed.
Root Cause: Package.json specified engines: { node: ">=24" }.
Solution: Upgraded Node.js:
curl -fsSL https://deb.nodesource.com/setup_24.x | sudo -E bash -
sudo apt-get install -y nodejs
Verified: node --version → v24.13.0
Status: ✅ Fixed.
7. Gateway Crash Loop & Inotify Exhaustion (Jan 29, 2026)
Problem: Gateway hung/became unresponsive. Systemd crashed 1203+ times. Telegram bot stopped responding.
Symptoms:
Port 18789 is already in use(but port handler didn't properly clean up)Gateway failed to start: gateway already running (pid 618450); lock timeout after 5000ms- Lock files stale/not released
Root Cause: System hit inotify file descriptor limit (ENOSPC):
Error: ENOSPC: System limit for number of file watchers reached, watch '/root/.moltbot/moltbot.json'
Error: ENOSPC: System limit for number of file watchers reached, watch '/root/clawd/canvas'
Error: ENOSPC: System limit for number of file watchers reached, watch '/root/clawd'
Gateway couldn't monitor config/skill files for changes → config reloading broke → became hung/unresponsive → systemd restart loop.
Solutions:
- Immediate fix: Kill stuck process + clean lock files
kill -9 618450
rm -f ~/.clawdbot/gateway.lock ~/.clawdbot/moltbot.lock
systemctl restart moltbot-gateway
- Permanent inotify limit increase:
/etc/sysctl.d/99-moltbot-inotify.conf
fs.inotify.max_user_watches=524288 # Increased from 65536
Apply: sysctl -p /etc/sysctl.d/99-moltbot-inotify.conf
-
Better systemd service:
/etc/systemd/system/moltbot-gateway.service- Changed
Restart=always→Restart=on-failure(only restart on actual failure) - Increased
RestartSec=5→RestartSec=10(reduce CPU churn) - Reduced
StartLimitBurst=10→StartLimitBurst=5(fewer restart attempts before blocking) - Added
ExecStartPreto auto-clean stale locks on startup
- Changed
-
Health check monitoring:
/etc/systemd/system/moltbot-health-check.{service,timer}- Runs
/root/moltbot/scripts/health-check-gateway.shevery 5 minutes - Checks if gateway is responding on port 18789
- Detects stale lock files and crash loops
- Automatically cleans locks and restarts if needed
- Isolated: Does not interfere with other services (code-server, ssh, etc.)
- Runs
Key Files Added/Modified:
- Created:
scripts/health-check-gateway.sh(health check logic) - Created:
/etc/systemd/system/moltbot-health-check.service - Created:
/etc/systemd/system/moltbot-health-check.timer - Created:
/etc/sysctl.d/99-moltbot-inotify.conf - Modified:
/etc/systemd/system/moltbot-gateway.service(restart policy)
Status: ✅ Fixed. All preventative measures in place.
Configuration Summary
Model Fallback Chain
Primary: Mistral Devstral 2 2512 (agentic specialist) Fallbacks:
- Gemini 2.0 Flash (long-context, 1M tokens)
- Llama 3.3 70B (creative/pedagogical)
- Moonshot Kimi K2.5 (language model)
- Claude Sonnet 4.5 (escalation)
- Claude Opus 4.5 (complex reasoning)
Task-Type Routing
- File Analysis → Gemini Flash
- Creative Content → Llama 3.3 70B
- Debugging → Claude Sonnet 4.5
- CLI/Commands → Mistral Devstral 2
- General → Mistral Devstral 2 (default)
Telegram Settings
- Streaming Mode:
block(single message per response) - Commands Native:
false(avoid API limit) - Restart Command:
true(allows/restartfrom chat) - User ID Allowlist: 876311493 (only you)
Architecture: Service Isolation & Stability
Systemd Services Running on This Host
moltbot-gateway.service- Telegram bot gateway (isolated, does not affect others)moltbot-health-check.timer- Periodic gateway health monitoring (oneshot service, no resource hoarding)code-server.service- Code editor (independent, unaffected)ssh.service- SSH server (independent, unaffected)
Safety Design
- No shared resources: Each service runs independently
- No resource limits affecting others: Moltbot has
LimitNOFILE/NOPROCset locally only - Health check is isolated: Runs as
oneshot(completes quickly), doesn't run concurrently with gateway - No interference with startup/shutdown: Services can be restarted independently
Monitoring
- Automatic health checks: Every 5 minutes (can be adjusted in
moltbot-health-check.timer) - Logs:
/tmp/moltbot-health-check.log(separate from gateway logs) - Manual check:
systemctl list-timers moltbot-health-check.timer
Quick Troubleshooting
Bot Not Responding
- Check status:
systemctl status moltbot-gateway - Check logs:
journalctl -u moltbot-gateway -n 50 - Check health:
bash /root/moltbot/scripts/health-check-gateway.sh - Restart:
systemctl restart moltbot-gateway - Check inotify limit (if file watching errors):
cat /proc/sys/fs/inotify/max_user_watches
Telegram Connection Error
Check logs for setMyCommands failed or network errors.
If command limit error: Verify native: false in both config files.
High Latency (>1 minute)
Expected for first API call to OpenRouter. Check OpenRouter API status.
If consistent, check model health: node dist/entry.js models status
Duplicate Responses
Check streamMode: "block" is set in /root/.clawdbot/moltbot.json.
If issue persists, reduce retry attempts in retry policy config.
Deployment Checklist
- Node.js 24+ installed
- Moltbot cloned and built (
npm run build) - Systemd service created and enabled
- Config files populated (moltbot.json, agents/main/config.json)
- API keys in environment or .env
- Telegram bot token configured
- Gateway started:
systemctl start moltbot-gateway - Telegram connection verified:
node dist/entry.js channels status - Test message sent in Telegram
Key File Locations
/root/moltbot/ Main installation
├── dist/ Compiled code (loaded at runtime)
├── src/ TypeScript source
├── scripts/
│ └── health-check-gateway.sh Health monitoring script
├── ecosystem.config.cjs PM2 config (legacy, not used)
└── README_Tech.md This file
~/.clawdbot/ Config directory
├── moltbot.json Global gateway config
├── agents/main/
│ ├── config.json Agent-specific config
│ └── auth-profiles.json API key storage
└── .env Environment variables
/etc/systemd/system/ System services
├── moltbot-gateway.service Systemd service (gateway)
├── moltbot-health-check.service Health check service
├── moltbot-health-check.timer Health check timer (runs every 5min)
└── ...other services (code-server, ssh, etc)
/etc/sysctl.d/ System configuration
└── 99-moltbot-inotify.conf Inotify limit config
/var/log/ System logs
└── moltbot-gateway.log Gateway application log
/tmp/moltbot/ Runtime logs
├── moltbot-*.log Detailed debug logs
└── moltbot-health-check.log Health check results
Last Updated: Jan 29, 2026 (18:50 UTC) Maintained By: Claude Code + Moltbot Task Router Latest: Crash loop root cause fixed (inotify limit), health monitoring added, service isolation verified