openclaw

romtuck/openclaw

Commit Graph

Author

SHA1

Message

Date

valtterimelkko

eec556c71e

Fix: Resolve gateway crash loop and inotify exhaustion

Problem: The gateway was stuck in a restart loop (1200+ restarts), causing the
Telegram bot to stop responding. Root cause: the system inotify watch limit
(fs.inotify.max_user_watches) was exhausted while monitoring config/skill files.
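
To check whether watch exhaustion is the likely cause on a given host, the watches in use can be compared against the configured limit. The following is a minimal sketch, assuming a Linux `/proc` filesystem; the function names are illustrative and not part of this commit. The limit is per user, so a total across all processes is only a rough indicator.

```shell
#!/bin/sh
# Sketch: estimate inotify watches in use (function names are hypothetical).

# Count watch entries in one fdinfo-style file.
# Each watch appears as a line starting with "inotify".
count_watches_in_file() {
    grep -c '^inotify' "$1" 2>/dev/null || true
}

# Sum watches across every process whose fdinfo is readable by this user.
count_all_watches() {
    total=0
    for f in /proc/[0-9]*/fdinfo/*; do
        [ -r "$f" ] || continue
        n=$(count_watches_in_file "$f")
        total=$((total + ${n:-0}))
    done
    echo "$total"
}
```

For example, `echo "$(count_all_watches) of $(cat /proc/sys/fs/inotify/max_user_watches)"` prints the usage next to the limit. Run it as root to include processes owned by other users.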

Solutions implemented:

1. **Inotify limit increase** (/etc/sysctl.d/99-moltbot-inotify.conf)
   - Increased fs.inotify.max_user_watches from 65536 to 524288
   - Prevents "ENOSPC: System limit for number of file watchers reached"
   - Persistent across reboots

2. **Improved systemd service** (/etc/systemd/system/moltbot-gateway.service)
   - Changed Restart=always → Restart=on-failure
   - Increased RestartSec=5 → RestartSec=10 (reduce CPU churn)
   - Reduced StartLimitBurst=10 → StartLimitBurst=5
   - Added ExecStartPre to auto-clean stale locks on startup
   - Service remains isolated from other services (code-server, ssh, etc.)

3. **Health check automation** (new files)
   - scripts/health-check-gateway.sh: detects hang/lock issues, auto-recovers
   - /etc/systemd/system/moltbot-health-check.service: runs health checks
   - /etc/systemd/system/moltbot-health-check.timer: runs every 5 minutes
   - Logs to /tmp/moltbot-health-check.log

4. **Documentation** (README_Tech.md)
   - Added section on crash loop root cause and preventative measures
   - Added Architecture section documenting service isolation
   - Updated troubleshooting with health check steps
   - Updated file locations with new monitoring files
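
The settings from items 1 and 2 can be sketched as two small helpers. This is an illustration under assumptions, not the files from this commit: the helper names are hypothetical, only the values stated above are used, and the `ExecStartPre` lock cleanup is left out because the lock path is not given here. `StartLimitBurst` is shown in `[Unit]`, where current systemd versions expect it.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; values come from the list above.

# Write a sysctl drop-in that raises the inotify watch limit.
#   $1: destination file, $2: new limit
write_inotify_dropin() {
    printf 'fs.inotify.max_user_watches = %s\n' "$2" > "$1"
}

# Print the restart-related directives described in item 2.
print_restart_policy() {
    cat <<'EOF'
[Unit]
StartLimitBurst=5

[Service]
Restart=on-failure
RestartSec=10
EOF
}
```

As root, `write_inotify_dropin /etc/sysctl.d/99-moltbot-inotify.conf 524288` followed by `sysctl --system` would load the new limit, and `systemctl daemon-reload` picks up an edited unit file.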

Testing: The gateway now starts cleanly, health checks pass, and other services
(code-server, ssh) remain unaffected. The timer runs every 5 minutes to detect
and recover from future hangs.
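
The stale-lock part of the health check could look roughly like the sketch below. It is not the contents of scripts/health-check-gateway.sh: it assumes the lock file holds the gateway PID, the function names are hypothetical, and the restart step is left as a comment because it needs privileges.

```shell
#!/bin/sh
# Sketch: detect and clear a stale lock (assumes the lock file holds a PID).

# Return 0 if the lock exists but its recorded PID is not running.
# Note: kill -0 also fails for live processes owned by another user,
# so run this as the gateway's user or as root.
lock_is_stale() {
    lock=$1
    [ -f "$lock" ] || return 1
    pid=$(cat "$lock" 2>/dev/null)
    case $pid in
        ''|*[!0-9]*) return 0 ;;   # empty or non-numeric content: treat as stale
    esac
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}

# Remove a stale lock and append a timestamped line to the log.
#   $1: lock file, $2: log file
check_gateway() {
    gw_lock=$1
    gw_log=$2
    if lock_is_stale "$gw_lock"; then
        rm -f "$gw_lock"
        echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) removed stale lock $gw_lock" >> "$gw_log"
        # systemctl restart moltbot-gateway.service   # recovery step, needs privileges
    fi
}
```

A timer-driven run would then call something like `check_gateway /path/to/gateway.lock /tmp/moltbot-health-check.log`, where the lock path is a placeholder.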

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2026-01-29 18:55:41 +00:00

1 Commits