romtuck/openclaw

Commit Graph

Author

SHA1

Message

Date

valtterimelkko

eec556c71e

Fix: Resolve gateway crash loop and inotify exhaustion

Problem: The gateway was stuck in a restart loop (1200+ restarts), causing the
Telegram bot to stop responding. Root cause: the system inotify watch limit was
exhausted while monitoring config/skill files.
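
The limit in question is the kernel setting `fs.inotify.max_user_watches`. As a minimal sketch, assuming a sysctl-style drop-in with plain `key = value` lines, the following checks whether such a file raises the limit far enough. The function names and the sample file content are illustrative assumptions; only the key and the value 524288 come from this commit.

```typescript
// Parse sysctl-style "key = value" lines, skipping blanks and comments.
function parseSysctl(text: string): Record<string, string> {
  const settings: Record<string, string> = {};
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#" || line.charAt(0) === ";") continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a key/value line
    settings[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return settings;
}

// True when the text sets fs.inotify.max_user_watches to at least `minimum`.
function hasEnoughWatches(confText: string, minimum: number): boolean {
  const value: string | undefined = parseSysctl(confText)["fs.inotify.max_user_watches"];
  return value !== undefined && parseInt(value, 10) >= minimum;
}

// Assumed file content; only the key and the value are taken from the commit.
const dropIn = "# raise inotify watch limit\nfs.inotify.max_user_watches = 524288\n";
console.log(hasEnoughWatches(dropIn, 524288)); // true
```

This only inspects file text. The value the running kernel actually uses has to be read from the live system.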

Solutions implemented:

1. **Inotify limit increase** (/etc/sysctl.d/99-moltbot-inotify.conf)
   - Increased fs.inotify.max_user_watches from 65536 to 524288
   - Prevents "ENOSPC: System limit for number of file watchers reached"
   - Persistent across reboots

2. **Improved systemd service** (/etc/systemd/system/moltbot-gateway.service)
   - Changed Restart=always → Restart=on-failure
   - Increased RestartSec=5 → RestartSec=10 (reduce CPU churn)
   - Reduced StartLimitBurst=10 → StartLimitBurst=5
   - Added ExecStartPre to auto-clean stale locks on startup
   - Service remains isolated from other services (code-server, ssh, etc.)

3. **Health check automation** (new files)
   - scripts/health-check-gateway.sh: detects hang/lock issues, auto-recovers
   - /etc/systemd/system/moltbot-health-check.service: runs health checks
   - /etc/systemd/system/moltbot-health-check.timer: runs every 5 minutes
   - Logs to /tmp/moltbot-health-check.log

4. **Documentation** (README_Tech.md)
   - Added section on crash loop root cause and preventative measures
   - Added Architecture section documenting service isolation
   - Updated troubleshooting with health check steps
   - Updated file locations with new monitoring files
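
To illustrate why the restart settings in item 2 matter, here is a simplified fixed-window model of a start-rate limit. It is a sketch, not systemd's implementation. The burst and `RestartSec` values come from the commit; the 60-second window is an assumption, since the message does not state the interval, and the function names are hypothetical. The model also assumes the service crashes immediately after every start.

```typescript
// Returns a function that reports whether a start at `nowSec` is allowed.
// At most `burst` starts are permitted per window of `intervalSec` seconds.
function createStartLimiter(burst: number, intervalSec: number): (nowSec: number) => boolean {
  let windowStart: number | null = null;
  let startsInWindow = 0;
  return (nowSec: number): boolean => {
    if (windowStart === null || nowSec - windowStart > intervalSec) {
      windowStart = nowSec; // open a new window
      startsInWindow = 1;
      return true;
    }
    if (startsInWindow < burst) {
      startsInWindow += 1;
      return true;
    }
    return false; // limit reached: no further restarts
  };
}

// Count how many starts succeed when the service is restarted every
// `restartSec` seconds, up to `maxAttempts` attempts.
function startsBeforeGiveUp(
  burst: number,
  intervalSec: number,
  restartSec: number,
  maxAttempts: number,
): number {
  const tryStart = createStartLimiter(burst, intervalSec);
  let allowed = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (!tryStart(attempt * restartSec)) break;
    allowed++;
  }
  return allowed;
}

// New settings from the commit (burst 5, RestartSec 10), assumed 60 s window.
console.log(startsBeforeGiveUp(5, 60, 10, 1200)); // 5
// Old settings (burst 10, RestartSec 5), same assumed window.
console.log(startsBeforeGiveUp(10, 60, 5, 1200)); // 10
```

In this model the limit only trips when the window is at least `burst × restartSec` seconds long. With a shorter window the loop never stops, so the interval is worth checking alongside the burst value.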

Testing: Gateway now starts cleanly, health checks pass, and other services
(code-server, ssh) remain unaffected. The timer runs every 5 minutes to detect
and recover from future hangs.
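
The decision a periodic health check has to make can be sketched as a small pure function. This is an assumption-laden illustration, not the contents of scripts/health-check-gateway.sh, which is a shell script not shown here. The state fields, the action names, and the stale-lock threshold are all hypothetical.

```typescript
// Hypothetical snapshot of what a health check might observe.
type GatewayState = {
  serviceActive: boolean;    // e.g. what `systemctl is-active` would report
  respondsToProbe: boolean;  // result of some liveness probe (assumed)
  lockAgeSec: number | null; // age of a lock file, or null if none exists
};

type Action = "remove-stale-lock" | "restart";

// Decide which recovery steps to take, in order. An empty list means healthy.
function decideActions(state: GatewayState, staleLockSec: number): Action[] {
  if (state.serviceActive && state.respondsToProbe) return [];
  const actions: Action[] = [];
  // A lock older than the threshold on an unhealthy gateway is treated as stale.
  if (state.lockAgeSec !== null && state.lockAgeSec > staleLockSec) {
    actions.push("remove-stale-lock");
  }
  actions.push("restart");
  return actions;
}

// A hung gateway holding a 10-minute-old lock, with an assumed 5-minute threshold.
console.log(decideActions({ serviceActive: true, respondsToProbe: false, lockAgeSec: 600 }, 300));
```

Keeping the decision separate from the side effects (deleting files, restarting the unit) makes the recovery policy easy to test without touching a live service.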

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2026-01-29 18:55:41 +00:00

Valtteri Melkko

ab8540870b

Implement task-type router with intelligent model selection and production setup

Major Changes:
- Implement task-type router (src/agents/task-type-router.ts) for intelligent model routing
  * Detects task type from user message (file-analysis, creative, debugging, cli, general)
  * Routes to optimal models: Gemini Flash (file analysis), Llama 3.3 70B (creative),
    Claude Sonnet 4.5 (debugging), Mistral Devstral 2 (CLI/general)
  * Integrated into model selection pipeline for seamless routing

- Integrate task-type routing into model resolution (src/agents/model-selection.ts)
  * Pass userMessage to resolveDefaultModelForAgent for context-aware routing
  * Maintain fallback chain for model availability

- Update attempt runner (src/agents/pi-embedded-runner/run/attempt.ts)
  * Pass prompt context to enable task-type-based model selection

- Enhance security and development hygiene (.gitignore)
  * Added comprehensive rules for sensitive files (.env variants, credentials)
  * Excluded API keys, runtime logs, test files, auto-generated skills directories
  * Properly ignored ecosystem.config, build artifacts, package manager locks

- Add technical documentation (README_Tech.md)
  * Process architecture (systemd Gateway, PM2 Dashboard, PM2 AI Product Visualizer)
  * Management commands and troubleshooting guide
  * Configuration summary and deployment checklist
  * Problem log with 6 documented issues and solutions
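
The routing and fallback behaviour described in the first two bullets can be sketched roughly as follows. This is a minimal illustration, not the contents of src/agents/task-type-router.ts or src/agents/model-selection.ts. The keyword rules, the `isAvailable` callback, and the names `detectTaskType`, `selectModel`, and `MODEL_FOR_TASK` are assumptions. Only the five task types and the model labels come from the commit message, and real provider model IDs will differ from these labels.

```typescript
type TaskType = "file-analysis" | "creative" | "debugging" | "cli" | "general";

// Model labels as named in the commit message (not real provider IDs).
const MODEL_FOR_TASK: Record<TaskType, string> = {
  "file-analysis": "Gemini Flash",
  creative: "Llama 3.3 70B",
  debugging: "Claude Sonnet 4.5",
  cli: "Mistral Devstral 2",
  general: "Mistral Devstral 2",
};

// Hypothetical keyword rules, checked in order; the first match wins.
const RULES: Array<[TaskType, RegExp]> = [
  ["debugging", /\b(debug|error|exception|stack trace|crash)\b/i],
  ["file-analysis", /\b(file|attachment|pdf|csv|spreadsheet)\b/i],
  ["cli", /\b(command|terminal|shell|bash|run)\b/i],
  ["creative", /\b(story|poem|brainstorm|write a)\b/i],
];

// Classify a user message; anything unmatched is "general".
function detectTaskType(userMessage: string): TaskType {
  for (const [taskType, pattern] of RULES) {
    if (pattern.test(userMessage)) return taskType;
  }
  return "general";
}

// Try the routed model first, then each fallback, skipping unavailable ones.
function selectModel(
  userMessage: string,
  fallbacks: string[],
  isAvailable: (model: string) => boolean,
): string | null {
  const preferred = MODEL_FOR_TASK[detectTaskType(userMessage)];
  for (const candidate of [preferred, ...fallbacks]) {
    if (isAvailable(candidate)) return candidate;
  }
  return null; // nothing in the chain is available
}

console.log(detectTaskType("Why does this throw an exception?")); // debugging
console.log(selectModel("Write a poem about rain", ["Mistral Devstral 2"], () => true)); // Llama 3.3 70B
```

Keyword matching is the simplest possible detector and misclassifies easily. It is used here only to make the routing and fallback flow concrete.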

Result:
- Bot now intelligently routes user requests to optimal models based on message type
- Production-ready with systemd isolation, preventing PM2 conflicts
- Comprehensive documentation for future maintenance and troubleshooting
- Version control kept free of secrets by a comprehensive .gitignore

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2026-01-29 15:27:12 +00:00

2 Commits