Commit Graph

2 Commits

Author SHA1 Message Date
valtterimelkko
eec556c71e Fix: Resolve gateway crash loop and inotify exhaustion
Problem: Gateway was stuck in a restart loop (1200+ restarts), which stopped the
Telegram bot from responding. Root cause: the system inotify watch limit
(fs.inotify.max_user_watches) was exhausted while monitoring config/skill files.

Solutions implemented:

1. **Inotify limit increase** (/etc/sysctl.d/99-moltbot-inotify.conf)
   - Increased fs.inotify.max_user_watches from 65536 to 524288
   - Prevents "ENOSPC: System limit for number of file watchers reached"
   - Persistent across reboots

2. **Improved systemd service** (/etc/systemd/system/moltbot-gateway.service)
   - Changed Restart=always → Restart=on-failure
   - Increased RestartSec=5 → RestartSec=10 (reduce CPU churn)
   - Reduced StartLimitBurst=10 → StartLimitBurst=5
   - Added ExecStartPre to auto-clean stale locks on startup
   - Service remains isolated from other services (code-server, ssh, etc.)

3. **Health check automation** (new files)
   - scripts/health-check-gateway.sh: detects hang/lock issues, auto-recovers
   - /etc/systemd/system/moltbot-health-check.service: runs health checks
   - /etc/systemd/system/moltbot-health-check.timer: runs every 5 minutes
   - Logs to /tmp/moltbot-health-check.log

4. **Documentation** (README_Tech.md)
   - Added section on crash loop root cause and preventative measures
   - Added Architecture section documenting service isolation
   - Updated troubleshooting with health check steps
   - Updated file locations with new monitoring files
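
The limit from item 1 can be verified with a small check. This is a minimal sketch, assuming the sysctl.d file uses the usual `key = value` syntax; the helper name `parseSysctlValue` and the sample file content are hypothetical and not taken from the repository:

```typescript
// Hypothetical helper: extract a key's integer value from sysctl.d-style text.
// Lines starting with '#' or ';' are comments; entries look like "key = value".
function parseSysctlValue(content: string, key: string): number | undefined {
  let found: number | undefined;
  for (const rawLine of content.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#") || line.startsWith(";")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    if (line.slice(0, eq).trim() !== key) continue;
    const value = Number.parseInt(line.slice(eq + 1).trim(), 10);
    if (!Number.isNaN(value)) found = value; // a later entry overrides an earlier one
  }
  return found;
}

// Assumed content of /etc/sysctl.d/99-moltbot-inotify.conf (illustrative only).
const sampleConf = [
  "# Raise the inotify watch limit for the gateway",
  "fs.inotify.max_user_watches = 524288",
].join("\n");

const watches = parseSysctlValue(sampleConf, "fs.inotify.max_user_watches");
console.log(watches !== undefined && watches >= 524288 ? "limit ok" : "limit too low");
```

On a running system the effective value can be read from /proc/sys/fs/inotify/max_user_watches and passed through the same comparison.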

Testing: Gateway now starts cleanly, health checks pass, and other services
(code-server, ssh) remain unaffected. The timer runs every 5 minutes to detect
and recover from future hangs.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-29 18:55:41 +00:00
Valtteri Melkko
ab8540870b Implement task-type router with intelligent model selection and production setup
Major Changes:
- Implement task-type router (src/agents/task-type-router.ts) for intelligent model routing
  * Detects task type from user message (file-analysis, creative, debugging, cli, general)
  * Routes to optimal models: Gemini Flash (file analysis), Llama 3.3 70B (creative),
    Claude Sonnet 4.5 (debugging), Mistral Devstral 2 (CLI/general)
  * Integrated into model selection pipeline for seamless routing

- Integrate task-type routing into model resolution (src/agents/model-selection.ts)
  * Pass userMessage to resolveDefaultModelForAgent for context-aware routing
  * Maintain fallback chain for model availability

- Update attempt runner (src/agents/pi-embedded-runner/run/attempt.ts)
  * Pass prompt context to enable task-type based model selection

- Enhanced security and development (.gitignore)
  * Added comprehensive rules for sensitive files (.env variants, credentials)
  * Excluded API keys, runtime logs, test files, auto-generated skills directories
  * Properly ignored ecosystem.config, build artifacts, package manager locks

- Add technical documentation (README_Tech.md)
  * Process architecture (systemd Gateway, PM2 Dashboard, PM2 AI Product Visualizer)
  * Management commands and troubleshooting guide
  * Configuration summary and deployment checklist
  * Problem log with 6 documented issues and solutions
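
The routing and fallback behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/agents/task-type-router.ts: the keyword patterns, the model id strings, and the names `detectTaskType` and `selectModel` are hypothetical; only the task types and the task-to-model mapping come from this commit message.

```typescript
// Task types named in the commit message.
type TaskType = "file-analysis" | "creative" | "debugging" | "cli" | "general";

// Hypothetical keyword heuristics; the real detection rules are not shown here.
const KEYWORDS: Array<[TaskType, RegExp]> = [
  ["debugging", /\b(error|exception|stack trace|bug|crash)\b/i],
  ["file-analysis", /\b(file|pdf|csv|attachment|spreadsheet)\b/i],
  ["creative", /\b(story|poem|slogan|brainstorm)\b/i],
  ["cli", /\b(bash|shell|terminal|command line)\b/i],
];

// Placeholder model ids (assumed); the mapping itself follows the commit message.
const MODEL_FOR_TASK: Record<TaskType, string> = {
  "file-analysis": "gemini-flash",
  creative: "llama-3.3-70b",
  debugging: "claude-sonnet-4.5",
  cli: "mistral-devstral-2",
  general: "mistral-devstral-2",
};

// The first matching pattern wins; anything unmatched is treated as "general".
function detectTaskType(userMessage: string): TaskType {
  for (const [task, pattern] of KEYWORDS) {
    if (pattern.test(userMessage)) return task;
  }
  return "general";
}

// Try the preferred model for the detected task, then walk the fallback chain
// until an available model is found. Returns undefined if none is available.
function selectModel(
  userMessage: string,
  available: ReadonlySet<string>,
  fallbackChain: readonly string[],
): string | undefined {
  const preferred = MODEL_FOR_TASK[detectTaskType(userMessage)];
  return [preferred, ...fallbackChain].find((model) => available.has(model));
}

// Example: the creative model is unavailable, so the fallback chain is used.
console.log(
  selectModel("Write a poem about autumn", new Set(["mistral-devstral-2"]), ["mistral-devstral-2"]),
);
```

Keeping detection and selection as separate functions lets the fallback chain stay independent of how the task type is inferred.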

Result:
- Bot now intelligently routes user requests to optimal models based on message type
- Production-ready with systemd isolation, preventing PM2 conflicts
- Comprehensive documentation for future maintenance and troubleshooting
- .gitignore now keeps credentials, API keys, and generated files out of version control

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-29 15:27:12 +00:00