Commit Graph

305 Commits

Author SHA1 Message Date
valtterimelkko
dd7f826d0a Add PM2-native health monitoring and startup improvements
- Created scripts/gateway-start.sh: Startup wrapper that cleans stale lock files
  before starting the gateway (prevents "already running" errors)

- Created scripts/pm2-health-monitor.js: Standalone health check process managed by PM2
  * Monitors port 18789 connectivity every 5 minutes
  * Detects unresponsive gateway (process running but port hung)
  * Force-restarts via killall + PM2 auto-recovery
  * Monitors inotify watcher usage (warns at 80% of limit)
  * Logs to /tmp/moltbot/pm2-health-monitor.log

- Updated ecosystem.config.cjs to:
  * Use gateway-start.sh wrapper for lock cleanup
  * Add moltbot-health-monitor as separate PM2 app
  * Health monitor runs alongside gateway (same PM2 daemon, isolated from other daemons)

Key Design Principles:
- PM2 handles process lifecycle (restart, memory limits, crash recovery)
- Health monitor adds responsiveness detection (what PM2 can't do alone)
- No systemd involvement (prevents port conflicts with other PM2 instances)
- Each PM2 daemon isolated: moltbot-gateway, si_project/dashboard, ai_product_visualizer

This ensures gateway remains stable even if it becomes unresponsive to Telegram messages.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-29 20:03:06 +00:00
valtterimelkko
a37c9cad6d Fix: Remove systemd conflicts, clarify PM2-based process management
- Removed conflicting systemd service files (moltbot-gateway.service, moltbot-health-check.*)
- Removed redundant health-check script (PM2 handles restarts natively)
- Updated README_Tech.md to document PM2 as actual process manager
- Clarified that inotify fix (524288 limit) is permanent solution
- Documented PM2 commands for troubleshooting and monitoring
- Added safety note: Never use systemd for moltbot-gateway (causes port conflicts)
- Fixed architecture documentation to reflect PM2 daemon isolation model

Gateway now running cleanly via PM2 (PID 661291) without systemd interference.
Inotify limit verified at 524288 (prevents file watcher exhaustion).

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-29 19:51:16 +00:00
valtterimelkko
eec556c71e Fix: Resolve gateway crash loop and inotify exhaustion
Problem: Gateway was hung in 1200+ restart loop, causing Telegram bot to stop
responding. Root cause: system inotify file descriptor limit exhausted when
monitoring config/skill files.

Solutions implemented:

1. **Inotify limit increase** (/etc/sysctl.d/99-moltbot-inotify.conf)
   - Increased fs.inotify.max_user_watches from 65536 to 524288
   - Prevents "ENOSPC: System limit for number of file watchers reached"
   - Persistent across reboots

2. **Improved systemd service** (/etc/systemd/system/moltbot-gateway.service)
   - Changed Restart=always → Restart=on-failure
   - Increased RestartSec=5 → RestartSec=10 (reduce CPU churn)
   - Reduced StartLimitBurst=10 → StartLimitBurst=5
   - Added ExecStartPre to auto-clean stale locks on startup
   - Service remains isolated from other services (code-server, ssh, etc)

3. **Health check automation** (new files)
   - scripts/health-check-gateway.sh: detects hang/lock issues, auto-recovers
   - /etc/systemd/system/moltbot-health-check.service: runs health checks
   - /etc/systemd/system/moltbot-health-check.timer: runs every 5 minutes
   - Logs to /tmp/moltbot-health-check.log

4. **Documentation** (README_Tech.md)
   - Added section on crash loop root cause and preventative measures
   - Added Architecture section documenting service isolation
   - Updated troubleshooting with health check steps
   - Updated file locations with new monitoring files

Testing: Gateway now starts cleanly, health checks pass, other services
(code-server, ssh) remain unaffected. Timer runs every 5 minutes to prevent
future hangs.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-01-29 18:55:41 +00:00
Peter Steinberger
7eb57b691c chore: prep 2026.1.27-beta.1 release 2026-01-28 01:35:58 +01:00
Pooya Parsa
4a1b6bc008 update refs 2026-01-27 13:50:46 -08:00
Shadow
f7a0b0934d
Branding: update bot.molt bundle IDs + launchd labels 2026-01-27 14:46:50 -06:00
Shadow
cc72498b46
Mac: finish Moltbot rename 2026-01-27 14:12:17 -06:00
Peter Steinberger
640c8d1554 fix: ignore windows vitest worker crashes 2026-01-27 17:37:21 +00:00
Peter Steinberger
4a9c921168 fix: use threads pool for windows ci tests 2026-01-27 17:02:01 +00:00
Peter Steinberger
cf334d3b7d fix: shard windows ci test runs 2026-01-27 16:39:28 +00:00
Peter Steinberger
240232aed1 fix: run windows ci tests serially 2026-01-27 16:13:02 +00:00
Peter Steinberger
3817e0ce2c fix: bundle a2ui before tests 2026-01-27 15:38:31 +00:00
Peter Steinberger
e4518d2271 fix: allow docker builds to skip missing a2ui assets 2026-01-27 15:16:20 +00:00
Peter Steinberger
0594ccf92a fix: skip a2ui bundling when sources are excluded 2026-01-27 15:01:57 +00:00
Peter Steinberger
3015e11fd7 fix: stabilize install smoke against clawdbot installer 2026-01-27 14:58:01 +00:00
Peter Steinberger
6d16a658e5 refactor: rename clawdbot to moltbot with legacy compat 2026-01-27 12:21:02 +00:00
Peter Steinberger
83460df96f chore: update molt.bot domains 2026-01-27 12:21:01 +00:00
Gustavo Madeira Santana
2044b3ca8d Build: restore A2UI scaffold assets (#2455) (thanks @0oAstro)
Co-authored-by: 0oAstro <0oAstro@users.noreply.github.com>
2026-01-26 23:08:25 -05:00
Gustavo Madeira Santana
c2a4863b15 Build: stop tracking bundled artifacts (#2455) (thanks @0oAstro)
Co-authored-by: 0oAstro <0oAstro@users.noreply.github.com>
2026-01-26 23:08:25 -05:00
Peter Steinberger
fba7afaa12 chore(scripts): update claude auth status hints 2026-01-26 19:05:00 +00:00
Shadow
e040f6338a
Docs: update clawtributors list 2026-01-25 22:38:04 -06:00
Shadow
b25fcaef0f
CI: parse labeler without deps 2026-01-25 20:38:44 -06:00
Shadow
6b6284c69c
CI: add PR labeler + label sync 2026-01-25 20:37:31 -06:00
Peter Steinberger
71eb6d5dd0 fix(imessage): normalize messaging targets (#1708)
Co-authored-by: Aaron Ng <1653630+aaronn@users.noreply.github.com>
2026-01-25 13:43:32 +00:00
plum-dawg
c96ffa7186
feat: Add Line plugin (#1630)
* feat: add LINE plugin (#1630) (thanks @plum-dawg)

* feat: complete LINE plugin (#1630) (thanks @plum-dawg)

* chore: drop line plugin node_modules (#1630) (thanks @plum-dawg)

* test: mock /context report in commands test (#1630) (thanks @plum-dawg)

* test: limit macOS CI workers to avoid OOM (#1630) (thanks @plum-dawg)

* test: reduce macOS CI vitest workers (#1630) (thanks @plum-dawg)

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-01-25 12:22:36 +00:00
Peter Steinberger
50f233d16d chore: stabilize prek hooks runner selection (#1720) (thanks @dguido) 2026-01-25 10:55:28 +00:00
Dan Guido
48aea87028
feat: add prek pre-commit hooks and dependabot (#1720)
* feat: add prek pre-commit hooks and dependabot

Pre-commit hooks (via prek):
- Basic hygiene: trailing-whitespace, end-of-file-fixer, check-yaml, check-added-large-files, check-merge-conflict
- Security: detect-secrets, zizmor (GitHub Actions audit)
- Linting: shellcheck, actionlint, oxlint, swiftlint
- Formatting: oxfmt, swiftformat

Dependabot:
- npm and GitHub Actions ecosystems
- Grouped updates (production/development/actions)
- 7-day cooldown for supply chain protection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add prek install instruction to AGENTS.md

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 10:53:23 +00:00
Rohan Nagpal
06a7e1e8ce
Telegram: threaded conversation support (#1597)
* Telegram: isolate dm topic sessions

* Tests: cap vitest workers

* Tests: cap Vitest workers on CI macOS

* Tests: avoid timer-based pi-ai stream mock

* Tests: increase embedded runner timeout

* fix: harden telegram dm thread handling (#1597) (thanks @rohannagpal)

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-01-25 04:48:51 +00:00
Peter Steinberger
6d79c6cd26 fix: clean docker onboarding warnings + preserve agentId casing 2026-01-24 19:07:01 +00:00
Luke
be1cdc9370
fix(agents): treat provider request-aborted as timeout for fallback (#1576)
* fix(agents): treat request-aborted as timeout for fallback

* test(e2e): add provider timeout fallback
2026-01-24 11:27:24 +00:00
Peter Steinberger
4a9123d415 chore: suppress remaining deprecation warnings 2026-01-24 11:16:46 +00:00
Peter Steinberger
8b7b7e154f chore: speed up tests and update opencode models 2026-01-23 11:36:32 +00:00
Peter Steinberger
bb9bddebb4 fix: stabilize ci tests 2026-01-23 09:52:22 +00:00
Peter Steinberger
3de5ea818d ci: speed up install smoke on PRs 2026-01-23 09:05:15 +00:00
Peter Steinberger
86e0916fa3 fix: allow windows spawn in test parallel 2026-01-23 07:52:04 +00:00
Peter Steinberger
45ce07a098 test: split vitest into unit and gateway 2026-01-23 07:34:57 +00:00
Peter Steinberger
2c10c601a8 test: harden docker onboarding waits 2026-01-23 05:10:59 +00:00
Tak hoffman
b65916e0d1 CLI: fix Windows gateway startup 2026-01-23 04:47:01 +00:00
Peter Steinberger
7c336588ea chore: drop tty from install e2e docker 2026-01-22 23:09:28 +00:00
Peter Steinberger
573354f5e4 chore(dev): default restart-mac to attach-only 2026-01-22 23:08:56 +00:00
Peter Steinberger
8a20f44228 fix: improve gateway ssh auth handling 2026-01-22 06:54:08 +00:00
Peter Steinberger
50049fd220 chore(macos): drop time-sensitive notification entitlement toggle 2026-01-22 04:50:03 +00:00
Peter Steinberger
ff3d8cab2b feat: preflight update runner before rebase 2026-01-22 04:19:33 +00:00
Peter Steinberger
2e1514095d fix: package Textual resources for mac app 2026-01-22 02:34:27 +00:00
Clawd
429a2d7849 fix(mac): default to universal binary for distribution builds
Closes #1393

The distribution script (package-mac-dist.sh) now defaults BUILD_ARCHS to 'all',
producing universal binaries that run natively on both Apple Silicon and Intel Macs.

Previously, the script inherited the host architecture default from package-mac-app.sh,
which meant release builds done on ARM Macs only included ARM binaries.
2026-01-22 00:29:27 +00:00
Peter Steinberger
6c0a01dc90 fix: bundle mac model catalog 2026-01-21 19:58:19 +00:00
Peter Steinberger
fb47f1cbeb chore: rename clawlog references 2026-01-21 05:53:32 +00:00
Peter Steinberger
58b131919f feat: use tsgo for dev/watch builds 2026-01-21 04:06:09 +00:00
Peter Steinberger
aec622fe63 chore: remove fresh dist log 2026-01-21 03:13:50 +00:00
Peter Steinberger
dfbf6ac263 feat: enforce device-bound connect challenge 2026-01-20 13:04:19 +00:00