Docs: Add detailed user experience vs technical analysis for current issue
**Added comprehensive section showing:** - User perspective: What you see when bot doesn't respond (typing indicator appears, then nothing) - Technical reality: Exact timeline from logs showing message processing interrupted mid-tool-execution - Visual timeline with precise timestamps (21:20:35 → 21:21:02) - Actual log evidence from moltbot and PM2 logs - Root problem chain: config rewrite → file watcher → reload handler → SIGUSR1 → shutdown - The blocker: config file being automatically restored (unknown mechanism creating reload cycle) - Verification table ruling out other causes **Timeline captured:** - T+0: Message received (21:20:35) - T+1: Agent processing starts (21:20:36) - T+14: First tool completes (21:20:49) - T+27: Second tool starts (21:21:02) - T+27.007: SIGUSR1 signal received - gateway self-terminates - T+31: Gateway restarts with new PID (21:21:06) **Key finding:** This is NOT a crash, it's a controlled graceful shutdown triggered by SIGUSR1 during message processing due to config file rewrites. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
This commit is contained in:
parent
8035327cf3
commit
5398b85a58
163
README_Tech.md
163
README_Tech.md
@ -480,6 +480,169 @@ This forced PM2 to kill the stuck process and start fresh.
|
||||
|
||||
---
|
||||
|
||||
## Understanding the Current Issue: User Experience vs. Technical Reality (Jan 29, 2026 Evening)
|
||||
|
||||
### What You Experience (User Perspective)
|
||||
|
||||
**The Sequence:**
|
||||
1. **Send Message**: You type "hi" and send to the Telegram bot
|
||||
2. **Typing Indicator Appears**: Bot shows "writing" / typing indicator in Telegram chat
|
||||
3. **Long Wait**: 20-30 seconds pass with no response
|
||||
4. **No Response Arrives**: Bot never sends a reply message
|
||||
5. **Chat Shows Nothing**: Only the typing indicator appeared, then disappeared
|
||||
6. **Consistent Pattern**: Same thing happens with every message you send
|
||||
|
||||
**From Your View:**
|
||||
- "Bot is receiving my message (typing shows) but never responds"
|
||||
- "It crashes/fails to send every time"
|
||||
- "This is 100% reproducible - same thing happens each time"
|
||||
|
||||
---
|
||||
|
||||
### What Actually Happens (Technical Analysis from Logs)
|
||||
|
||||
**Timeline - Exact Seconds:**
|
||||
|
||||
```
|
||||
T+0 sec (21:20:35) ✉️ TELEGRAM MESSAGE RECEIVED
|
||||
Your message arrives at gateway
|
||||
|
||||
↓
|
||||
|
||||
T+1 sec (21:20:36) 🔄 AGENT PROCESSING STARTS
|
||||
runId=b27eccb8-f113-4f67-8de6-45a1161f6c97
|
||||
model=openrouter/mistralai/devstral-2512
|
||||
- Agent receives message
|
||||
- Starts generating response
|
||||
- Begins executing tools
|
||||
|
||||
↓
|
||||
|
||||
T+14 sec (21:20:49) ✅ FIRST TOOL COMPLETES
|
||||
tool=exec toolCallId=UG6gBMdvD
|
||||
- Tool output processed successfully
|
||||
- Agent continues to next step
|
||||
- Still generating response
|
||||
|
||||
↓
|
||||
|
||||
T+27 sec (21:21:02) 🔧 SECOND TOOL STARTS
|
||||
tool=exec toolCallId=SvLFDzozr
|
||||
- Second tool execution begins
|
||||
- Agent mid-processing
|
||||
|
||||
↓
|
||||
|
||||
T+27.007 sec (21:21:02 +7ms) ⚠️ SIGUSR1 SIGNAL RECEIVED
|
||||
source: src/gateway/server-reload-handlers.ts:157
|
||||
- Config file watcher detected change
|
||||
- Reload handler evaluated: "restart needed"
|
||||
- Gateway emits: process.emit("SIGUSR1")
|
||||
- Shutdown signal sent to itself
|
||||
|
||||
↓
|
||||
|
||||
T+27 sec (21:21:02) 🛑 GRACEFUL SHUTDOWN BEGINS
|
||||
- In-flight message processing INTERRUPTED
|
||||
- Tool execution aborted mid-operation
|
||||
- Response generation STOPS
|
||||
- Agent run never completes
|
||||
|
||||
↓
|
||||
|
||||
T+31 sec (21:21:06) 🔄 GATEWAY FULLY RESTARTED
|
||||
- New process spawned
|
||||
- Old PID 702504 → New PID 703669
|
||||
- Telegram client reconnects
|
||||
- Ready for next message
|
||||
- Previous message result LOST
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Log Evidence (From Actual Logs)
|
||||
|
||||
**From `/tmp/moltbot/moltbot-2026-01-29.log`:**
|
||||
```json
|
||||
{"date":"2026-01-29T21:20:36.078Z"} embedded run start: runId=b27eccb8-f113-4f67-8de6-45a1161f6c97 provider=openrouter model=mistralai/devstral-2512
|
||||
{"date":"2026-01-29T21:20:49.339Z"} embedded run tool start: tool=exec toolCallId=UG6gBMdvD
|
||||
{"date":"2026-01-29T21:20:49.380Z"} embedded run tool end: tool=exec toolCallId=UG6gBMdvD ✅ Completed
|
||||
{"date":"2026-01-29T21:21:02.672Z"} embedded run tool start: tool=exec toolCallId=SvLFDzozr
|
||||
{"date":"2026-01-29T21:21:02.679Z"} signal SIGTERM received ⚠️ INTERRUPTED
|
||||
{"date":"2026-01-29T21:21:02.681Z"} received SIGTERM; shutting down
|
||||
{"date":"2026-01-29T21:21:02.689Z"} embedded run tool end: tool=exec toolCallId=SvLFDzozr ⚠️ Never completed
|
||||
```
|
||||
|
||||
**From `/tmp/moltbot/pm2-out.log`:**
|
||||
```
|
||||
20:32:34: [reload] config change requires gateway restart (plugins.entries.telegram.enabled)
|
||||
21:01:43: [reload] config change requires gateway restart (plugins.entries.telegram)
|
||||
21:14:26: [reload] config change detected; evaluating reload
|
||||
21:14:26: [gateway] signal SIGUSR1 received
|
||||
21:14:26: [gateway] received SIGUSR1; restarting
|
||||
21:14:26: [gmail-watcher] gmail watcher stopped
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### The Root Problem Chain
|
||||
|
||||
```
|
||||
CONFIG FILE AUTOMATICALLY REWRITTEN
|
||||
↓
|
||||
FILE WATCHER DETECTS CHANGE
|
||||
↓
|
||||
RELOAD HANDLER EVALUATES CHANGE
|
||||
↓
|
||||
DECIDES: "Full restart required"
|
||||
↓
|
||||
GATEWAY EMITS: process.emit("SIGUSR1")
|
||||
↓
|
||||
SIGNAL HANDLER TRIGGERS
|
||||
↓
|
||||
GRACEFUL SHUTDOWN SEQUENCE STARTS
|
||||
↓
|
||||
IF MESSAGE PROCESSING ACTIVE:
|
||||
→ In-flight request INTERRUPTED
|
||||
→ Incomplete response DISCARDED
|
||||
→ User sees: "bot typing but never responds"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Why Config File Keeps Rewriting (The Blocker)
|
||||
|
||||
**The Cycle:**
|
||||
1. User/agent modifies config file
|
||||
2. Gateway file watcher detects the change
|
||||
3. Reload handler examines what changed
|
||||
4. Decides plugin/model changes require full restart
|
||||
5. Emits SIGUSR1 → gateway shuts down gracefully
|
||||
6. Gateway restarts
|
||||
7. **Config file is automatically restored to previous state** ← Unknown mechanism
|
||||
8. File watcher detects "change" again (restore triggers it)
|
||||
9. Back to step 2 → **LOOP CONTINUES**
|
||||
|
||||
This creates a **reload cycle** where every config modification triggers a restart.
|
||||
|
||||
**If your message arrives during one of these cycles**, the agent processing gets interrupted mid-tool-execution, and no response is sent.
|
||||
|
||||
---
|
||||
|
||||
### Why It's NOT Other Things (Verified)
|
||||
|
||||
| What | Status | Why Not |
|
||||
|------|--------|---------|
|
||||
| Bot crash | ❌ No | Gateway shuts down gracefully, restarts cleanly |
|
||||
| PM2 killing it | ❌ No | PM2 not involved in SIGUSR1 signal |
|
||||
| Health monitor | ❌ No | Uses SIGKILL (-9), not SIGTERM; checks every 5 minutes |
|
||||
| Memory exhaustion | ❌ No | 52MB usage, 500MB limit |
|
||||
| Plugin overflow | ❌ No | Logs show no errors, same config worked this morning |
|
||||
| Inotify exhaustion | ❌ No | Already increased to 524288 |
|
||||
| Random crashes | ❌ No | Pattern exact, reproducible, tied to config rewrites |
|
||||
|
||||
---
|
||||
|
||||
### 8. **Gateway Graceful Shutdown During Message Processing** (Jan 29, 2026 Evening)
|
||||
|
||||
**Problem:** Bot receives Telegram messages (typing indicator shows) but crashes before sending responses. 100% reproducible.
|
||||
|
||||
Loading…
Reference in New Issue
Block a user