Merge c7ea713c61 into 4583f88626

2026-01-29 19:00:20 +00:00 · 2026-01-29 19:00:20 +00:00 · 10d6fb590d
commit 10d6fb590d
parent 4583f88626 c7ea713c61
17 changed files with 524 additions and 113 deletions
--- a/comments_3647.txt
+++ b/comments_3647.txt
@ -0,0 +1,47 @@
+FILE: src/agents/session-transcript-repair.ts
+LINE: 62
+BODY: The comment on line 60 is questioning rather than explanatory. Comments with questions like "Deep clone to avoid mutating the input unless necessary?" and "Actually, we can mutate in place effectively or clone messages that need change" suggest uncertainty or incomplete decision-making. The implementation does create a new array and new message objects when changes are detected (lines 63, 107-110), so the comment should clearly state the approach taken: "Creates new message objects only when sanitization is needed; otherwise returns the original messages to avoid unnecessary copying."
+```suggestion
+  // Creates new message objects only when sanitization is needed; otherwise
+  // returns the original messages to avoid unnecessary copying, while guarding
+  // against corrupt JSON in tool arguments that could break the session.
+```
+---
+FILE: src/agents/session-transcript-repair.ts
+LINE: 86
+BODY: The function validates and discards valid JSON strings but doesn't parse them. When `input` is a string containing valid JSON (line 85), the code simply pushes the block unchanged. However, if `input` should always be an object (not a string), then valid JSON strings should also be parsed and converted to objects for consistency. Clarify the expected behavior: should valid JSON strings be preserved as-is, or should they also be parsed into objects? The current behavior creates inconsistency where some inputs are objects and some are JSON strings.
+```suggestion
+            const parsedInput = JSON.parse((block as any).input);
+            nextContent.push({
+              ...block,
+              input: parsedInput,
+            });
+            msgChanged = true;
+```
+---
+FILE: src/agents/session-transcript-repair.ts
+LINE: 118
+BODY: The function doesn't return any report or metadata about what was sanitized, unlike `repairToolUseResultPairing` which returns a `ToolUseRepairReport` with details about changes made. Consider returning a report object that includes information such as the number of sanitized blocks and which tool types were affected. This would help with debugging, monitoring, and understanding the extent of data corruption in sessions. Alternatively, add inline logging to track sanitization events.
+---
+FILE: src/agents/session-transcript-repair.ts
+LINE: 118
+BODY: The `sanitizeToolUseArgs` function lacks test coverage. The existing test file only tests `sanitizeToolUseResultPairing`, but does not directly test the new `sanitizeToolUseArgs` function. Add tests that verify:
+1. Valid JSON strings in `input` fields are preserved
+2. Invalid JSON strings in `input` fields are replaced with `{}`
+3. Object values in `input` fields are preserved
+4. The `_sanitized` and `_originalInput` metadata fields are correctly set when sanitization occurs
+5. Messages without tool blocks are left unchanged
+---
+FILE: src/agents/session-transcript-repair.ts
+LINE: 97
+BODY: Silent data loss occurs when malformed JSON is replaced with an empty object. When the sanitization logic detects and replaces invalid JSON in tool arguments, it stores the original input in `_originalInput` but doesn't log this potentially important information. Consider adding a warning log similar to patterns found elsewhere in the codebase (e.g., in `compaction.ts`, `model-catalog.ts`) to aid debugging when sessions are repaired. The log should include the tool name and a sample of the malformed input to help trace the root cause of the corruption.
+---
+FILE: src/agents/session-transcript-repair.ts
+LINE: 100
+BODY: The function only sanitizes the `input` field for tool use blocks, but the existing tests use `arguments` as the field name (see lines 11-12, 38, 65 in session-transcript-repair.test.ts). This suggests there may be inconsistency in the field naming across different tool block types. Verify whether tool blocks can use both `input` and `arguments` field names, and if so, ensure the sanitization logic handles both. If they are mutually exclusive based on the block type, add a comment explaining this distinction.
+---
+FILE: src/agents/session-transcript-repair.ts
+LINE: 94
+BODY: Using `(block as any).input` bypasses TypeScript's type checking. The code would be more maintainable and safer with proper type guards or typed access. Consider defining a proper type for tool blocks or using a type guard function to check for the presence of the `input` field before accessing it. This approach would make the code more self-documenting and catch potential issues at compile time.
+---
+
--- a/comments_3648.txt
+++ b/comments_3648.txt
@ -0,0 +1,47 @@
+FILE: src/agents/system-prompt.ts
+LINE: 412
+BODY: The Windows shell guidance feature lacks test coverage. Consider adding a test case in `system-prompt.test.ts` to verify that the Windows-specific guidance is included when `runtimeInfo.os` contains "windows" and excluded otherwise. This would ensure the feature works as expected and prevent regressions.
+
+For example, you could add a test like:
+- Test that when runtimeInfo.os includes "Windows_NT", the prompt contains "## Windows Shell Guidance"
+- Test that when runtimeInfo.os is "Darwin" or "Linux", this section is not included
+---
+FILE: src/agents/system-prompt.ts
+LINE: 100
+BODY: This PR includes extensive indentation/formatting changes throughout the file that are not mentioned in the PR description. While these changes appear to be consistent reformatting (switching from multi-line template literals to array-based formatting), they make it harder to review the actual functional change (the Windows guidance addition). Consider:
+
+1. Splitting formatting changes into a separate commit/PR, OR
+2. Explicitly mentioning these formatting changes in the PR description
+
+This follows best practices for keeping functional changes separate from style/formatting changes for easier code review and git history.
+---
+FILE: src/agents/system-prompt.ts
+LINE: 407
+BODY: The Windows guidance warns against using Unix commands like `grep`, but the system prompt (lines 340-342) also lists `grep`, `find`, and `ls` as Pi's standard tools. This could be confusing because:
+
+1. Pi provides `grep`, `find`, and `ls` as built-in tools that work cross-platform
+2. The Windows guidance seems to warn against using these names in shell commands via the `exec` tool
+
+Consider clarifying this distinction in the Windows guidance. For example: "- Do NOT use Unix commands like `grep`, `sed`, `awk`, `head`, `tail` in exec/shell commands unless you are sure they are installed (Pi's built-in grep/find/ls tools work fine)."
+
+This makes it clear that the warning is specifically about shell commands, not Pi's built-in tools.
+```suggestion
+        "- Do NOT use Unix commands like `grep`, `sed`, `awk`, `head`, `tail` in exec/shell (PowerShell) commands unless you are sure they are installed. Pi's built-in tools named `grep`, `find`, and `ls` are safe to use.",
+```
+---
+FILE: .github/PULL_REQUEST_TEMPLATE.md
+LINE: 315
+BODY: The PR includes changes to `.github/PULL_REQUEST_TEMPLATE.md` which is unrelated to adding Windows PowerShell support. The diff shows the entire file (315 lines) as new additions, which suggests either:
+
+1. The file was regenerated/reformatted
+2. Line ending changes (CRLF Γåö LF)
+3. The file was accidentally included in this PR
+
+Since this is unrelated to the stated PR purpose ("add windows powershell support to system prompt"), consider:
+- Removing this file from the PR if it was accidentally included
+- If intentional, explain the reason in the PR description
+- If it's a line ending change, consider using `.gitattributes` to enforce consistent line endings
+
+Including unrelated changes makes code review more difficult and complicates the git history.
+---
+
--- a/feedback_3647.txt
+++ b/feedback_3647.txt
@ -0,0 +1,20 @@
+--- PR 3647 --
+REVIEWER: copilot-pull-request-reviewer
+BODY: ## Pull request overview
+
+This PR implements sanitization for tool arguments in session history to prevent crashes caused by malformed JSON in tool input fields. When invalid JSON is detected in tool use blocks, it is replaced with an empty object `{}` to prevent `InternalError.Algo.InvalidParameter` errors that would otherwise crash the entire session.
+
+**Changes:**
+- Added `sanitizeToolUseArgs` function to detect and repair malformed JSON in tool input arguments
+- Integrated sanitization into the `sanitizeToolUseResultPairing` pipeline to ensure it runs before tool result pairing repair
+- Preserved original malformed input in `_originalInput` metadata field for debugging
+
+
+
+
+
+---
+
+≡ƒÆí <a href="/moltbot/moltbot/new/main/.github/instructions?filename=*.instructions.md" class="Link--inTextBlock" target="_blank" rel="noopener noreferrer">Add Copilot custom instructions</a> for smarter, more guided reviews. <a href="https://docs.github.com/en/copilot/customizing-copilot/adding-repository-custom-instructions-for-github-copilot" class="Link--inTextBlock" target="_blank" rel="noopener noreferrer">Learn how to get started</a>.
+
+
--- a/feedback_3648.txt
+++ b/feedback_3648.txt
@ -0,0 +1,29 @@
+--- PR 3648 --
+REVIEWER: copilot-pull-request-reviewer
+BODY: ## Pull request overview
+
+This PR adds Windows-specific shell guidance to the agent's system prompt to help it use PowerShell syntax and commands when running on Windows. The implementation detects Windows via the runtime OS information and injects appropriate guidance about PowerShell syntax, environment variables, and command alternatives.
+
+**Changes:**
+- Added conditional Windows Shell Guidance section to system prompt when running on Windows
+- Reformatted multiple sections of system-prompt.ts from template literals to array-based formatting
+- Modified .github/PULL_REQUEST_TEMPLATE.md (appears unrelated to main feature)
+
+### Reviewed changes
+
+Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
+
+| File | Description |
+| ---- | ----------- |
+| src/agents/system-prompt.ts | Added Windows PowerShell guidance section (lines 402-412) with conditional logic based on OS detection; includes extensive indentation/formatting changes throughout the file |
+| .github/PULL_REQUEST_TEMPLATE.md | Entire file appears as new additions (315 lines), likely due to formatting or line ending changes; unrelated to Windows PowerShell feature |
+
+
+
+
+
+---
+
+≡ƒÆí <a href="/moltbot/moltbot/new/main/.github/instructions?filename=*.instructions.md" class="Link--inTextBlock" target="_blank" rel="noopener noreferrer">Add Copilot custom instructions</a> for smarter, more guided reviews. <a href="https://docs.github.com/en/copilot/customizing-copilot/adding-repository-custom-instructions-for-github-copilot" class="Link--inTextBlock" target="_blank" rel="noopener noreferrer">Learn how to get started</a>.
+
+
--- a/pr3647.json
+++ b/pr3647.json
--- a/pr3647_comments.json
+++ b/pr3647_comments.json
--- a/pr3648.json
+++ b/pr3648.json
--- a/pr3648_comments.json
+++ b/pr3648_comments.json
--- a/pr3648_latest_comments.json
+++ b/pr3648_latest_comments.json
--- a/src/agents/pi-embedded-helpers.isfailovererrormessage.test.ts
+++ b/src/agents/pi-embedded-helpers.isfailovererrormessage.test.ts
@ -14,6 +14,8 @@ describe("isFailoverErrorMessage", () => {
    const samples = [
      "invalid api key",
      "429 rate limit exceeded",
+      "No available auth profile for openai-codex (all in cooldown or unavailable).",
+      "Provider openai-codex is in cooldown (all profiles unavailable)",
      "Your credit balance is too low",
      "request timed out",
      "invalid request format",
--- a/src/agents/pi-embedded-helpers/errors.ts
+++ b/src/agents/pi-embedded-helpers/errors.ts
@ -362,6 +362,10 @@ const ERROR_PATTERNS = {
    "quota exceeded",
    "resource_exhausted",
    "usage limit",
+    // Auth-profile selection can fail due to provider-wide cooldown after repeated rate limits.
+    // Treat as rate_limit so model fallback can engage even before a request is sent.
+    /all in cooldown/i,
+    /in cooldown \(all profiles unavailable\)/i,
  ],
  overloaded: [/overloaded_error|"type"\s*:\s*"overloaded_error"/i, "overloaded"],
  timeout: ["timeout", "timed out", "deadline exceeded", "context deadline exceeded"],
@ -388,6 +392,8 @@ const ERROR_PATTERNS = {
    /\b403\b/,
    "no credentials found",
    "no api key found",
+    // Agent can fail before request if no usable auth profile exists.
+    /no available auth profile/i,
  ],
  format: [
    "string should match pattern",
--- a/src/agents/pi-embedded-runner.test.ts
+++ b/src/agents/pi-embedded-runner.test.ts
@ -101,7 +101,7 @@ beforeAll(async () => {
  workspaceDir = path.join(tempRoot, "workspace");
  await fs.mkdir(agentDir, { recursive: true });
  await fs.mkdir(workspaceDir, { recursive: true });
-}, 20_000);
+}, 120_000);

 afterAll(async () => {
  if (!tempRoot) return;
--- a/src/agents/session-transcript-repair.test.ts
+++ b/src/agents/session-transcript-repair.test.ts
@ -1,26 +1,33 @@
 import type { AgentMessage } from "@mariozechner/pi-agent-core";
 import { describe, expect, it } from "vitest";
-import { sanitizeToolUseResultPairing } from "./session-transcript-repair.js";
+import { sanitizeToolUseArgs, sanitizeToolUseResultPairing } from "./session-transcript-repair.js";
+
+const now = Date.now();

 describe("sanitizeToolUseResultPairing", () => {
  it("moves tool results directly after tool calls and inserts missing results", () => {
-    const input = [
+    const input: AgentMessage[] = [
      {
        role: "assistant",
        content: [
          { type: "toolCall", id: "call_1", name: "read", arguments: {} },
          { type: "toolCall", id: "call_2", name: "exec", arguments: {} },
        ],
+        timestamp: now,
+        api: "openai",
+        provider: "openai",
+        model: "gpt-4",
      },
-      { role: "user", content: "user message that should come after tool use" },
+      { role: "user", content: "user message that should come after tool use", timestamp: now },
      {
        role: "toolResult",
        toolCallId: "call_2",
        toolName: "exec",
        content: [{ type: "text", text: "ok" }],
        isError: false,
+        timestamp: now,
      },
-    ] satisfies AgentMessage[];
+    ];

    const out = sanitizeToolUseResultPairing(input);
    expect(out[0]?.role).toBe("assistant");
@ -32,10 +39,14 @@ describe("sanitizeToolUseResultPairing", () => {
  });

  it("drops duplicate tool results for the same id within a span", () => {
-    const input = [
+    const input: AgentMessage[] = [
      {
        role: "assistant",
        content: [{ type: "toolCall", id: "call_1", name: "read", arguments: {} }],
+        timestamp: now,
+        api: "openai",
+        provider: "openai",
+        model: "gpt-4",
      },
      {
        role: "toolResult",
@ -43,6 +54,7 @@ describe("sanitizeToolUseResultPairing", () => {
        toolName: "read",
        content: [{ type: "text", text: "first" }],
        isError: false,
+        timestamp: now,
      },
      {
        role: "toolResult",
@ -50,19 +62,24 @@ describe("sanitizeToolUseResultPairing", () => {
        toolName: "read",
        content: [{ type: "text", text: "second" }],
        isError: false,
+        timestamp: now,
      },
-      { role: "user", content: "ok" },
-    ] satisfies AgentMessage[];
+      { role: "user", content: "ok", timestamp: now },
+    ];

    const out = sanitizeToolUseResultPairing(input);
    expect(out.filter((m) => m.role === "toolResult")).toHaveLength(1);
  });

  it("drops duplicate tool results for the same id across the transcript", () => {
-    const input = [
+    const input: AgentMessage[] = [
      {
        role: "assistant",
        content: [{ type: "toolCall", id: "call_1", name: "read", arguments: {} }],
+        timestamp: now,
+        api: "openai",
+        provider: "openai",
+        model: "gpt-4",
      },
      {
        role: "toolResult",
@ -70,16 +87,25 @@ describe("sanitizeToolUseResultPairing", () => {
        toolName: "read",
        content: [{ type: "text", text: "first" }],
        isError: false,
+        timestamp: now,
+      },
+      {
+        role: "assistant",
+        content: [{ type: "text", text: "ok" }],
+        timestamp: now,
+        api: "openai",
+        provider: "openai",
+        model: "gpt-4",
      },
-      { role: "assistant", content: [{ type: "text", text: "ok" }] },
      {
        role: "toolResult",
        toolCallId: "call_1",
        toolName: "read",
        content: [{ type: "text", text: "second (duplicate)" }],
        isError: false,
+        timestamp: now,
      },
-    ] satisfies AgentMessage[];
+    ];

    const out = sanitizeToolUseResultPairing(input);
    const results = out.filter((m) => m.role === "toolResult") as Array<{
@ -90,23 +116,131 @@ describe("sanitizeToolUseResultPairing", () => {
  });

  it("drops orphan tool results that do not match any tool call", () => {
-    const input = [
-      { role: "user", content: "hello" },
+    const input: AgentMessage[] = [
+      { role: "user", content: "hello", timestamp: now },
      {
        role: "toolResult",
        toolCallId: "call_orphan",
        toolName: "read",
        content: [{ type: "text", text: "orphan" }],
        isError: false,
+        timestamp: now,
      },
      {
        role: "assistant",
        content: [{ type: "text", text: "ok" }],
+        timestamp: now,
+        api: "openai",
+        provider: "openai",
+        model: "gpt-4",
      },
-    ] satisfies AgentMessage[];
+    ];

    const out = sanitizeToolUseResultPairing(input);
    expect(out.some((m) => m.role === "toolResult")).toBe(false);
    expect(out.map((m) => m.role)).toEqual(["user", "assistant"]);
  });
 });
+
+describe("sanitizeToolUseArgs", () => {
+  it("preserves valid JSON strings in input fields", () => {
+    const input: AgentMessage[] = [
+      {
+        role: "assistant",
+        content: [
+          { type: "toolCall", id: "call_1", name: "read", input: '{"path":"foo.txt"}' } as any,
+        ],
+        timestamp: now,
+        api: "openai",
+        provider: "openai",
+        model: "gpt-4",
+      },
+    ];
+
+    const result = sanitizeToolUseArgs(input);
+    expect(result.changed).toBe(true);
+    const tool = (result.messages[0] as any).content[0];
+    expect(tool.input).toEqual({ path: "foo.txt" });
+    expect(result.sanitizedCount).toBe(0);
+  });
+
+  it("replaces invalid JSON strings with {} and sets metadata", () => {
+    const input: AgentMessage[] = [
+      {
+        role: "assistant",
+        content: [{ type: "toolCall", id: "call_1", name: "read", input: "{ invalid json" } as any],
+        timestamp: now,
+        api: "openai",
+        provider: "openai",
+        model: "gpt-4",
+      },
+    ];
+
+    const result = sanitizeToolUseArgs(input);
+    expect(result.changed).toBe(true);
+    expect(result.sanitizedCount).toBe(1);
+    const tool = (result.messages[0] as any).content[0];
+    expect(tool.input).toEqual({});
+    expect(tool._sanitized).toBe(true);
+    expect(tool._originalInput).toBe("{ invalid json");
+  });
+
+  it("preserves already-parsed object values in input fields", () => {
+    const input: AgentMessage[] = [
+      {
+        role: "assistant",
+        content: [
+          { type: "toolCall", id: "call_1", name: "read", input: { path: "bar.txt" } } as any,
+        ],
+        timestamp: now,
+        api: "openai",
+        provider: "openai",
+        model: "gpt-4",
+      },
+    ];
+
+    const result = sanitizeToolUseArgs(input);
+    expect(result.changed).toBe(false);
+    expect(result.messages).toBe(input);
+    const tool = (result.messages[0] as any).content[0];
+    expect(tool.input).toEqual({ path: "bar.txt" });
+  });
+
+  it("handles the 'arguments' alias used by some providers", () => {
+    const input: AgentMessage[] = [
+      {
+        role: "assistant",
+        content: [
+          { type: "toolCall", id: "call_1", name: "read", arguments: '{"path":"baz.txt"}' } as any,
+        ],
+        timestamp: now,
+        api: "openai",
+        provider: "openai",
+        model: "gpt-4",
+      },
+    ];
+
+    const result = sanitizeToolUseArgs(input);
+    expect(result.changed).toBe(true);
+    const tool = (result.messages[0] as any).content[0];
+    expect(tool.arguments).toEqual({ path: "baz.txt" });
+  });
+
+  it("leaves messages without tool blocks unchanged", () => {
+    const input: AgentMessage[] = [
+      { role: "user", content: "hello", timestamp: now },
+      {
+        role: "assistant",
+        content: [{ type: "text", text: "hi" }],
+        timestamp: now,
+        api: "openai",
+        provider: "openai",
+        model: "gpt-4",
+      },
+    ];
+
+    const result = sanitizeToolUseArgs(input);
+    expect(result.changed).toBe(false);
+    expect(result.messages).toBe(input);
+  });
+});
--- a/src/agents/session-transcript-repair.ts
+++ b/src/agents/session-transcript-repair.ts
@ -56,8 +56,99 @@ function makeMissingToolResult(params: {

 export { makeMissingToolResult };

+export type ToolUseSanitizationReport = {
+  messages: AgentMessage[];
+  sanitizedCount: number;
+  changed: boolean;
+};
+
+export function sanitizeToolUseArgs(messages: AgentMessage[]): ToolUseSanitizationReport {
+  // Creates new message objects only when sanitization is needed; otherwise
+  // returns the original messages to avoid unnecessary copying, while guarding
+  // against corrupt JSON in tool arguments that could break the session.
+  const out: AgentMessage[] = [];
+  let changed = false;
+  let sanitizedCount = 0;
+
+  for (const msg of messages) {
+    if (msg.role !== "assistant" || !Array.isArray(msg.content)) {
+      out.push(msg);
+      continue;
+    }
+
+    const content = msg.content;
+    const nextContent: any[] = [];
+    let msgChanged = false;
+
+    for (const block of content) {
+      const anyBlock = block as any;
+      if (
+        anyBlock &&
+        typeof anyBlock === "object" &&
+        (anyBlock.type === "toolUse" ||
+          anyBlock.type === "toolCall" ||
+          anyBlock.type === "functionCall")
+      ) {
+        const toolBlock = block as any;
+        // Handle both 'input' and 'arguments' fields (some providers use arguments)
+        const inputField =
+          "input" in toolBlock ? "input" : "arguments" in toolBlock ? "arguments" : null;
+
+        if (inputField && typeof toolBlock[inputField] === "string") {
+          try {
+            // Consistency: Always parse valid JSON strings into objects
+            const parsed = JSON.parse(toolBlock[inputField]);
+            nextContent.push({
+              ...toolBlock,
+              [inputField]: parsed,
+            });
+            msgChanged = true;
+          } catch {
+            // Invalid JSON found in tool args.
+            // Replace with empty object to prevent downstream crashes.
+            sanitizedCount += 1;
+            const original = String(toolBlock[inputField]);
+            const sample = original.length > 100 ? `${original.slice(0, 100)}...` : original;
+            console.warn(
+              `[SessionRepair] Sanitized malformed JSON in tool use '${toolBlock.name || "unknown"}'. Original: ${sample}`,
+            );
+            nextContent.push({
+              ...toolBlock,
+              [inputField]: {},
+              _sanitized: true,
+              _originalInput: toolBlock[inputField],
+            });
+            msgChanged = true;
+          }
+        } else {
+          nextContent.push(block);
+        }
+      } else {
+        nextContent.push(block);
+      }
+    }
+
+    if (msgChanged) {
+      out.push({
+        ...msg,
+        content: nextContent,
+      } as AgentMessage);
+      changed = true;
+    } else {
+      out.push(msg);
+    }
+  }
+
+  return {
+    messages: changed ? out : messages,
+    sanitizedCount,
+    changed,
+  };
+}
+
 export function sanitizeToolUseResultPairing(messages: AgentMessage[]): AgentMessage[] {
-  return repairToolUseResultPairing(messages).messages;
+  const sanitized = sanitizeToolUseArgs(messages);
+  return repairToolUseResultPairing(sanitized.messages).messages;
 }

 export type ToolUseRepairReport = {
--- a/src/agents/system-prompt.test.ts
+++ b/src/agents/system-prompt.test.ts
@ -369,4 +369,32 @@ describe("buildAgentSystemPrompt", () => {
    expect(prompt).toContain("## Reactions");
    expect(prompt).toContain("Reactions are enabled for Telegram in MINIMAL mode.");
  });
+
+  it("includes Windows Shell Guidance when running on Windows", () => {
+    const prompt = buildAgentSystemPrompt({
+      workspaceDir: "/tmp/clawd",
+      runtimeInfo: {
+        os: "Windows_NT",
+        host: "host",
+        agentId: "agent",
+      },
+    });
+
+    expect(prompt).toContain("## Windows Shell Guidance");
+    expect(prompt).toContain("You are running on Windows (PowerShell).");
+  });
+
+  it("omits Windows Shell Guidance when running on non-Windows", () => {
+    const prompt = buildAgentSystemPrompt({
+      workspaceDir: "/tmp/clawd",
+      runtimeInfo: {
+        os: "Darwin",
+        host: "host",
+        agentId: "agent",
+      },
+    });
+
+    expect(prompt).not.toContain("## Windows Shell Guidance");
+    expect(prompt).not.toContain("You are running on Windows (PowerShell).");
+  });
 });
--- a/src/agents/system-prompt.ts
+++ b/src/agents/system-prompt.ts
@ -83,21 +83,21 @@ function buildMessagingSection(params: {
    "- Never use exec/curl for provider messaging; Moltbot handles all routing internally.",
    params.availableTools.has("message")
      ? [
-          "",
-          "### message tool",
-          "- Use `message` for proactive sends + channel actions (polls, reactions, etc.).",
-          "- For `action=send`, include `to` and `message`.",
-          `- If multiple channels are configured, pass \`channel\` (${params.messageChannelOptions}).`,
-          `- If you use \`message\` (\`action=send\`) to deliver your user-visible reply, respond with ONLY: ${SILENT_REPLY_TOKEN} (avoid duplicate replies).`,
-          params.inlineButtonsEnabled
-            ? "- Inline buttons supported. Use `action=send` with `buttons=[[{text,callback_data}]]` (callback_data routes back as a user message)."
-            : params.runtimeChannel
-              ? `- Inline buttons not enabled for ${params.runtimeChannel}. If you need them, ask to set ${params.runtimeChannel}.capabilities.inlineButtons ("dm"|"group"|"all"|"allowlist").`
-              : "",
-          ...(params.messageToolHints ?? []),
-        ]
-          .filter(Boolean)
-          .join("\n")
+        "",
+        "### message tool",
+        "- Use `message` for proactive sends + channel actions (polls, reactions, etc.).",
+        "- For `action=send`, include `to` and `message`.",
+        `- If multiple channels are configured, pass \`channel\` (${params.messageChannelOptions}).`,
+        `- If you use \`message\` (\`action=send\`) to deliver your user-visible reply, respond with ONLY: ${SILENT_REPLY_TOKEN} (avoid duplicate replies).`,
+        params.inlineButtonsEnabled
+          ? "- Inline buttons supported. Use `action=send` with `buttons=[[{text,callback_data}]]` (callback_data routes back as a user message)."
+          : params.runtimeChannel
+            ? `- Inline buttons not enabled for ${params.runtimeChannel}. If you need them, ask to set ${params.runtimeChannel}.capabilities.inlineButtons ("dm"|"group"|"all"|"allowlist").`
+            : "",
+        ...(params.messageToolHints ?? []),
+      ]
+        .filter(Boolean)
+        .join("\n")
      : "",
    "",
  ];
@ -282,15 +282,15 @@ export function buildAgentSystemPrompt(params: {
      : undefined;
  const reasoningHint = params.reasoningTagHint
    ? [
-        "ALL internal reasoning MUST be inside <think>...</think>.",
-        "Do not output any analysis outside <think>.",
-        "Format every reply as <think>...</think> then <final>...</final>, with no other text.",
-        "Only the final user-visible reply may appear inside <final>.",
-        "Only text inside <final> is shown to the user; everything else is discarded and never seen by the user.",
-        "Example:",
-        "<think>Short internal reasoning.</think>",
-        "<final>Hey there! What would you like to do next?</final>",
-      ].join(" ")
+      "ALL internal reasoning MUST be inside <think>...</think>.",
+      "Do not output any analysis outside <think>.",
+      "Format every reply as <think>...</think> then <final>...</final>, with no other text.",
+      "Only the final user-visible reply may appear inside <final>.",
+      "Only text inside <final> is shown to the user; everything else is discarded and never seen by the user.",
+      "Example:",
+      "<think>Short internal reasoning.</think>",
+      "<final>Hey there! What would you like to do next?</final>",
+    ].join(" ")
    : undefined;
  const reasoningLevel = params.reasoningLevel ?? "off";
  const userTimezone = params.userTimezone?.trim();
@ -336,21 +336,21 @@ export function buildAgentSystemPrompt(params: {
    toolLines.length > 0
      ? toolLines.join("\n")
      : [
-          "Pi lists the standard tools above. This runtime enables:",
-          "- grep: search file contents for patterns",
-          "- find: find files by glob pattern",
-          "- ls: list directory contents",
-          "- apply_patch: apply multi-file patches",
-          `- ${execToolName}: run shell commands (supports background via yieldMs/background)`,
-          `- ${processToolName}: manage background exec sessions`,
-          "- browser: control clawd's dedicated browser",
-          "- canvas: present/eval/snapshot the Canvas",
-          "- nodes: list/describe/notify/camera/screen on paired nodes",
-          "- cron: manage cron jobs and wake events (use for reminders; when scheduling a reminder, write the systemEvent text as something that will read like a reminder when it fires, and mention that it is a reminder depending on the time gap between setting and firing; include recent context in reminder text if appropriate)",
-          "- sessions_list: list sessions",
-          "- sessions_history: fetch session history",
-          "- sessions_send: send to another session",
-        ].join("\n"),
+        "Pi lists the standard tools above. This runtime enables:",
+        "- grep: search file contents for patterns",
+        "- find: find files by glob pattern",
+        "- ls: list directory contents",
+        "- apply_patch: apply multi-file patches",
+        `- ${execToolName}: run shell commands (supports background via yieldMs/background)`,
+        `- ${processToolName}: manage background exec sessions`,
+        "- browser: control clawd's dedicated browser",
+        "- canvas: present/eval/snapshot the Canvas",
+        "- nodes: list/describe/notify/camera/screen on paired nodes",
+        "- cron: manage cron jobs and wake events (use for reminders; when scheduling a reminder, write the systemEvent text as something that will read like a reminder when it fires, and mention that it is a reminder depending on the time gap between setting and firing; include recent context in reminder text if appropriate)",
+        "- sessions_list: list sessions",
+        "- sessions_history: fetch session history",
+        "- sessions_send: send to another session",
+      ].join("\n"),
    "TOOLS.md does not control tool availability; it is user guidance for how to use external tools.",
    "If a task is more complex or takes longer, spawn a sub-agent. It will do the work for you and ping you when it's done. You can always check up on it.",
    "",
@ -375,11 +375,11 @@ export function buildAgentSystemPrompt(params: {
    hasGateway && !isMinimal ? "## Moltbot Self-Update" : "",
    hasGateway && !isMinimal
      ? [
-          "Get Updates (self-update) is ONLY allowed when the user explicitly asks for it.",
-          "Do not run config.apply or update.run unless the user explicitly requests an update or config change; if it's not explicit, ask first.",
-          "Actions: config.get, config.schema, config.apply (validate + write full config, then restart), update.run (update deps or git, then restart).",
-          "After restart, Moltbot pings the last active session automatically.",
-        ].join("\n")
+        "Get Updates (self-update) is ONLY allowed when the user explicitly asks for it.",
+        "Do not run config.apply or update.run unless the user explicitly requests an update or config change; if it's not explicit, ask first.",
+        "Actions: config.get, config.schema, config.apply (validate + write full config, then restart), update.run (update deps or git, then restart).",
+        "After restart, Moltbot pings the last active session automatically.",
+      ].join("\n")
      : "",
    hasGateway && !isMinimal ? "" : "",
    "",
@ -396,50 +396,57 @@ export function buildAgentSystemPrompt(params: {
    params.modelAliasLines && params.modelAliasLines.length > 0 && !isMinimal ? "" : "",
    "## Workspace",
    `Your working directory is: ${params.workspaceDir}`,
-    "Treat this directory as the single global workspace for file operations unless explicitly instructed otherwise.",
-    ...workspaceNotes,
-    "",
+    ...(runtimeInfo?.os?.toLowerCase()?.includes("windows")
+      ? [
+        "## Windows Shell Guidance",
+        "You are running on Windows (PowerShell).",
+        "- Use PowerShell syntax (e.g. `$env:VAR` instead of `%VAR%` or `$VAR`).",
+        "- Do NOT use Unix commands like `grep`, `sed`, `awk`, `head`, `tail` in exec/shell (PowerShell) commands unless you are sure they are installed. Pi's built-in tools named `grep`, `find`, and `ls` are safe to use.",
+        "- Use `findstr` or `Select-String` instead of `grep`.",
+        "- Use `Get-ChildItem` (dir/ls) with `-Recurse` instead of `find`.",
+        "",
+      ]
+      : []),
    ...docsSection,
    params.sandboxInfo?.enabled ? "## Sandbox" : "",
    params.sandboxInfo?.enabled
      ? [
-          "You are running in a sandboxed runtime (tools execute in Docker).",
-          "Some tools may be unavailable due to sandbox policy.",
-          "Sub-agents stay sandboxed (no elevated/host access). Need outside-sandbox read/write? Don't spawn; ask first.",
-          params.sandboxInfo.workspaceDir
-            ? `Sandbox workspace: ${params.sandboxInfo.workspaceDir}`
+        "You are running in a sandboxed runtime (tools execute in Docker).",
+        "Some tools may be unavailable due to sandbox policy.",
+        "Sub-agents stay sandboxed (no elevated/host access). Need outside-sandbox read/write? Don't spawn; ask first.",
+        params.sandboxInfo.workspaceDir
+          ? `Sandbox workspace: ${params.sandboxInfo.workspaceDir}`
+          : "",
+        params.sandboxInfo.workspaceAccess
+          ? `Agent workspace access: ${params.sandboxInfo.workspaceAccess}${params.sandboxInfo.agentWorkspaceMount
+            ? ` (mounted at ${params.sandboxInfo.agentWorkspaceMount})`
+            : ""
+          }`
+          : "",
+        params.sandboxInfo.browserBridgeUrl ? "Sandbox browser: enabled." : "",
+        params.sandboxInfo.browserNoVncUrl
+          ? `Sandbox browser observer (noVNC): ${params.sandboxInfo.browserNoVncUrl}`
+          : "",
+        params.sandboxInfo.hostBrowserAllowed === true
+          ? "Host browser control: allowed."
+          : params.sandboxInfo.hostBrowserAllowed === false
+            ? "Host browser control: blocked."
            : "",
-          params.sandboxInfo.workspaceAccess
-            ? `Agent workspace access: ${params.sandboxInfo.workspaceAccess}${
-                params.sandboxInfo.agentWorkspaceMount
-                  ? ` (mounted at ${params.sandboxInfo.agentWorkspaceMount})`
-                  : ""
-              }`
-            : "",
-          params.sandboxInfo.browserBridgeUrl ? "Sandbox browser: enabled." : "",
-          params.sandboxInfo.browserNoVncUrl
-            ? `Sandbox browser observer (noVNC): ${params.sandboxInfo.browserNoVncUrl}`
-            : "",
-          params.sandboxInfo.hostBrowserAllowed === true
-            ? "Host browser control: allowed."
-            : params.sandboxInfo.hostBrowserAllowed === false
-              ? "Host browser control: blocked."
-              : "",
-          params.sandboxInfo.elevated?.allowed
-            ? "Elevated exec is available for this session."
-            : "",
-          params.sandboxInfo.elevated?.allowed
-            ? "User can toggle with /elevated on|off|ask|full."
-            : "",
-          params.sandboxInfo.elevated?.allowed
-            ? "You may also send /elevated on|off|ask|full when needed."
-            : "",
-          params.sandboxInfo.elevated?.allowed
-            ? `Current elevated level: ${params.sandboxInfo.elevated.defaultLevel} (ask runs exec on host with approvals; full auto-approves).`
-            : "",
-        ]
-          .filter(Boolean)
-          .join("\n")
+        params.sandboxInfo.elevated?.allowed
+          ? "Elevated exec is available for this session."
+          : "",
+        params.sandboxInfo.elevated?.allowed
+          ? "User can toggle with /elevated on|off|ask|full."
+          : "",
+        params.sandboxInfo.elevated?.allowed
+          ? "You may also send /elevated on|off|ask|full when needed."
+          : "",
+        params.sandboxInfo.elevated?.allowed
+          ? `Current elevated level: ${params.sandboxInfo.elevated.defaultLevel} (ask runs exec on host with approvals; full auto-approves).`
+          : "",
+      ]
+        .filter(Boolean)
+        .join("\n")
      : "",
    params.sandboxInfo?.enabled ? "" : "",
    ...buildUserIdentitySection(ownerLine, isMinimal),
@ -472,22 +479,22 @@ export function buildAgentSystemPrompt(params: {
    const guidanceText =
      level === "minimal"
        ? [
-            `Reactions are enabled for ${channel} in MINIMAL mode.`,
-            "React ONLY when truly relevant:",
-            "- Acknowledge important user requests or confirmations",
-            "- Express genuine sentiment (humor, appreciation) sparingly",
-            "- Avoid reacting to routine messages or your own replies",
-            "Guideline: at most 1 reaction per 5-10 exchanges.",
-          ].join("\n")
+          `Reactions are enabled for ${channel} in MINIMAL mode.`,
+          "React ONLY when truly relevant:",
+          "- Acknowledge important user requests or confirmations",
+          "- Express genuine sentiment (humor, appreciation) sparingly",
+          "- Avoid reacting to routine messages or your own replies",
+          "Guideline: at most 1 reaction per 5-10 exchanges.",
+        ].join("\n")
        : [
-            `Reactions are enabled for ${channel} in EXTENSIVE mode.`,
-            "Feel free to react liberally:",
-            "- Acknowledge messages with appropriate emojis",
-            "- Express sentiment and personality through reactions",
-            "- React to interesting content, humor, or notable events",
-            "- Use reactions to confirm understanding or agreement",
-            "Guideline: react whenever it feels natural.",
-          ].join("\n");
+          `Reactions are enabled for ${channel} in EXTENSIVE mode.`,
+          "Feel free to react liberally:",
+          "- Acknowledge messages with appropriate emojis",
+          "- Express sentiment and personality through reactions",
+          "- React to interesting content, humor, or notable events",
+          "- Use reactions to confirm understanding or agreement",
+          "Guideline: react whenever it feels natural.",
+        ].join("\n");
    lines.push("## Reactions", guidanceText, "");
  }
  if (reasoningHint) {
--- a/system-prompt.original.ts
+++ b/system-prompt.original.ts