openclaw/docs/providers/litellm.md
Charles-Henri ROBICHE c50d5c7f34
fix(litellm): enable prompt caching for Anthropic models
- Add LiteLLM + Claude model detection to isCacheTtlEligibleProvider
- Reduces cost by 90% for Claude models through LiteLLM proxy
- Add test coverage for cache eligibility detection
- Document prompt caching behavior and cost savings

Before: $0.47 per message (no caching)
After: $0.05 per message (90% cached)

Closes #2683
2026-01-28 23:44:44 +01:00


summary: Use LiteLLM as an OpenAI-compatible proxy in Clawdbot
read_when:
  • You want to use LiteLLM as a model provider
  • You need to connect to a self-hosted LiteLLM proxy
  • You want to use any model through an OpenAI-compatible API

LiteLLM

LiteLLM is an OpenAI-compatible proxy that supports 100+ LLM APIs. Clawdbot registers it as the litellm provider and uses the OpenAI Completions API.

Quick setup

  1. Set up your LiteLLM proxy (see LiteLLM docs)
  2. Set environment variables (optional):
    • LITELLM_API_KEY - your LiteLLM API key
    • LITELLM_BASE_URL - your LiteLLM endpoint (default: http://localhost:4000)
    • LITELLM_MODEL - default model name (default: gpt-4)
  3. Run onboarding:
clawdbot onboard --auth-choice litellm-api-key

The wizard will prompt for:

  • Base URL (your LiteLLM proxy endpoint)
  • API key
  • Model name (as configured in your LiteLLM proxy)
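The optional variables in step 2 follow a simple rule: use the value if it is set, otherwise fall back to the listed default. The sketch below only restates those documented defaults. `resolveLiteLlmEnv` is a hypothetical helper, not a Clawdbot API. It also assumes an empty variable counts as unset, and it does not describe how Clawdbot actually reads its environment.

```typescript
// Hypothetical helper (not a Clawdbot API): applies the documented
// defaults for the LiteLLM environment variables.
interface LiteLlmEnvSettings {
  apiKey?: string; // LITELLM_API_KEY, optional
  baseUrl: string; // LITELLM_BASE_URL, default http://localhost:4000
  model: string; // LITELLM_MODEL, default gpt-4
}

function resolveLiteLlmEnv(
  env: Record<string, string | undefined>,
): LiteLlmEnvSettings {
  // Assumption: an empty string is treated the same as an unset variable.
  const pick = (key: string): string | undefined => env[key] || undefined;
  return {
    apiKey: pick("LITELLM_API_KEY"),
    baseUrl: pick("LITELLM_BASE_URL") ?? "http://localhost:4000",
    model: pick("LITELLM_MODEL") ?? "gpt-4",
  };
}

// In Node.js: const settings = resolveLiteLlmEnv(process.env);
```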

Config example

{
  env: { LITELLM_API_KEY: "sk-..." },
  agents: {
    defaults: {
      model: { primary: "litellm/gpt-4" },
      models: { "litellm/gpt-4": { alias: "GPT-4" } }
    }
  },
  models: {
    mode: "merge",
    providers: {
      litellm: {
        baseUrl: "http://localhost:4000",
        apiKey: "${LITELLM_API_KEY}",
        api: "openai-completions",
        models: [
          {
            id: "gpt-4",
            name: "GPT-4",
            reasoning: false,
            input: ["text"],
            contextWindow: 128000,
            maxTokens: 8192
          }
        ]
      }
    }
  }
}
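In this config, `agents.defaults.model.primary` ("litellm/gpt-4") must point at an entry in `models.providers.litellm.models`. The sketch below shows a minimal version of that lookup. The types cover only the fields used here. Splitting at the first "/" is an assumption, chosen so that model ids which themselves contain "/" still parse. `parseModelRef` and `findModel` are hypothetical names, not Clawdbot functions.

```typescript
// Hypothetical types covering only the fields this sketch reads.
interface ModelEntry {
  id: string;
  name?: string;
  contextWindow?: number;
  maxTokens?: number;
}

interface ProviderConfig {
  baseUrl: string;
  models: ModelEntry[];
}

// Splits "litellm/gpt-4" into provider and model id at the FIRST "/",
// so an id like "anthropic/claude" keeps its own slash (assumption).
function parseModelRef(ref: string): { provider: string; modelId: string } {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Invalid model ref: ${ref}`);
  }
  return { provider: ref.slice(0, slash), modelId: ref.slice(slash + 1) };
}

// Returns the configured entry a ref points at, or undefined if none matches.
function findModel(
  ref: string,
  providers: Record<string, ProviderConfig>,
): ModelEntry | undefined {
  const { provider, modelId } = parseModelRef(ref);
  return providers[provider]?.models.find((m) => m.id === modelId);
}
```

A ref that resolves to `undefined` here corresponds to a model id that is missing from the provider's `models` list.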

Multiple models

Add additional models to your config as needed:

{
  models: {
    providers: {
      litellm: {
        baseUrl: "http://localhost:4000",
        apiKey: "${LITELLM_API_KEY}",
        api: "openai-completions",
        models: [
          { id: "gpt-4", name: "GPT-4", contextWindow: 128000, maxTokens: 8192 },
          { id: "claude-3-opus", name: "Claude Opus", contextWindow: 200000, maxTokens: 4096 },
          { id: "gemini-pro", name: "Gemini Pro", contextWindow: 32000, maxTokens: 8192 }
        ]
      }
    }
  }
}

Then switch models using:

clawdbot config set agents.defaults.model.primary litellm/claude-3-opus
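`clawdbot config set` addresses a nested key with a dotted path. Conceptually, the command above changes only `agents.defaults.model.primary` and leaves sibling keys such as `agents.defaults.models` untouched. The sketch below models that update on a plain object. `setByPath` is hypothetical and says nothing about how Clawdbot validates or persists its config.

```typescript
type ConfigObject = { [key: string]: unknown };

// Walks a dotted path such as "agents.defaults.model.primary", creating
// intermediate objects when missing (or when a non-object is in the way),
// then sets the final key. Other keys are left as they were.
function setByPath(root: ConfigObject, path: string, value: unknown): void {
  const keys = path.split(".");
  let node: ConfigObject = root;
  for (const key of keys.slice(0, -1)) {
    const next = node[key];
    if (typeof next !== "object" || next === null || Array.isArray(next)) {
      node[key] = {};
    }
    node = node[key] as ConfigObject;
  }
  node[keys[keys.length - 1]] = value;
}
```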

Prompt caching

When you use Anthropic models through LiteLLM (e.g., claude-opus-4-5, claude-sonnet-4-5), Clawdbot automatically enables prompt caching to reduce costs. The resulting per-model setting is equivalent to:

{
  agents: {
    defaults: {
      models: {
        "litellm/claude-opus-4-5": {
          params: {
            cacheControlTtl: "1h"  // Auto-configured for Claude models
          }
        }
      }
    }
  }
}

Cost savings with caching

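  • Without caching: every message pays the full input price for the entire conversation history.
  • With caching (enabled by default): repeated context is billed at the cache-read rate, about 10x cheaper than regular input tokens. Writing new content to the cache costs more than regular input, so the savings come from reusing the same context across turns.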

Example from actual usage:

  • Without caching: 93k tokens × $0.000005 per token ≈ $0.47 per message
  • With caching: 123k tokens (mostly cached) ≈ $0.05 per message, about 90% less
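The per-message figures are input-token arithmetic: tokens × price per token, with cached tokens billed at a fraction of the base price. The sketch below makes that explicit. It assumes Anthropic's published cache-read rate of 10% of the base input price and ignores cache-write surcharges and output tokens. The 3k/90k split in the usage line is a hypothetical illustration, not the measured session above.

```typescript
// Input-token cost of one request in USD, split into cache hits and
// everything else. Prices are USD per million tokens (MTok).
// cacheReadMultiplier = 0.1: cache reads cost 10% of the base input price.
// Cache writes (billed above the base rate) and output tokens are ignored.
function inputCostUsd(
  uncachedTokens: number,
  cachedTokens: number,
  basePricePerMTok: number,
  cacheReadMultiplier = 0.1,
): number {
  const perToken = basePricePerMTok / 1_000_000;
  return uncachedTokens * perToken + cachedTokens * perToken * cacheReadMultiplier;
}

// $5/MTok is $0.000005 per token, the rate used in the example above.
const withoutCache = inputCostUsd(93_000, 0, 5); // about 0.465
// Hypothetical split: 3k new tokens plus 90k tokens served from cache.
const withCache = inputCostUsd(3_000, 90_000, 5); // about 0.06
```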

Caching is automatically enabled for all claude-* models through LiteLLM.
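The rule "claude-* models through LiteLLM get caching" can be sketched as an eligibility check plus a default. This is a hypothetical illustration, not Clawdbot's isCacheTtlEligibleProvider. It assumes a plain `claude-` prefix match on the model id, and the real detection may match differently. It also assumes a cacheControlTtl you set yourself takes precedence over the "1h" default.

```typescript
// Hypothetical sketch, NOT Clawdbot's implementation.
const DEFAULT_CACHE_TTL = "1h"; // the value shown in the config above

interface ModelParams {
  cacheControlTtl?: string;
  [key: string]: unknown;
}

// Assumption: eligibility is a prefix check on the model id.
function isLiteLlmClaude(provider: string, modelId: string): boolean {
  return provider === "litellm" && modelId.startsWith("claude-");
}

// Fills in the cache TTL for eligible models; assumes an explicit
// user-provided cacheControlTtl is kept as-is.
function withCacheDefaults(
  provider: string,
  modelId: string,
  params: ModelParams = {},
): ModelParams {
  if (!isLiteLlmClaude(provider, modelId) || params.cacheControlTtl) {
    return params;
  }
  return { ...params, cacheControlTtl: DEFAULT_CACHE_TTL };
}
```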

Notes

  • Model refs use litellm/<modelId> where modelId matches your LiteLLM config.
  • The base URL should not include /v1; Clawdbot's OpenAI client appends it.
  • Supported LiteLLM models depend on your proxy configuration.
  • Prompt caching works automatically when using Claude models through LiteLLM.
  • See Model providers for provider rules.
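The /v1 note matters because the client adds the version segment itself, so a base URL that already ends in /v1 produces a doubled path. The sketch below shows the URL a client would build under that assumption. The /v1/chat/completions path reflects the OpenAI-compatible route LiteLLM exposes. Both helper names are hypothetical.

```typescript
// Assumption (from the note above): the client appends "/v1" itself.
function chatCompletionsUrl(baseUrl: string): string {
  const trimmed = baseUrl.replace(/\/+$/, ""); // avoid "//v1" from a trailing slash
  return `${trimmed}/v1/chat/completions`;
}

// Flags base URLs that would lead to a doubled "/v1/v1" path.
function hasRedundantV1(baseUrl: string): boolean {
  return /\/v1\/*$/.test(baseUrl);
}
```

For example, "http://localhost:4000" yields http://localhost:4000/v1/chat/completions, while "http://localhost:4000/v1" yields the broken http://localhost:4000/v1/v1/chat/completions.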