openclaw/docs/providers/cerebras.md
2026-01-26 23:57:04 +01:00

1.3 KiB

summary read_when
Use Cerebras ultra-fast inference for LLaMA, Qwen, GLM models via OpenAI-compatible API
You want to use Cerebras inference
You need ultra-fast model responses

Cerebras

Cerebras provides ultra-fast inference using their custom AI accelerator chips, delivering industry-leading speed for popular open-source models through an OpenAI-compatible API.

CLI setup

clawdbot onboard --auth-choice cerebras-api-key
# or non-interactive
clawdbot onboard --cerebras-api-key "$CEREBRAS_API_KEY"

Config snippet

{
  env: { CEREBRAS_API_KEY: "csk-..." },
  agents: {
    defaults: {
      model: { primary: "cerebras/llama3.1-8b" }
    }
  }
}

Available models

All models run at FP16 or FP16/FP8 precision:

  • cerebras/llama3.1-8b - LLaMA 3.1 8B (FP16)
  • cerebras/llama-3.3-70b - LLaMA 3.3 70B (FP16)
  • cerebras/gpt-oss-120b - GPT OSS 120B (FP16/FP8)
  • cerebras/qwen-3-32b - Qwen 3 32B (FP16)
  • cerebras/qwen-3-235b-a22b-instruct-2507 - Qwen 3 235B (FP16/FP8)
  • cerebras/zai-glm-4.7 - GLM 4.7 (FP16/FP8)

Notes