openclaw/docs/proposals/webchat-file-upload.md
2026-01-30 10:36:51 +08:00

4.3 KiB
Raw Blame History

Gateway Web Chat: Image/File Upload

Status: draft Owner: @assistant Scope: Gateway web console (Chat)

Problem

The Gateway web Chat has no way to send images/files. Users often need to share screenshots/logs during troubleshooting.

Goals

  • Allow users to attach files via: attach button, drag & drop, paste-from-clipboard (images).
  • Show selected files as removable chips with upload progress.
  • On send, upload files to server and insert a message with media references so assistants receive MEDIA:/path.
  • Support multiple files per message.
  • Reasonable defaults for limits (2050MB per item) and allow config override.

Non-goals

  • Inline buttons/interactive UI in messages (not supported on webchat).
  • Server-side virus scanning.

UX

  • Add "Attach" icon next to the composer.
  • Dragging files over the composer shows a drop overlay.
  • Pasting an image inserts it as a pending attachment.
  • Each selected file shows: name/size, thumbnail for images, remove (×), progress bar while uploading.
  • Pressing Enter (without Shift) sends text + queued attachments.

API

  • POST /api/media/upload (multipart/form-data)
    • fields: file (repeatable), caption (optional, per-file or per-message caption TBD)
    • returns 200 JSON: { files: [ { path: "MEDIA:/path", mime: "image/png", name: "...", size: 12345, width, height } ] }
    • saves files under gateway media path (e.g., ~/.clawdbot/media/inbound/.)
  • POST /api/sessions/:sessionId/messages
    • body: { text?: string, media?: [ { path: "MEDIA:/...", caption?: string } ] }
    • server stores transcript and delivers to assistant runtime.

Notes:

  • If a single endpoint already exists to send messages, extend it to accept media[]. Otherwise, use two-phase (upload -> send).
  • Respect existing auth/session; CSRF as per current gateway conventions.

Backend

  • Add multipart handling (e.g., busboy/multer/fastify-multipart depending on stack).
  • Validate content type and size; enforce configurable limits.
  • Compute a safe filename; store to media dir; produce absolute server path and MEDIA:/ prefix for assistant.
  • For images: attempt to read dimensions (optional) to improve thumbnails.
  • Return JSON list of stored files.
  • Extend message creation to accept media[]. Persist media metadata alongside messages for rendering.

Frontend

  • Composer component changes:
    • Add hidden bound to Attach button.
    • Drag & drop overlay and handlers (dragenter/dragover/drop) on composer area.
    • Clipboard paste handler: capture image blobs and add to attachment queue.
    • Attachment queue state: { id, file|blob, name, size, previewURL, status: queued|uploading|done|error, serverPath? }[]
    • On send: for any queued files not uploaded, call /api/media/upload, show per-file progress. Then call send-message with text + media paths.
    • After success: clear input and queue.
  • Message renderer: if message.media exists, render thumbnails for images and file tiles for others; clicking downloads the file.

Config

  • gateway.webchat.uploads.maxItemMB (default 25)
  • gateway.webchat.uploads.maxFilesPerMessage (default 5)
  • gateway.webchat.uploads.accept (default images+generic: "image/*,application/pdf,.zip,.txt")

Security

  • Store under per-instance media directory; do not serve arbitrary FS paths.
  • Serve media via controlled route with auth (or signed URLs) to avoid leaking private files.
  • Sanitize filenames; block executables by default if desired.

Acceptance Criteria

  • Drag a screenshot into composer -> shows chip -> send -> message appears with thumbnail; assistant receives MEDIA:/path.
  • Click attach to select multiple files -> progress -> all appear in transcript.
  • Paste an image from clipboard -> becomes attachment and can be sent.
  • Oversize file is rejected with a clear error.

Test Plan

  • Unit: multipart handler, limit enforcement, message payload schema.
  • E2E: browser test for drag/drop, paste, attach; verify message persistence and media retrieval.

Open Questions

  • Do we want per-file captions? For MVP we can skip.
  • Serving media: public path vs. signed URL behind auth — follow current Gateway approach.

Task Breakdown

  • Backend: media upload endpoint + storage
  • Backend: extend message API to accept media[]
  • Frontend: composer (attach/drag/paste) + queue + progress
  • Frontend: message renderer for media
  • Config + docs
  • Tests