From 53e93a193220b0ce578b094d1dc9c433c734ef83 Mon Sep 17 00:00:00 2001 From: RogerHsu7 Date: Fri, 30 Jan 2026 10:36:51 +0800 Subject: [PATCH] Create webchat-file-upload.md --- docs/proposals/webchat-file-upload.md | 89 +++++++++++++++++++++++++++ 1 file changed, 89 insertions(+) create mode 100644 docs/proposals/webchat-file-upload.md diff --git a/docs/proposals/webchat-file-upload.md b/docs/proposals/webchat-file-upload.md new file mode 100644 index 000000000..4e53ebe46 --- /dev/null +++ b/docs/proposals/webchat-file-upload.md @@ -0,0 +1,89 @@ +# Gateway Web Chat: Image/File Upload + +Status: draft +Owner: @assistant +Scope: Gateway web console (Chat) + +## Problem +The Gateway web Chat has no way to send images/files. Users often need to share screenshots/logs during troubleshooting. + +## Goals +- Allow users to attach files via: attach button, drag & drop, paste-from-clipboard (images). +- Show selected files as removable chips with upload progress. +- On send, upload files to server and insert a message with media references so assistants receive MEDIA:/path. +- Support multiple files per message. +- Reasonable defaults for limits (20–50MB per item) and allow config override. + +## Non-goals +- Inline buttons/interactive UI in messages (not supported on webchat). +- Server-side virus scanning. + +## UX +- Add "Attach" icon next to the composer. +- Dragging files over the composer shows a drop overlay. +- Pasting an image inserts it as a pending attachment. +- Each selected file shows: name/size, thumbnail for images, remove (×), progress bar while uploading. +- Pressing Enter (without Shift) sends text + queued attachments. + +## API +- POST /api/media/upload (multipart/form-data) + - fields: file (repeatable), caption (optional, per-file or per-message caption TBD) + - returns 200 JSON: { files: [ { path: "MEDIA:/path", mime: "image/png", name: "...", size: 12345, width, height } ] } + - saves files under gateway media path (e.g., ~/.clawdbot/media/inbound/.) +- POST /api/sessions/:sessionId/messages + - body: { text?: string, media?: [ { path: "MEDIA:/...", caption?: string } ] } + - server stores transcript and delivers to assistant runtime. + +Notes: +- If a single endpoint already exists to send messages, extend it to accept media[]. Otherwise, use two-phase (upload -> send). +- Respect existing auth/session; CSRF as per current gateway conventions. + +## Backend +- Add multipart handling (e.g., busboy/multer/fastify-multipart depending on stack). +- Validate content type and size; enforce configurable limits. +- Compute a safe filename; store to media dir; produce absolute server path and MEDIA:/ prefix for assistant. +- For images: attempt to read dimensions (optional) to improve thumbnails. +- Return JSON list of stored files. +- Extend message creation to accept media[]. Persist media metadata alongside messages for rendering. + +## Frontend +- Composer component changes: + - Add hidden bound to Attach button. + - Drag & drop overlay and handlers (dragenter/dragover/drop) on composer area. + - Clipboard paste handler: capture image blobs and add to attachment queue. + - Attachment queue state: { id, file|blob, name, size, previewURL, status: queued|uploading|done|error, serverPath? }[] + - On send: for any queued files not uploaded, call /api/media/upload, show per-file progress. Then call send-message with text + media paths. + - After success: clear input and queue. +- Message renderer: if message.media exists, render thumbnails for images and file tiles for others; clicking downloads the file. + +## Config +- gateway.webchat.uploads.maxItemMB (default 25) +- gateway.webchat.uploads.maxFilesPerMessage (default 5) +- gateway.webchat.uploads.accept (default images+generic: "image/*,application/pdf,.zip,.txt") + +## Security +- Store under per-instance media directory; do not serve arbitrary FS paths. +- Serve media via controlled route with auth (or signed URLs) to avoid leaking private files. +- Sanitize filenames; block executables by default if desired. + +## Acceptance Criteria +- Drag a screenshot into composer -> shows chip -> send -> message appears with thumbnail; assistant receives MEDIA:/path. +- Click attach to select multiple files -> progress -> all appear in transcript. +- Paste an image from clipboard -> becomes attachment and can be sent. +- Oversize file is rejected with a clear error. + +## Test Plan +- Unit: multipart handler, limit enforcement, message payload schema. +- E2E: browser test for drag/drop, paste, attach; verify message persistence and media retrieval. + +## Open Questions +- Do we want per-file captions? For MVP we can skip. +- Serving media: public path vs. signed URL behind auth — follow current Gateway approach. + +## Task Breakdown +- [ ] Backend: media upload endpoint + storage +- [ ] Backend: extend message API to accept media[] +- [ ] Frontend: composer (attach/drag/paste) + queue + progress +- [ ] Frontend: message renderer for media +- [ ] Config + docs +- [ ] Tests