4.3 KiB
4.3 KiB
Gateway Web Chat: Image/File Upload
Status: draft Owner: @assistant Scope: Gateway web console (Chat)
Problem
The Gateway web Chat has no way to send images/files. Users often need to share screenshots/logs during troubleshooting.
Goals
- Allow users to attach files via: attach button, drag & drop, paste-from-clipboard (images).
- Show selected files as removable chips with upload progress.
- On send, upload files to server and insert a message with media references so assistants receive MEDIA:/path.
- Support multiple files per message.
- Reasonable defaults for limits (20–50MB per item) and allow config override.
Non-goals
- Inline buttons/interactive UI in messages (not supported on webchat).
- Server-side virus scanning.
UX
- Add "Attach" icon next to the composer.
- Dragging files over the composer shows a drop overlay.
- Pasting an image inserts it as a pending attachment.
- Each selected file shows: name/size, thumbnail for images, remove (×), progress bar while uploading.
- Pressing Enter (without Shift) sends text + queued attachments.
API
- POST /api/media/upload (multipart/form-data)
- fields: file (repeatable), caption (optional, per-file or per-message caption TBD)
- returns 200 JSON: { files: [ { path: "MEDIA:/path", mime: "image/png", name: "...", size: 12345, width, height } ] }
- saves files under gateway media path (e.g., ~/.clawdbot/media/inbound/.)
- POST /api/sessions/:sessionId/messages
- body: { text?: string, media?: [ { path: "MEDIA:/...", caption?: string } ] }
- server stores transcript and delivers to assistant runtime.
Notes:
- If a single endpoint already exists to send messages, extend it to accept media[]. Otherwise, use two-phase (upload -> send).
- Respect existing auth/session; CSRF as per current gateway conventions.
Backend
- Add multipart handling (e.g., busboy/multer/fastify-multipart depending on stack).
- Validate content type and size; enforce configurable limits.
- Compute a safe filename; store to media dir; produce absolute server path and MEDIA:/ prefix for assistant.
- For images: attempt to read dimensions (optional) to improve thumbnails.
- Return JSON list of stored files.
- Extend message creation to accept media[]. Persist media metadata alongside messages for rendering.
Frontend
- Composer component changes:
- Add hidden bound to Attach button.
- Drag & drop overlay and handlers (dragenter/dragover/drop) on composer area.
- Clipboard paste handler: capture image blobs and add to attachment queue.
- Attachment queue state: { id, file|blob, name, size, previewURL, status: queued|uploading|done|error, serverPath? }[]
- On send: for any queued files not uploaded, call /api/media/upload, show per-file progress. Then call send-message with text + media paths.
- After success: clear input and queue.
- Message renderer: if message.media exists, render thumbnails for images and file tiles for others; clicking downloads the file.
Config
- gateway.webchat.uploads.maxItemMB (default 25)
- gateway.webchat.uploads.maxFilesPerMessage (default 5)
- gateway.webchat.uploads.accept (default images+generic: "image/*,application/pdf,.zip,.txt")
Security
- Store under per-instance media directory; do not serve arbitrary FS paths.
- Serve media via controlled route with auth (or signed URLs) to avoid leaking private files.
- Sanitize filenames; block executables by default if desired.
Acceptance Criteria
- Drag a screenshot into composer -> shows chip -> send -> message appears with thumbnail; assistant receives MEDIA:/path.
- Click attach to select multiple files -> progress -> all appear in transcript.
- Paste an image from clipboard -> becomes attachment and can be sent.
- Oversize file is rejected with a clear error.
Test Plan
- Unit: multipart handler, limit enforcement, message payload schema.
- E2E: browser test for drag/drop, paste, attach; verify message persistence and media retrieval.
Open Questions
- Do we want per-file captions? For MVP we can skip.
- Serving media: public path vs. signed URL behind auth — follow current Gateway approach.
Task Breakdown
- Backend: media upload endpoint + storage
- Backend: extend message API to accept media[]
- Frontend: composer (attach/drag/paste) + queue + progress
- Frontend: message renderer for media
- Config + docs
- Tests