Strip raw HTML from email bodies in the Gmail hook pipeline so that
clean plain text reaches the agent session instead of bloated HTML
with CSS, tracking pixels, footers, and boilerplate.
What it does:
- Strips HTML tags, <style>/<script> blocks, and HTML comments
- Removes tracking pixels (1x1 images and images hidden with display:none)
- Removes base64 data URIs and inline encoded images
- Removes common email footer patterns (unsubscribe, sent from iPhone,
  confidentiality notices, copyright, privacy policy, etc.)
- Decodes HTML entities to plain text
- Converts block-level tags and <br> to newlines
- Collapses excessive whitespace (max 2 consecutive newlines)
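
The steps above can be sketched roughly as follows. This is a minimal
regex-based illustration, not the code in this change: the names
sanitizeEmailBody, decodeEntities, ENTITIES and FOOTER_PATTERNS are
hypothetical, the entity table and footer patterns are deliberately
short, and every <img> is dropped outright, which is a simplification
that happens to cover tracking pixels and inline base64 images.

```typescript
// Hypothetical sketch: all names here are illustrative assumptions.
const ENTITIES: Record<string, string> = {
  amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: " ",
};

// Illustrative footer patterns; each one removes the whole matching line.
const FOOTER_PATTERNS: RegExp[] = [
  /^.*\bunsubscribe\b.*$/gim,
  /^sent from my (iphone|ipad|android).*$/gim,
  /^.*\bprivacy policy\b.*$/gim,
  /^(©|\(c\)|copyright\b).*$/gim,
];

// Decode named and numeric entities in one pass, so "&amp;lt;" becomes "&lt;".
function decodeEntities(text: string): string {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body: string) => {
    if (body.charAt(0) === "#") {
      const code = body.charAt(1).toLowerCase() === "x"
        ? parseInt(body.slice(2), 16)
        : parseInt(body.slice(1), 10);
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown named entities are left untouched.
    return ENTITIES[body.toLowerCase()] ?? match;
  });
}

function sanitizeEmailBody(html: string): string {
  let text = html
    .replace(/\r\n?/g, "\n")
    // Drop comments and whole <style>/<script> blocks, contents included.
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/<(style|script)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Simplification: drop every <img>, tracking pixel or not.
    .replace(/<img\b[^>]*>/gi, "")
    // Drop any remaining base64 data URIs.
    .replace(/data:[\w\/+.-]+;base64,[A-Za-z0-9+\/=]+/g, "")
    // <br> and closing block-level tags become newlines.
    .replace(/<br\s*\/?>/gi, "\n")
    .replace(/<\/(p|div|li|tr|h[1-6]|table|blockquote)\s*>/gi, "\n")
    // Strip whatever tags are left.
    .replace(/<[^>]+>/g, "");

  // Decode after tag stripping so "&lt;b&gt;" survives as literal text.
  text = decodeEntities(text)
    .replace(/[ \t\u00a0]+/g, " ")
    .replace(/ ?\n ?/g, "\n");

  for (const pattern of FOOTER_PATTERNS) {
    text = text.replace(pattern, "");
  }

  // Allow at most two consecutive newlines.
  return text.replace(/\n{3,}/g, "\n\n").trim();
}
```

A real implementation would likely want an HTML parser rather than
regexes for malformed or nested markup; the sketch only shows the
order of the steps.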
Config: hooks.gmail.sanitizeBody (boolean, default: true)
Set to false to get the raw HTML body as before.
Sanitisation runs before template rendering in the hook mapping
pipeline, so every downstream consumer receives the smaller body and
spends fewer tokens on it.
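
As a rough illustration of the ordering and the config switch, the
sketch below resolves the flag (defaulting to true) and sanitises the
body before it is substituted into a template. Only the key
hooks.gmail.sanitizeBody and its default come from this description;
GmailHookConfig, prepareBody, renderTemplate, handleMessage and the
{{body}} placeholder syntax are assumptions made for the example.

```typescript
// Hypothetical shapes; only the sanitizeBody key and its default are
// taken from the description above.
interface GmailHookConfig {
  sanitizeBody?: boolean; // default: true
}

type Sanitizer = (html: string) => string;

// Resolve the flag and return either the sanitised or the raw body.
function prepareBody(rawBody: string, config: GmailHookConfig, sanitize: Sanitizer): string {
  const enabled = config.sanitizeBody ?? true;
  return enabled ? sanitize(rawBody) : rawBody;
}

// Stand-in template renderer: replaces {{name}} placeholders.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key: string) => vars[key] ?? match);
}

// Sanitisation happens first, so the template only ever sees the final body.
function handleMessage(
  rawBody: string,
  config: GmailHookConfig,
  template: string,
  sanitize: Sanitizer,
): string {
  const body = prepareBody(rawBody, config, sanitize);
  return renderTemplate(template, { body });
}
```

With a sanitizer passed in, an empty config object yields the cleaned
body, while { sanitizeBody: false } passes the raw HTML through
unchanged.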