fix(media): skip audio files in extractFileBlocks text extraction
Audio files (especially OGG/Opus from Telegram voice messages) were being misidentified as text by looksLikeUtf8Text() because OGG headers contain >85% printable ASCII. This caused guessDelimitedMime() to classify them as text/tab-separated-values, injecting raw binary into the model context.

Add audio to the skip list alongside image and video in extractFileBlocks() so audio attachments are routed to the transcription pipeline instead of being treated as text files.

Fixes #1989
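The failure mode described in the commit message can be illustrated with a minimal sketch. The helpers below are hypothetical stand-ins, not the project's looksLikeUtf8Text(); they assume a plain printable-ASCII ratio compared against a 0.85 threshold, and the sample buffer is synthetic rather than a real OGG header.

```typescript
// Hypothetical printable-ratio check; NOT the project's looksLikeUtf8Text().
function printableRatio(bytes: Uint8Array): number {
  if (bytes.length === 0) return 0;
  let printable = 0;
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    // Count tab, LF, CR, and the visible ASCII range 0x20..0x7e.
    if (b === 0x09 || b === 0x0a || b === 0x0d || (b >= 0x20 && b <= 0x7e)) {
      printable++;
    }
  }
  return printable / bytes.length;
}

// Assumed threshold of 0.85, taken from the ">85%" figure in the commit message.
function naiveLooksLikeText(bytes: Uint8Array, threshold = 0.85): boolean {
  return printableRatio(bytes) > threshold;
}

// Synthetic sample: 90 printable bytes ("A") followed by 10 NUL bytes.
const sample = new Uint8Array(100);
sample.fill(0x41, 0, 90);
```

With this sample, naiveLooksLikeText(sample) returns true even though the buffer ends in binary data, which is why a ratio check alone cannot be trusted to reject media containers with text-heavy headers.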
parent 4583f88626
commit 67bbacb3b7
@@ -216,7 +216,7 @@ async function extractFileBlocks(params: {
     }
     const forcedTextMime = resolveTextMimeFromName(attachment.path ?? attachment.url ?? "");
     const kind = forcedTextMime ? "document" : resolveAttachmentKind(attachment);
-    if (!forcedTextMime && (kind === "image" || kind === "video")) {
+    if (!forcedTextMime && (kind === "image" || kind === "audio" || kind === "video")) {
       continue;
     }
     if (!limits.allowUrl && attachment.url && !attachment.path) {
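The changed condition can be exercised in isolation with a small sketch. The AttachmentKind type and the shouldSkipTextExtraction helper are hypothetical names introduced for illustration; the real code keeps this check inline in extractFileBlocks().

```typescript
// Hypothetical type; the real return type of resolveAttachmentKind() may differ.
type AttachmentKind = "image" | "audio" | "video" | "document";

// Hypothetical helper mirroring the condition from the diff.
function shouldSkipTextExtraction(
  kind: AttachmentKind,
  forcedTextMime: string | undefined,
): boolean {
  // A text MIME forced by the file name always wins, so such files are still extracted.
  if (forcedTextMime) return false;
  // Media kinds are left to their own pipelines (for audio, transcription).
  return kind === "image" || kind === "audio" || kind === "video";
}
```

Under these assumptions, shouldSkipTextExtraction("audio", undefined) is true after the change, while a file whose name forces a text MIME such as "text/csv" is still extracted.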