Multimodal Embeddings
Joint image-text embeddings for cross-modal search and retrieval
Multimodal embedding models map images and text into a shared vector space, enabling cross-modal retrieval: you can search a collection of images with a text query, or find relevant text given an image.
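Because matching images and text live in one vector space, retrieval reduces to nearest-neighbor search over stored vectors, typically by cosine similarity. A minimal sketch with toy 4-dimensional vectors (real embeddings would come from one of the models below and be 3584- or 4096-dimensional; the filenames and vector values here are made up for illustration):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_vec: np.ndarray, index: list[tuple[str, np.ndarray]]):
    """Rank all indexed items by similarity to the query, best first."""
    scored = [(item_id, cosine_similarity(query_vec, vec)) for item_id, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy image embeddings standing in for real model output.
image_index = [
    ("cat.jpg", np.array([0.9, 0.1, 0.0, 0.1])),
    ("dog.jpg", np.array([0.1, 0.9, 0.1, 0.0])),
]

# Pretend this came from embedding the text "a photo of a cat".
query = np.array([0.8, 0.2, 0.1, 0.0])

results = search(query, image_index)  # → [("cat.jpg", ...), ("dog.jpg", ...)]
```

Because both modalities share the space, the same `search` function works in either direction: index text snippets and query with an image embedding instead.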
Available Models
- LLaVA-Next 13B Embeddings – 4096-dim joint image-text embeddings (13B parameters)
- Qwen-VL-2 Embedding – 3584-dim multilingual multimodal embeddings with strong OCR understanding (32+ languages)