Documentation (English)

Multimodal Inference

Models that combine image, text, and document understanding

Multimodal models process and relate multiple input types (images, text, and documents) within a single request. The main task types are:

  • Vision Language – Answer questions about images and extract information from documents
  • Embeddings – Joint image-text embeddings for cross-modal retrieval
  • Reranking – Score image-text or image-image similarity for retrieval
  • Classification – Classify inputs combining multiple image modalities
