Documentation

Multimodal Inference

Models that combine image, text, and document understanding

Multimodal models process and relate multiple input types — images, text, and documents.

  • Vision Language – Answer questions about images and extract information from documents
  • Embeddings – Joint image-text embeddings for cross-modal retrieval
  • Reranking – Score image-text or image-image similarity for retrieval
  • Classification – Classify inputs that combine multiple modalities, such as images and text
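
To make the Embeddings and Reranking entries above concrete, here is a minimal sketch of cross-modal retrieval, assuming images and text have already been mapped into one shared vector space. The three-dimensional vectors, the file names, and the helper functions `cosine_similarity` and `rank_by_similarity` are hypothetical and for illustration only. They are not part of any documented API, and a real embedding model would produce much higher-dimensional vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates):
    """Return (name, score) pairs sorted by descending similarity to the query."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical image embeddings (made-up values, not real model output).
image_embeddings = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "invoice.png": [0.0, 0.2, 0.9],
    "dog.jpg": [0.7, 0.6, 0.1],
}

# Hypothetical embedding of a text query such as "a photo of a cat".
text_query_embedding = [1.0, 0.2, 0.0]

ranking = rank_by_similarity(text_query_embedding, image_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because both modalities live in the same space, a text query can be compared directly against image vectors. A reranking step would typically rescore only the top few candidates from a list like this with a more expensive model.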
