LLaVA-Next 13B Embeddings

Joint image-text embeddings from LLaVA-Next for multimodal retrieval

LLaVA-Next generates joint embeddings from images and optional text prompts. The 13B-parameter model produces 4096-dimensional vectors that combine visual and language understanding for retrieval and similarity search.

When to use:

  • Cross-modal retrieval (search images using text and vice versa)
  • Building multimodal search indexes
  • Semantic image similarity with text context
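The retrieval use cases above reduce to nearest-neighbor search over the 4096-dimensional vectors: embed a corpus of images once, embed the query (text, image, or both) at search time, and rank by cosine similarity. A minimal sketch, using random stand-in vectors in place of real model output (the `cosine_top_k` helper is illustrative, not part of any SDK):

```python
import numpy as np

DIM = 4096  # LLaVA-Next 13B joint embedding dimension

def cosine_top_k(query: np.ndarray, index: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k index vectors most similar to the query."""
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = m @ q  # cosine similarity against every indexed vector
    return list(np.argsort(-scores)[:k])

# Stand-in embeddings; in practice each row comes from the model.
rng = np.random.default_rng(0)
index = rng.standard_normal((100, DIM))
query = index[42] + 0.01 * rng.standard_normal(DIM)  # near-duplicate of item 42

print(cosine_top_k(query, index, k=3)[0])  # item 42 ranks first
```

Because cosine similarity only depends on vector direction, normalizing the index once up front lets the search be a single matrix-vector product.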

Input:

  • Image (required): Image to encode
  • Text (optional): Text prompt to pair with the image for joint embedding

Output: 4096-dimensional joint embedding vector

Inference Settings

No inference-time settings. Embeddings are computed deterministically.
