LLaVA-Next 13B Embeddings

Joint image-text embeddings from LLaVA-Next for multimodal retrieval

LLaVA-Next generates joint embeddings from images and optional text prompts. The 13B parameter model produces 4096-dimensional vectors combining visual and language understanding for retrieval and similarity search.

When to use:

  • Cross-modal retrieval (search images using text and vice versa)
  • Building multimodal search indexes
  • Semantic image similarity with text context

Input:

  • Image (required): Image to encode
  • Text (optional): Text prompt to pair with the image for joint embedding

Output: 4096-dimensional joint embedding vector
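
A retrieval index over these embeddings is typically queried by cosine similarity. The sketch below is illustrative only: the `index` rows stand in for 4096-dimensional joint embeddings returned by the model for (image, optional text) pairs, and here are filled with random placeholder vectors since no client API is specified above.

```python
import numpy as np

def cosine_top_k(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k index rows most similar to the query."""
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = m @ q                      # cosine similarity against every row
    return np.argsort(scores)[::-1][:k]

# Placeholder data: in practice each row would be a 4096-dimensional
# joint embedding produced by the model.
rng = np.random.default_rng(0)
index = rng.standard_normal((100, 4096))
query = index[42] + 0.01 * rng.standard_normal(4096)  # near-duplicate of item 42

print(cosine_top_k(query, index, k=3)[0])  # best match is item 42
```

Because high-dimensional random vectors are nearly orthogonal, the near-duplicate query reliably ranks its source row first; real embeddings cluster by semantic similarity instead.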

Inference Settings

No inference-time settings. Embeddings are computed deterministically.
